Cloud Native GitLab Runners on Kubernetes: Autoscaling and Observability
Autoscaling GitLab Runners
Autoscaling in Kubernetes can be achieved through several mechanisms. One of the most common is the Horizontal Pod Autoscaler (HPA), which automatically scales the number of pods in a deployment based on different types of metrics:
- CPU utilization and memory usage: resource metrics collected by default by the Kubernetes metrics server.
- Custom metrics: anything beyond the standard CPU and memory metrics, typically application metrics such as the number of requests per second or the number of jobs run by a runner. These are collected by Prometheus and exposed to the Kubernetes API server through the Prometheus Adapter.
ℹ️ The Prometheus Adapter is a component that allows Kubernetes to use Prometheus as a source for custom metrics. It translates custom metrics from Prometheus into a format that can be consumed by the Kubernetes API server. This allows your HPA to consume more meaningful metrics than just CPU and memory usage.
So, to autoscale the GitLab Runner, we need to create a custom metric and then configure the HPA to use it. If you only want to use CPU or memory usage of the runner pods as a metric, you can skip this process, since those metrics are collected by default by the Kubernetes metrics server. However, neither metric is very useful for autoscaling GitLab Runner; there are better signals, such as the number of jobs run by the runner. You can view this metric by running the following command:
# Export the pod name of the runner
pod=$( \
  kubectl get pods \
    -l app=gitlab-runner \
    -o jsonpath='{.items[0].metadata.name}' \
)

# List the metrics and filter the gitlab_runner_jobs metric
kubectl exec -i "$pod" \
  -- curl -s localhost:9252/metrics | \
  grep gitlab_runner_jobs
This metric is a gauge that represents the number of jobs run by the runner at a given time. We can use it to count the jobs run during a specific time window, and if that number exceeds a certain threshold, scale the runner horizontally by adding more pods.
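To make "exceeds a certain threshold" concrete: the HPA computes the desired replica count as the ceiling of the current replica count multiplied by the ratio of the observed metric value to the target value. A quick sketch of that formula, with illustrative numbers (not taken from a live cluster):

```shell
# HPA scaling formula:
#   desiredReplicas = ceil(currentReplicas * currentValue / targetValue)
# Example: 1 runner pod, 12 jobs observed, target of 5 jobs per pod.
awk 'BEGIN {
  current_replicas = 1
  current_value    = 12
  target_value     = 5
  desired = current_replicas * current_value / target_value
  if (desired > int(desired)) desired = int(desired) + 1  # ceil
  print desired
}'
```

Here 12 / 5 = 2.4 jobs per pod, so the HPA would round up and scale the runner from 1 to 3 pods.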
First, we need to install the Prometheus Adapter:
# Add the Prometheus community Helm repository
helm repo add \
  prometheus-community \
  https://prometheus-community.github.io/helm-charts

# Update the Helm repositories
helm repo update

# Install the Prometheus Adapter
helm install \
  prometheus-adapter \
  prometheus-community/prometheus-adapter \
  --namespace monitoring \
  --create-namespace \
  --version 5.2.0
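Before configuring the adapter, it is worth checking that its pod came up. The label selector below is the chart's default `app.kubernetes.io/name` label; adjust it if your release names things differently:

```shell
# Check that the Prometheus Adapter pod is running in the
# 'monitoring' namespace (label is the chart's default)
kubectl get pods \
  --namespace monitoring \
  -l app.kubernetes.io/name=prometheus-adapter
```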
Now, we need to create a custom metric for the number of jobs run by the runner during a specific time window. We can do this by updating the Prometheus Adapter values file. Create the file and add the necessary configurations by executing the following command:
mkdir -p $HOME/todo/prometheus-adapter && \
cat <<'EOT' > $HOME/todo/prometheus-adapter/values.yaml
# Prometheus Adapter must be configured to query the correct
# Prometheus instance. The adapter does not scrape targets itself.
# Instead, it runs PromQL queries against Prometheus and exposes the
# results via the Kubernetes Custom Metrics / External Metrics APIs.
prometheus:
  # URL of the Prometheus server that the adapter will query.
  # This points to the Prometheus Service inside the 'monitoring'
  # namespace.
  url: http://prometheus-operator-kube-p-prometheus.monitoring.svc
  # Prometheus HTTP port. 9090 is the default.
  port: 9090
  # Base path for the Prometheus API. Empty means "/".
  path: ""
# Custom rules that define which Prometheus series are selected,
# how they map to Kubernetes resources, and which PromQL query is
# executed to produce the exposed external metric.
rules:
  external:
    # `seriesQuery` selects which metric series the adapter should
    # consider. This matches series whose metric name is exactly
    # `gitlab_runner_jobs`.
    - seriesQuery: '{__name__="gitlab_runner_jobs"}'
      # `resources` defines how Prometheus labels are mapped to
      # Kubernetes resources (namespace/pod/etc.). These mappings only
      # work if the Prometheus series actually contains the labels you
      # reference (for example: namespace, pod).
      resources:
        overrides:
          namespace: {resource: "namespace"}
          pod: {resource: "pod"}
      # `name` controls how the resulting external metric is named when
      # exposed to Kubernetes. `matches` matches the incoming series
      # name, and `as` is the name you will query via the external
      # metrics API.
      name:
        matches: "gitlab_runner_jobs"
        as: "gitlab_runner_jobs_sum_total"
      # `metricsQuery` is the PromQL query executed by the adapter.
      # IMPORTANT:
      # - The `by (...)` labels must exist in the result.
      # - If you want the metric to be attributable to pod/namespace,
      #   you must group by those labels (and they must exist on the
      #   series).
      # This query sums the samples over the last 5 minutes, then sums
      # across series, grouped by the `runner` label (if present).
      metricsQuery: 'sum(sum_over_time(gitlab_runner_jobs[5m])) by (runner)'
EOT
Note that the metricsQuery field is a PromQL query that computes, per runner, the total sum of all gitlab_runner_jobs samples observed over the last 5 minutes (accumulated value).
Here is an example of what the output of the query could look like at a given time:
{runner="xK5NKzMpb"} 6
6 is the total number of jobs run by the runner with the ID xK5NKzMpb in the last 5 minutes. If you have multiple runners, you will see one line per runner in the output. The Prometheus Adapter exposes this metric to the Kubernetes API server as gitlab_runner_jobs_sum_total.
Let's upgrade the Prometheus Adapter to apply the changes:
helm upgrade \
  --namespace monitoring \
  prometheus-adapter \
  -f $HOME/todo/prometheus-adapter/values.yaml \
  prometheus-community/prometheus-adapter
To see if the external metric we created is available, run the following command:
# See all the available external metrics
kubectl get --raw \
  /apis/external.metrics.k8s.io/v1beta1 | \
  jq

# See the external metric we created (gitlab_runner_jobs_sum_total)
kubectl get --raw \
  /apis/external.metrics.k8s.io/v1beta1/namespaces/default/gitlab_runner_jobs_sum_total | \
  jq
The second command should show you the number of jobs run by each runner in the last 5 minutes. You can trigger 5 jobs in GitLab and watch the metric change.
Example:
{
  "kind": "ExternalMetricValueList",
  "apiVersion": "external.metrics.k8s.io/v1beta1",
  "metadata": {},
  "items": [
    {
      "metricName": "gitlab_runner_jobs_sum_total",
      "metricLabels": {
        "runner": "xK5NKzMpb"
      },
      "timestamp": "...",
      "value": "6"
    }
  ]
}

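With the external metric available, the last step is the HPA itself. The following is a minimal sketch, not a drop-in manifest: the Deployment name gitlab-runner, the namespace, the replica bounds, and the target of 5 jobs per pod are assumptions that you should adjust to your installation:

```shell
# Create the HPA manifest (names and thresholds are illustrative;
# adjust them to your GitLab Runner installation)
mkdir -p $HOME/todo/gitlab-runner && \
cat <<'EOT' > $HOME/todo/gitlab-runner/hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: gitlab-runner
  namespace: default
spec:
  # The Deployment created by your GitLab Runner installation;
  # change the name to match your release.
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: gitlab-runner
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: External
      external:
        metric:
          # The external metric exposed by the Prometheus Adapter.
          name: gitlab_runner_jobs_sum_total
        # Scale up when the average value per pod exceeds 5 jobs.
        target:
          type: AverageValue
          averageValue: "5"
EOT

# Apply the HPA and watch it react to the metric
kubectl apply -f $HOME/todo/gitlab-runner/hpa.yaml
kubectl get hpa gitlab-runner --watch
```

Note that AverageValue divides the external metric by the current replica count before comparing it to the target; use type: Value instead if you want to compare the raw metric.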