Cloud Native, Scalable and Observable GitLab Runner on Kubernetes
Autoscaling GitLab Runners
Autoscaling in Kubernetes can be done using different mechanisms. One of the most common ways is to use the Horizontal Pod Autoscaler (HPA). The HPA automatically scales the number of pods in a deployment based on different types of metrics like:
- CPU utilization and memory usage, which are resource metrics collected by default by the Kubernetes metrics server.
- Custom metrics, which are any metrics other than the common CPU and memory ones. These can be typical application metrics like the number of requests per second, the number of jobs run by a runner, etc. They are collected by Prometheus and exposed to the Kubernetes API server through the Prometheus Adapter.
ℹ️ The Prometheus Adapter is a component that allows Kubernetes to use Prometheus as a source for custom metrics. It translates custom metrics from Prometheus into a format that can be consumed by the Kubernetes API server. This allows your HPA to consume more meaningful metrics than just CPU and memory usage.
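For the resource-metrics path no extra components are needed. Here is a minimal sketch of an HPA scaling on CPU utilization (the Deployment name `gitlab-runner` and the 70% target are assumptions; adjust them to your setup):

```yaml
# Minimal HPA on a built-in resource metric (CPU).
# Assumes a Deployment named "gitlab-runner" in the current
# namespace; only the Kubernetes metrics server is required.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: gitlab-runner-cpu
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: gitlab-runner
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          # Scale out when average CPU utilization
          # across runner pods exceeds 70%.
          type: Utilization
          averageUtilization: 70
```

As the rest of this section argues, though, CPU is a poor proxy for runner load, which is why we move to a custom metric instead.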
So, to autoscale the GitLab Runner, we need to create a custom metric and then configure the HPA to use it. If you just want to use the CPU or memory usage of the runner pods as a metric, you don't need to go through this process, because those metrics are collected by the Kubernetes metrics server by default. However, neither metric is really useful for autoscaling GitLab Runner; there are better metrics to use, such as the number of jobs run by the runner. You can view this metric by running the following command:
```shell
# export the pod name of the runner
pod=$( \
  kubectl get pods \
    -l app=gitlab-runner \
    -o jsonpath='{.items[0].metadata.name}' \
)

# list the metrics and filter the gitlab_runner_jobs metric
kubectl exec -it $pod -- curl localhost:9252/metrics | \
  grep gitlab_runner_jobs
```
This metric is a gauge that represents the number of jobs the runner is handling at a given time. We can use it to measure how many jobs the runner ran during a specific time window, and if that number exceeds a certain threshold, scale the runner horizontally by adding more pods.
First, we need to install the Prometheus Adapter:
```shell
# Add the Prometheus community Helm repository
helm repo add \
  prometheus-community \
  https://prometheus-community.github.io/helm-charts

# Update the Helm repositories
helm repo update

# Install the Prometheus Adapter
helm install \
  prometheus-adapter \
  prometheus-community/prometheus-adapter \
  --namespace monitoring \
  --create-namespace
```
Now, we need to create a custom metric for the number of jobs run by the runner during a specific time window. We can do this by updating the Prometheus Adapter values file. Create the file and add the necessary configurations by executing the following command:
```shell
mkdir -p $HOME/todo/prometheus-adapter && \
cat <<'EOT' > $HOME/todo/prometheus-adapter/values.yaml
# The Prometheus Adapter should be configured to
# scrape metrics from the right Prometheus instance
prometheus:
  # The URL of the Prometheus server that
  # the Prometheus Adapter will query for metrics.
  # This URL points to the Prometheus service within
  # the 'monitoring' namespace.
  # This allows the adapter to send queries to Prometheus
  # to retrieve metrics.
  url: http://prometheus-operator-kube-p-prometheus.monitoring.svc
  # The port on which Prometheus listens.
  # Port 9090 is the default for Prometheus.
  port: 9090
  # The path is left empty as the root path ("/")
  # is used to query Prometheus.
  # This is the default endpoint where Prometheus
  # serves its API.
  path: ""
# Section for defining custom rules for how metrics
# should be gathered from Prometheus and exposed via
# the Prometheus Adapter as external metrics.
rules:
  external:
    # The `seriesQuery` field defines the metric series
    # that the Prometheus Adapter will query.
    # Here, it matches the exact name `gitlab_runner_jobs`
    # as it is exposed by Prometheus.
    - seriesQuery: '{__name__="gitlab_runner_jobs"}'
      # The `resources` field governs how to associate
      # the metric with Kubernetes resources.
      # This association allows the metric to be tied to
      # specific Kubernetes objects like pods, namespaces, etc.
      resources:
        # Directly map the label names used in Prometheus metrics
        # to Kubernetes resources.
        # This approach makes it clear which labels correspond
        # to which Kubernetes resources.
        overrides:
          namespace: {resource: "namespace"}
          pod: {resource: "pod"}
      # The `name` field specifies how the metric name should
      # be matched and renamed.
      name:
        # `matches` specifies the metric name that
        # is being exposed in Prometheus.
        matches: "gitlab_runner_jobs"
        # `as` renames the metric for use in Kubernetes.
        # This name will be used when querying the custom metrics API.
        as: "gitlab_runner_jobs_sum_total"
      # `metricsQuery` defines the query that the Prometheus Adapter
      # will execute against Prometheus to fetch the metric.
      # This is a way to transform the metric into a format
      # that can be consumed by the Kubernetes API server.
      metricsQuery: 'sum(sum_over_time(gitlab_runner_jobs[5m])) by (runner)'
EOT
```
Note that the metricsQuery field is a PromQL query that calculates the sum of the gitlab_runner_jobs metric over a 5-minute window and groups the metric by the runner label. The result is a metric that represents the total number of jobs run by each runner in the last 5 minutes.
Here is an example of the output of the metricsQuery field:

```
{runner="xK5NKzMpb"} 6
```

Here, 6 is the total number of jobs run by the runner xK5NKzMpb in the last 5 minutes. If you have multiple runners, you will see multiple lines in the output. The Prometheus Adapter will expose this metric to the Kubernetes API server as gitlab_runner_jobs_sum_total.
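To build intuition for what sum_over_time does: it adds up every sample of the gauge that Prometheus scraped inside the window. A quick sketch with synthetic samples (the values are made up, one per scrape in a 5-minute window):

```shell
# Four synthetic gauge samples scraped during the window:
# 1 job, then 2 jobs, then 0, then 3.
# sum_over_time adds every sample, just like this awk pipeline.
printf '1\n2\n0\n3\n' | awk '{s+=$1} END {print s}'
# → 6
```

Note that because the gauge is sampled at every scrape, the same long-running job can be counted in several samples; the resulting number is a load indicator rather than an exact job count, which is fine for autoscaling purposes.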
Let's upgrade the Prometheus Adapter to apply the changes:
```shell
helm upgrade \
  --namespace monitoring \
  prometheus-adapter \
  -f $HOME/todo/prometheus-adapter/values.yaml \
  prometheus-community/prometheus-adapter
```
To see if the external metric we created is available, run the following command:
```shell
# See all the available external metrics
kubectl get --raw \
  /apis/external.metrics.k8s.io/v1beta1 | \
  jq

# See the external metric we created (gitlab_runner_jobs_sum_total)
kubectl get --raw \
  /apis/external.metrics.k8s.io/v1beta1/namespaces/default/gitlab_runner_jobs_sum_total | \
  jq
```
The second command should show the number of jobs run by each runner in the last 5 minutes. You can trigger 5 jobs in GitLab and watch the metric change.
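With the metric now visible in the external metrics API, the HPA can consume it. Here is a sketch of such an HPA (the Deployment name, namespace, replica bounds, and the threshold of 5 jobs are assumptions; adapt them to your Helm release):

```yaml
# HPA driven by the external metric exposed by the
# Prometheus Adapter. Assumes the runner runs as a
# Deployment named "gitlab-runner" in "default".
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: gitlab-runner
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: gitlab-runner
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: External
      external:
        metric:
          # The name we gave the metric in the
          # adapter's `as` field.
          name: gitlab_runner_jobs_sum_total
        target:
          # Add pods when the total jobs run in the
          # last 5 minutes exceeds this value.
          type: Value
          value: "5"
```

With `type: Value` the HPA compares the raw metric against the target; `type: AverageValue` would instead divide it by the current number of pods, which is often the better choice when the metric grows with cluster-wide load.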