Cloud Native CI/CD with GitLab

From Commit to Production Ready

Cloud Native GitLab Runners on Kubernetes: Autoscaling and Observability
Monitoring GitLab Runners

Most tools and services in the cloud-native ecosystem provide monitoring capabilities to help you track the performance and health of your applications and infrastructure. GitLab Runner is no exception: like many other cloud-native tools, it exposes metrics in the Prometheus format.

GitLab Runner exposes Prometheus metrics on its /metrics endpoint, which can be scraped by a Prometheus server. To enable monitoring, you need to configure the runner to expose metrics (if it is not already enabled) and configure Prometheus to scrape them.

ℹ️ Prometheus is an open-source monitoring and alerting toolkit that is widely used in the cloud-native ecosystem. It collects metrics from various sources, stores them in a time-series database, and provides a query language to analyze the data and create alerts based on predefined rules.

Prometheus can be installed as a standalone service or as part of a Kubernetes cluster. However, it's more common to use it to monitor Kubernetes clusters and containerized applications running on Kubernetes. In this section, we will focus on monitoring GitLab Runner using Prometheus in a Kubernetes environment. You can apply similar steps to other environments like standalone servers or cloud instances with minor modifications - the core concepts remain the same.

This section assumes you have the K3s cluster seen in the previous sections set up and running. If you don't have a K3s cluster, please refer to the previous section.
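Before continuing, it's worth confirming that kubectl can actually reach the cluster. A quick sanity check (assuming your kubeconfig points at the K3s cluster from the previous sections):

```shell
# Verify the cluster is reachable and the node is healthy.
# The node should report a Ready status.
kubectl get nodes
```

If this command hangs or errors, fix your kubeconfig before moving on; every command in this section depends on it.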

Since Prometheus will be deployed to the monitoring namespace, let's create it up front:

# Create the monitoring namespace
kubectl create namespace monitoring

Back to our GitLab Runner, we need to expose Prometheus metrics so Prometheus can scrape them. The GitLab Runner Helm chart supports this directly, and the recommended way is to use a PodMonitor. A PodMonitor scrapes metrics straight from runner pods, which avoids relying on a Service, and it keeps scraping working during shutdown periods when pods can flip to NotReady.

ℹ️ In previous versions of the GitLab Runner Helm chart, the recommended way to expose metrics was to use a ServiceMonitor. However, the recommendation has changed to use PodMonitor instead. The reason for this change is that PodMonitor provides a more direct and reliable way to scrape metrics from pods, especially during shutdown periods when pods can flip to NotReady.

Make sure you exported the runner token as an environment variable (GITLAB_RUNNER_TOKEN), then run the following command to update the Helm values file:

cat <<EOF > $HOME/todo/gitlab-runner/helm/values.yaml
# We already have the configuration
# for the GitLab Runner in the values file.
gitlabUrl: https://gitlab.com/
runnerToken: "$GITLAB_RUNNER_TOKEN"
rbac:
  create: true
serviceAccount:
  create: true
runners:
  config: |
    [[runners]]
      [runners.kubernetes]
        namespace = "{{.Release.Namespace}}"
        image = "python:3.12"
  privileged: true

# This is what we're updating
# Metrics configuration.
metrics:
  # Enable gitlab-runner metrics
  enabled: true

  # RECOMMENDED: PodMonitor is the preferred method.
  # It scrapes metrics directly from pods and does not
  # require a ServiceMonitor/Service for discovery.
  podMonitor:
    # Deploy a PodMonitor in the monitoring namespace
    namespace: monitoring
    # Enable pod monitor
    enabled: true
    labels:
      app: gitlab-runner
      # IMPORTANT notes:
      # 1. Adding the release is often required depending on how
      # Prometheus Operator is configured to select PodMonitors.
      # 2. The release label commonly needs to match the Helm
      # release name of the Prometheus Operator stack.
      release: prometheus-operator

# Keep the Service enabled since it can still be useful
# (for example, for direct access to the metrics endpoint),
# even though PodMonitor does not require it.
service:
  enabled: true
  type: ClusterIP
EOF

As you can see, we enabled metrics and configured a PodMonitor in the monitoring namespace. We also added labels to the PodMonitor so it can be selected by the Prometheus Operator once it is installed.
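If you want to see the PodMonitor resource the chart will render before applying anything, you can use helm template. This is a sketch: the exact output (resource name, labels) depends on the release name and chart version.

```shell
# Render the chart locally without installing anything,
# then show the PodMonitor section of the output.
helm template gitlab-runner gitlab/gitlab-runner \
  -f $HOME/todo/gitlab-runner/helm/values.yaml \
  | grep -B 2 -A 15 "kind: PodMonitor"
```

This is a handy way to confirm that the metrics and podMonitor values are being picked up before you touch the cluster.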

This can be confusing at first because we have not installed Prometheus yet. That is fine. We are preparing the runner chart to create the monitoring resource upfront. In the next steps, when we install the Prometheus Operator stack, we will align its selector configuration with these labels. I will come back to the release label later, because whether it is required depends on how Prometheus is configured to discover PodMonitor resources.

ℹ️ The PodMonitor resource is part of the Prometheus Operator. It defines which pods should be monitored by Prometheus. The release: prometheus-operator label is commonly needed because many Prometheus Operator Helm installs use label selectors (for example, podMonitorSelector) to decide which PodMonitors to watch. The label value must match whatever selector the Prometheus deployment uses, which is often derived from the Prometheus Operator Helm release name.
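Once the Prometheus Operator stack is installed (we do this in a later step), you can check which labels its Prometheus resource uses to select PodMonitors. The command below is a sketch; it assumes a single Prometheus resource in the monitoring namespace.

```shell
# Inspect the PodMonitor selector of the Prometheus resource.
# Its matchLabels must be satisfied by the labels on our PodMonitor
# (for example, release: prometheus-operator).
kubectl get prometheus -n monitoring \
  -o jsonpath='{.items[0].spec.podMonitorSelector}'
```

If the selector printed here does not match the labels in the values file above, Prometheus will silently ignore the PodMonitor, which is one of the most common reasons for "no metrics" issues.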

Now, you can upgrade the GitLab Runner Helm chart to apply the changes:

helm upgrade \
    --namespace default \
    gitlab-runner \
    -f $HOME/todo/gitlab-runner/helm/values.yaml \
    gitlab/gitlab-runner
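After the upgrade, the PodMonitor should exist in the monitoring namespace. You can verify it with (the resource name is generated by the chart, so it may differ from this assumption):

```shell
# List PodMonitors in the monitoring namespace;
# one created by the gitlab-runner chart should appear.
kubectl get podmonitor -n monitoring
```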

If everything is set up correctly, you should see Metrics server listening address=:9252 in the logs of the runner pod when you run the following command:

kubectl logs -l app=gitlab-runner

This means that the runner is now exposing metrics on port 9252 at the /metrics path.

If you don't see that exact message (for example, because it has scrolled out of the displayed log lines), you may instead see other messages showing the runner is operating normally (like Job succeeded, Updating job, etc.).
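Note that when you fetch logs by label selector, kubectl only shows the last 10 lines per pod by default, so the startup message is easy to miss. You can fetch the full log and filter for it:

```shell
# --tail=-1 disables the default 10-line limit applied
# when a label selector is used, then filter for the
# metrics server startup line.
kubectl logs -l app=gitlab-runner --tail=-1 | grep -i "metrics"
```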

ℹ️ By default, Prometheus scrapes metrics from the /metrics endpoint. The port number can be different depending on the configuration.

You can see these metrics by running the following command (wait a few seconds after the upgrade to let the runner pod start):

# Find the pod name of the runner by label
# You should wait a few seconds after the upgrade
# to let the runner pod start.
pod=$( \
  kubectl get pods \
  -l app=gitlab-runner \
  -o jsonpath='{.items[0].metadata.name}'\
)

# List the metrics
# If you don't see any metrics,
# the pod may not be ready yet:
# re-export the pod variable and run the command again.
kubectl exec -it $pod -- curl localhost:9252/metrics

This will show you the metrics that Prometheus will scrape. Example:

# HELP gitlab_runner_api_request_duration_seconds Latency histogram of API requests made by GitLab Runner
# TYPE gitlab_runner_api_request_duration_seconds histogram
gitlab_runner_api_request_duration_seconds_bucket{endpoint="request_job",runner="xK5NKzMpb",system_id="r_l6DoyLeuePMR",le="0.1"} 0
gitlab_runner_api_request_duration_seconds_bucket{endpoint="request_job",runner="xK5NKzMpb",system_id="r_l6DoyLeuePMR",le="0.25"} 0
...
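The full output is long, so it helps to filter it. For example, the runner exposes job-related series whose names start with gitlab_runner_ (the exact set of metric names depends on the runner version, so treat this filter as an assumption):

```shell
# Show only runner-level job and concurrency metrics,
# skipping the Go runtime and process metrics.
kubectl exec $pod -- curl -s localhost:9252/metrics \
  | grep -E "^gitlab_runner_(concurrent|jobs)"
```

These series are the ones you will typically graph and alert on later, since they reflect how busy the runner is.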
