Cloud Native, Scalable and Observable GitLab Runner on Kubernetes
Using a Local Cache to Optimize Job Performance
When using the Kubernetes executor, cache management is handled through a volume mounted on the pod that runs the job, typically at the /cache directory. This cache volume is crucial for optimizing job performance by storing data that may be reused across different stages of the pipeline or even across different jobs.
Typically, at the start of a job, the GitLab Runner checks the cache volume for any pre-existing cached data. If the cached data is found, it may be in the form of compressed files or directories that the runner can extract and use during the job’s execution. This prevents redundant operations, such as re-downloading dependencies, which can save significant time and resources.
The helper container within the pod is responsible for managing the cache: it ensures the cache is available when needed and saves any new cache data generated during the job back to the cache volume.
Effective cache management can significantly reduce pipeline run times, improve the overall performance of your CI/CD pipeline, reduce the load on your GitLab instance (especially if it is self-hosted), save bandwidth within your cluster and with your cloud provider, and even reduce the cost of operating your CI system if bandwidth and data usage are part of your expenses.
To add a local cache to the GitLab Runner (which in turn will be mounted into the pod that runs the job), you need to update the runner's configuration. Here is an example:
[[runners]]
  [runners.kubernetes]
    # The Kubernetes namespace in which
    # the runner pods will be deployed.
    # The namespace is dynamically set based on the Helm release.
    namespace = "{{.Release.Namespace}}"
    # The Docker image to use for executing CI/CD jobs.
    # In this case, Python 3.12.
    image = "python:3.12"
    [[runners.kubernetes.volumes.pvc]]
      name = "cache"
      mount_path = "/cache"
This configuration tells the GitLab Runner to use a persistent volume claim (PVC) named cache and mount it at the /cache directory in the pod. We also need to create the PVC in the Kubernetes cluster and add the necessary Kubernetes configuration to enable the Runner Deployment to use the cache volume.
cat <<EOF > $HOME/todo/gitlab-runner/helm/values.yaml
gitlabUrl: https://gitlab.com/
runnerRegistrationToken: "$GITLAB_RUNNER_TOKEN"
rbac:
  create: true
serviceAccount:
  create: true
# This is the new configuration for the cache volume
runners:
  privileged: true
  config: |
    [[runners]]
      [runners.kubernetes]
        namespace = "{{.Release.Namespace}}"
        image = "python:3.12"
        [[runners.kubernetes.volumes.pvc]]
          name = "cache"
          mount_path = "/cache"
# Mount the cache volume to the runner pod
volumeMounts:
  - name: cache
    mountPath: /cache
# Inform Kubernetes about the PVC to use for the cache volume
volumes:
  - name: cache
    persistentVolumeClaim:
      claimName: cache
EOF
Now, create the PVC in the Kubernetes cluster:
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cache
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
EOF
If you don't specify the storage class, the default storage class will be used (local-path in the case of K3s). You can check the storage class by running the following command:
kubectl get storageclass
ℹ️ PersistentVolumeClaims (PVCs) are used to request storage resources in a Kubernetes cluster. A PersistentVolumeClaim is a request for storage by a user, while a PersistentVolume (PV) is a piece of storage in the cluster that has been provisioned by an administrator (or dynamically by a provisioner). When a claim is created, it is bound to a matching PV, which can then be mounted into pods. The StorageClass, on the other hand, is used by the administrator to define the type of storage that will be provisioned when a PersistentVolumeClaim is created.
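After applying the manifest, you can check that the claim was created and bound. Note that with a volumeBindingMode of WaitForFirstConsumer (used by K3s's local-path provisioner), the PVC may stay Pending until the first pod mounts it:

```shell
# The PVC status should show "Bound" (or "Pending" until first use)
kubectl get pvc cache

# Events at the bottom of the output explain a Pending claim
kubectl describe pvc cache
```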
After updating the values file and creating the PVC, you can upgrade the GitLab Runner Helm chart to apply the changes:
helm upgrade \
  --namespace default \
  gitlab-runner \
  -f $HOME/todo/gitlab-runner/helm/values.yaml \
  gitlab/gitlab-runner
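After the upgrade, it's worth a quick sanity check that the runner manager pod was recreated with the new configuration and that the cache volume is mounted. The label selector and namespace below assume the default Helm chart values used in this chapter:

```shell
# The runner manager pod should be recreated and reach Running state
kubectl get pods --namespace default -l app=gitlab-runner

# Confirm the cache volume is mounted at /cache inside the pod
kubectl exec --namespace default deploy/gitlab-runner -- ls /cache
```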
Now, the GitLab Runner will use the cache volume to store and retrieve cached data. However, we need to manage the cache in pipeline and job configurations. We can do this by updating the .gitlab-ci.yml file to use the cache. Here is an example:
cat <<EOF > $HOME/todo/app/.gitlab-ci.yml
variables:
  # PIP_CACHE_DIR is the environment variable that
  # pip uses to determine the cache directory.
  # We set this variable to use a cache directory
  # within the project (and therefore within the cache volume).
  # The default cache directory is \$HOME/.cache/pip,
  # which is not persistent across
  # different jobs or stages (different pods).
  PIP_CACHE_DIR: "\$CI_PROJECT_DIR/.cache/pip"

# Define the stages of the pipeline.
stages:
  - build
  - test

# Define the build stage
build:
  stage: build
  script:
    - pip install \
        -r requirements.txt \
        --break-system-packages \
        --ignore-installed
  cache:
    key: dependency-cache-\$CI_COMMIT_REF_NAME
    paths:
      # Paths are relative to the project directory,
      # which is \$CI_PROJECT_DIR
      - .cache/pip
  tags:
    - kubernetes

# Define the test stage
test:
  stage: test
  script:
    # Re-install dependencies in the test stage
    - pip install \
        -r requirements.txt \
        --break-system-packages \
        --ignore-installed
    - pip install flake8==7.1.1 --break-system-packages
    - flake8 --statistics
    - python3 test_app.py
  cache:
    key: dependency-cache-\$CI_COMMIT_REF_NAME
    paths:
      - .cache/pip
  tags:
    - kubernetes
EOF
In the above file, we tell GitLab to cache the .cache/pip directory in the project directory.
ℹ️ Usually, dependencies are among the most common data to cache. In Python projects, for example, the pip cache is a good candidate for caching. However, this cache is saved to $HOME/.cache/pip by default, which is not persistent across different jobs or stages. By setting the PIP_CACHE_DIR environment variable to a directory within the project, we can ensure that the cache is saved to the cache volume and is available across different jobs and stages.
The cache will use the key dependency-cache-$CI_COMMIT_REF_NAME. The key is like a name or a unique identifier for a cache entry. When the key changes, the cache is invalidated and a new cache is created. So, for example, if you choose to create a unique cache per branch, you can use:
cache:
  key: $CI_COMMIT_REF_NAME
  paths:
    - .cache/pip
If you want to be more explicit, or if your pipeline has multiple cache entries, you can use a unique key for each cache entry while still using the branch name as part of the key to ensure the cache is invalidated when the branch changes:
cache:
  # where <cache-name> is a unique identifier for the cache entry.
  # It can be anything you want.
  key: <cache-name>-$CI_COMMIT_REF_NAME
  paths:
    - .cache/pip
When you need per-branch caching, you can use the CI_COMMIT_REF_NAME variable. Example:
cache:
  key: "$CI_COMMIT_REF_NAME"
If you want to use a different key for each job, you can use the CI_JOB_NAME variable. Here is an example:
cache:
  key: "$CI_JOB_NAME"
You can also combine the two variables to enable per-job and per-branch caching or per-stage and per-branch caching:
# per-job and per-branch caching
cache:
  key: "$CI_JOB_NAME-$CI_COMMIT_REF_NAME"

# per-stage and per-branch caching
cache:
  key: "$CI_JOB_STAGE-$CI_COMMIT_REF_NAME"
ℹ️ Reminder: A list of all the predefined variables in GitLab CI/CD can be found in the official documentation.
Other options can be used in the cache configuration as well, such as:
untracked: If set to true, the cache will include untracked files (even if they are listed in .gitignore).
cache:
  untracked: true
policy: The cache policy to use. The default is pull-push (download the cache at the start of the job and upload it at the end). The other options are pull (download only) and push (upload only).
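For example, since the test job above only consumes the dependencies that the build job already cached and never needs to update them, you could set its policy to pull to skip the upload step at the end of the job. A sketch, reusing the job and key names from this chapter:

```yaml
test:
  stage: test
  script:
    - pip install -r requirements.txt --break-system-packages --ignore-installed
    - python3 test_app.py
  cache:
    key: dependency-cache-$CI_COMMIT_REF_NAME
    paths:
      - .cache/pip
    # Only download the cache; don't spend time uploading it
    # after the job, since this job doesn't modify it.
    policy: pull
  tags:
    - kubernetes
```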
Cloud Native CI/CD with GitLab: From Commit to Production Ready