Kubernetes CrashLoopBackOff — How to Troubleshoot

0_aR4qcDku5f9pCbMo.png

We have seen about imagepullbackoff error on last article, now let’s take a look on another familiar error on Kubernetes. If you are working on Kubernetes, this could be on the annoying error, you may experience multiple times. The error is nothing but Kubernetes CrashLoopBackOff, it is one of the common errors in Kubernetes, indicating a pod constantly crashing in an endless loop and either unable to get start or fail.

In this post will see how we can identify the cause of the issue and why we are getting CrashLoopBackOff error, and also, we will cover how you can solve this.

Why does CrashLoopBackOff occurs?

The CrashLoopBackOff error can occur due to varies reasons, including:

  • Insufficient resources — lack of resources prevents the container from loading
  • Locked file/database/port — a resource already locked by another container
  • No proper reference/Configuration — reference to scripts or binaries that are not present on the container or any misconfiguration on underlying system such as read-only filesystem
  • Config loading/Setup error — a server cannot load the configuration file or initial setup like init-container failing
  • Connection issues — DNS or kube-DNS is not able to connect to a external services
  • Downstream service — One of the downstream services on which the application relies can’t be reached or the connection fails (database, backend, etc.)
  • Liveness probes– Liveness probes could have misconfigured or probe fails due to any reason.
  • Port already in use: Two or more containers are using the same port, which doesn’t work if they’re from the same Pod

How to Diagnosis CrashLoopBackOff

To troubleshoot any issues, the best way to identify the root cause is to start going through the list of potential causes and check one by one. Let’s say easy on first. Also, another basic requirement is having better understanding of the environment, like what is the configuration, what port it used, is there any mount point, what is the probe configured, etc.

Back Off Restarting Failed Container

For first point to troubleshoot to collect the issue details run kubectl describe pod [name]. Let say you have configured and it is failing due to some reason like Liveness probe failed and Back-off restarting failed container.

If you get the back-off restarting failed container message this means that you are dealing with a temporary resource overload, as a result of an activity spike. The solution is to adjust periodSeconds or timeoutSeconds to give the application a longer window of time to respond.

Check the logs

If the previous step not providing any details or cannot identify, the next step will be pulling more details explanation about what is happening, you can get this from failing pod.

For that run kubectl get pods to identify the pod that was exhibiting the CrashLoopBackOff error. You can run the following command to get the log of the pod:

                kubectl logs PODNAME
            

Try to walkthrough the error, to identify why the pod is repeatedly crashing. This may have some more details from the application running inside the pod, with this you could see any configuration error or any readiness issue like that.

Check Deployment Logs

Run the following command to retrieve the kubectl deployment logs:

                kubectl logs -f deploy/ -n
            

This may also provide clues about issues at the application level. For example, below you can see a log file that shows ./datacan’t be mounted, likely because it’s already in use and locked by a different container.

Resource limit

you may be experiencing CrashLoopBackOff errors due to insufficient memory resources. You can increase the memory limit by changing the “resources:limits” in the Container’s resource manifest.

Issue with image

If still there is a issue, another reason could be the docker image you are using may not working properly, you need to make sure when you run separately it is working fine. If that is working and failing with Kubernetes, you may need to go advance way to find what is happening, try following,

Step 1: Identify entrypoint and cmd

You will need to identify the entrypoint and cmd to gain access to the container for debugging. Do the following:

  1. Run docker pull [image-id] to pull the image.
  2. Run docker inspect [image-id] and locate the entrypoint and cmd for the container image.

Step 2: Change entrypoint

Because the container has crashed and cannot start, you’ll need to temporarily change the entrypoint in the container specification to tail -f /dev/null.

                Spec:
     containers:
      -   command:
           - “tail”
           - “-f”
           - “/dev/null”
            

Step 3: Check for the cause

With the entrypoint changed, you should be able to use the default command line kubectl to execute into the issue container. Once you login the container, check all the possible options and validate all good, if you see any issue fix it.

Step 4: Check for missing packages or dependencies

When you logged in, check if any packages or dependencies are missing, preventing the application from starting. If there are packages or dependencies missing, provide the missing files to the application and see if this resolves the error.

Continue Reading on Kubernetes CrashLoopBackOff — How to Troubleshoot — FoxuTech


Only registered users can post comments. Please, login or signup.

Start blogging about your favorite technologies and get more readers

Join other developers and claim your FAUN account now!

Avatar

We are Foxutech

@foxutech
#devops #docker #github #rundeck #nagios #linux #containers #kubernetes #terraform #ansible #saltstack #security #automation #microservices #gitops #argocd #crossplane #prometheus
Stats
35

Influence

1k

Total Hits

11

Posts