Kubernetes CrashLoopBackOff: Complete Troubleshooting Guide

DeviDevs Team
#kubernetes #crashloopbackoff #troubleshooting #containers #devops

CrashLoopBackOff is one of the most common failure states for Kubernetes pods: a container starts, crashes, and is restarted in a loop with growing delays. This guide covers a systematic debugging approach for each of the usual causes.

Understanding CrashLoopBackOff

Pod Lifecycle with CrashLoopBackOff:

Start → Running → Crash → Restart → Running → Crash → ...
                    ↓
           Back-off delay increases:
           10s → 20s → 40s → 80s → 160s → 300s (max)

The pod keeps restarting because the container exits with an error; after each crash the kubelet doubles the back-off delay before the next restart attempt, capping it at five minutes.
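
In kubectl get pods, the loop shows up as a climbing restart count (illustrative output; recent kubectl versions also show how long ago the last restart happened):

NAME                   READY   STATUS             RESTARTS      AGE
app-7d4b9c6f5d-x2x9q   0/1     CrashLoopBackOff   6 (45s ago)   8m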

Step 1: Initial Diagnosis

# Get pod status
kubectl get pods -n <namespace>
 
# Describe pod for events and conditions
kubectl describe pod <pod-name> -n <namespace>
 
# Check logs (current container)
kubectl logs <pod-name> -n <namespace>
 
# Check logs (previous crashed container)
kubectl logs <pod-name> -n <namespace> --previous
 
# For multi-container pods
kubectl logs <pod-name> -n <namespace> -c <container-name> --previous
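
To narrow the event stream to a single pod, a field selector helps (substitute the pod name):

# Events for a specific pod only
kubectl get events -n <namespace> \
  --field-selector involvedObject.name=<pod-name> \
  --sort-by='.lastTimestamp'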

Error: OOMKilled (Exit Code 137)

Symptom:

State:          Terminated
Reason:         OOMKilled
Exit Code:      137

Cause: Exit code 137 means the container was killed with SIGKILL (128 + 9). Combined with the OOMKilled reason, the kernel killed it for exceeding its memory limit.

Solution 1 - Increase memory limits:

resources:
  requests:
    memory: "256Mi"
  limits:
    memory: "512Mi"  # Increase this

Solution 2 - Profile actual memory usage:

# Check current usage (requires metrics-server in the cluster)
kubectl top pod <pod-name> -n <namespace>
 
# View resource requests/limits
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].resources}'

Solution 3 - Application-level fixes:

# For Node.js
NODE_OPTIONS="--max-old-space-size=384"
 
# For Java
JAVA_OPTS="-Xmx384m -Xms256m"
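
These flags are typically injected through the container's env section; a minimal sketch (values chosen to keep the heap under the 512Mi limit above; JAVA_OPTS only takes effect if the image's entrypoint reads it):

env:
  - name: NODE_OPTIONS
    value: "--max-old-space-size=384"  # Node heap capped below the container limit
  - name: JAVA_OPTS
    value: "-Xmx384m -Xms256m"         # JVM heap capped below the container limit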

Error: Failed Health Check (Liveness/Readiness)

Symptom:

Warning  Unhealthy  Liveness probe failed: HTTP probe failed with statuscode: 503
Warning  Unhealthy  Readiness probe failed: connection refused

Cause 1 - Probe starts before app is ready:

# Add startup probe for slow-starting apps
startupProbe:
  httpGet:
    path: /health
    port: 8080
  failureThreshold: 30
  periodSeconds: 10
 
# Adjust liveness probe
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 60  # Wait for startup
  periodSeconds: 10
  failureThreshold: 3

Cause 2 - Wrong port or path:

# Verify your app actually listens on this port/path
livenessProbe:
  httpGet:
    path: /healthz      # Must match your app's health endpoint
    port: 8080          # Must match container port
    scheme: HTTP        # or HTTPS

Cause 3 - App health endpoint not implemented:

// Node.js Express example
app.get('/health', (req, res) => {
  res.status(200).json({ status: 'healthy' });
});
 
app.get('/ready', (req, res) => {
  // Check dependencies (DB, cache, etc.)
  if (dbConnected && cacheConnected) {
    res.status(200).json({ status: 'ready' });
  } else {
    res.status(503).json({ status: 'not ready' });
  }
});
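
A readiness probe can then target the /ready endpoint so traffic only flows once dependencies are up; a sketch assuming the app listens on port 8080:

readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  periodSeconds: 5
  failureThreshold: 3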

Error: ImagePullBackOff / ErrImagePull

Symptom:

Warning  Failed     Failed to pull image "myregistry.com/app:latest"
Warning  Failed     Error: ErrImagePull
Normal   BackOff    Back-off pulling image

Cause 1 - Image doesn't exist:

# Verify image exists
docker pull myregistry.com/app:latest
 
# Check exact tag
kubectl get pod <pod-name> -o jsonpath='{.spec.containers[*].image}'

Cause 2 - Private registry authentication:

# Create docker registry secret
kubectl create secret docker-registry regcred \
  --docker-server=myregistry.com \
  --docker-username=user \
  --docker-password=pass \
  --docker-email=email@example.com \
  -n <namespace>

# Reference in pod spec
spec:
  imagePullSecrets:
    - name: regcred
  containers:
    - name: app
      image: myregistry.com/app:latest

Cause 3 - Service account doesn't have image pull secret:

# Add to service account
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-service-account
imagePullSecrets:
  - name: regcred
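
To attach the pull secret to an existing service account without editing manifests:

kubectl patch serviceaccount my-service-account \
  -n <namespace> \
  -p '{"imagePullSecrets": [{"name": "regcred"}]}'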

Error: Missing ConfigMap or Secret

Symptom:

Warning  Failed     Error: configmap "app-config" not found
Warning  Failed     Error: secret "db-credentials" not found

Solution 1 - Create missing resources:

# Check if ConfigMap exists
kubectl get configmap app-config -n <namespace>
 
# Create ConfigMap
kubectl create configmap app-config \
  --from-file=config.yaml \
  -n <namespace>
 
# Create Secret
kubectl create secret generic db-credentials \
  --from-literal=username=admin \
  --from-literal=password=secret \
  -n <namespace>

Solution 2 - Make optional:

env:
  - name: CONFIG_VALUE
    valueFrom:
      configMapKeyRef:
        name: app-config
        key: some-key
        optional: true  # Don't fail if missing
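
The same flag exists when the ConfigMap is mounted as a volume; a sketch reusing the names above:

volumes:
  - name: config
    configMap:
      name: app-config
      optional: true  # Pod still starts if the ConfigMap is missing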

Error: Volume Mount Failures

Symptom:

Warning  FailedMount  Unable to attach or mount volumes
Warning  FailedMount  MountVolume.SetUp failed for volume "pvc-xxx"

Cause 1 - PVC not bound:

# Check PVC status
kubectl get pvc -n <namespace>
 
# Should show "Bound", not "Pending"
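
If the claim is stuck in Pending, its events usually explain why (no matching PV, unknown storage class, etc.):

# Inspect events on the claim
kubectl describe pvc <pvc-name> -n <namespace>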

Cause 2 - Storage class issues:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard  # Must exist in cluster
  resources:
    requests:
      storage: 10Gi
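
Confirm the storage class exists and see which one is the cluster default:

kubectl get storageclass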

Cause 3 - Read-only filesystem:

securityContext:
  readOnlyRootFilesystem: true
 
volumeMounts:
  - name: tmp
    mountPath: /tmp  # App needs writable tmp
  - name: cache
    mountPath: /app/cache
 
volumes:
  - name: tmp
    emptyDir: {}
  - name: cache
    emptyDir: {}

Error: Container Command/Entrypoint Issues

Symptom:

Error: failed to start container: OCI runtime create failed
Error: container_linux.go:380: starting container process caused: exec: "myapp": executable file not found

Solution 1 - Verify command exists:

# Debug by running shell
kubectl run debug --rm -it --image=myimage -- /bin/sh
# Then check if binary exists
which myapp
ls -la /app/

Solution 2 - Check entrypoint override:

# If Dockerfile has ENTRYPOINT, and you override command:
containers:
  - name: app
    image: myimage
    command: ["/bin/sh"]  # Overrides ENTRYPOINT
    args: ["-c", "/app/start.sh"]  # Overrides CMD
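
Before overriding, it helps to check what the image bakes in (assuming the image is available to a local Docker daemon):

# Show the image's ENTRYPOINT and CMD
docker inspect --format '{{.Config.Entrypoint}} {{.Config.Cmd}}' myimage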

Solution 3 - Shell script issues:

# Common issue: Windows line endings
# Fix in Dockerfile:
RUN sed -i 's/\r$//' /app/start.sh && chmod +x /app/start.sh

Exit Code Reference

| Exit Code | Meaning | Common Cause |
|-----------|---------|--------------|
| 0 | Success (but shouldn't exit) | Missing entrypoint, app exits immediately |
| 1 | General error | Application error, check logs |
| 126 | Permission denied | Script not executable |
| 127 | Command not found | Binary missing, wrong path |
| 137 | SIGKILL (OOM) | Out of memory |
| 139 | SIGSEGV | Segmentation fault |
| 143 | SIGTERM | Graceful termination |
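
To read the last exit code straight from the pod status (assumes a single-container pod; adjust the index otherwise):

kubectl get pod <pod-name> -n <namespace> \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'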

Debug Container Pattern

When you can't access logs:

# Add debug container to running pod
kubectl debug <pod-name> -it --image=busybox --target=<container-name>
 
# Or create debug pod with same config
apiVersion: v1
kind: Pod
metadata:
  name: debug-pod
spec:
  containers:
    - name: debug
      image: busybox
      command: ["sleep", "infinity"]
      volumeMounts:
        # Same mounts as failing pod
        - name: config
          mountPath: /config

Quick Debug Commands

# All-in-one diagnosis
kubectl get pod <pod> -n <namespace> -o yaml | grep -A 20 "state:"
 
# Events for namespace
kubectl get events -n <namespace> --sort-by='.lastTimestamp'
 
# Resource usage
kubectl top pods -n <namespace>
 
# Execute into running container
kubectl exec -it <pod> -n <namespace> -- /bin/sh
 
# Port forward to test locally
kubectl port-forward <pod> 8080:8080 -n <namespace>

Prevention Checklist

Before deploying:

  1. Test container locally with same resource limits
  2. Implement proper health endpoints
  3. Add startup probes for slow-starting apps
  4. Use init containers for dependencies (see the sketch after this list)
  5. Set resource requests AND limits
  6. Test with minimal permissions (non-root)
  7. Verify all ConfigMaps/Secrets exist
  8. Check storage class availability
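
For item 4, an init container that blocks app startup until a dependency resolves is a common pattern; a minimal sketch (the service name db is a placeholder; busybox:1.28 is used because its nslookup output is well-behaved for this loop):

spec:
  initContainers:
    - name: wait-for-db
      image: busybox:1.28
      # Block until the db service resolves in cluster DNS
      command: ['sh', '-c', 'until nslookup db; do echo waiting for db; sleep 2; done']
  containers:
    - name: app
      image: myimage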

Complex Kubernetes Issues?

Debugging production Kubernetes issues requires deep expertise. Our team offers:

  • 24/7 Kubernetes incident response
  • Cluster health assessments
  • Resource optimization consulting
  • Security hardening and compliance

Get Kubernetes support
