CrashLoopBackOff is one of the most common Kubernetes pod status errors. This guide walks through systematic debugging for the scenarios you are most likely to hit.
Understanding CrashLoopBackOff
Pod Lifecycle with CrashLoopBackOff:

Start → Running → Crash → Back-off wait → Restart → Running → Crash → ...

The back-off delay doubles after each crash, capped at five minutes:

10s → 20s → 40s → 80s → 160s → 300s (max)

The container keeps exiting with an error, so the kubelet restarts it, waiting longer each time. The delay resets once the container runs cleanly for ten minutes.
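To confirm the pattern, watch the restart counter climb (pod name and namespace are placeholders):

# Watch the pod's status change in real time (Ctrl+C to stop)
kubectl get pod <pod-name> -n <namespace> -w

# Print the restart count of the first container
kubectl get pod <pod-name> -n <namespace> \
  -o jsonpath='{.status.containerStatuses[0].restartCount}'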
Step 1: Initial Diagnosis
# Get pod status
kubectl get pods -n <namespace>
# Describe pod for events and conditions
kubectl describe pod <pod-name> -n <namespace>
# Check logs (current container)
kubectl logs <pod-name> -n <namespace>
# Check logs (previous crashed container)
kubectl logs <pod-name> -n <namespace> --previous
# For multi-container pods
kubectl logs <pod-name> -n <namespace> -c <container-name> --previous
Error: OOMKilled (Exit Code 137)

Symptom:
State: Terminated
Reason: OOMKilled
Exit Code: 137
Cause: Container exceeded memory limit.
Solution 1 - Increase memory limits:
resources:
  requests:
    memory: "256Mi"
  limits:
    memory: "512Mi"   # Increase this

Solution 2 - Profile actual memory usage:
# Check current usage
kubectl top pod <pod-name> -n <namespace>
# View resource requests/limits
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].resources}'

Solution 3 - Application-level fixes:
# For Node.js
NODE_OPTIONS="--max-old-space-size=384"
# For Java
JAVA_OPTS="-Xmx384m -Xms256m"
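Set these in the container spec so the runtime heap stays below the memory limit; the values here are examples, typically around 75-80% of the limit. Note that JAVA_OPTS only takes effect if your entrypoint passes it to the JVM (JAVA_TOOL_OPTIONS is picked up automatically).

env:
  - name: NODE_OPTIONS
    value: "--max-old-space-size=384"   # Node.js: cap the V8 heap (MiB)
  - name: JAVA_OPTS
    value: "-Xmx384m -Xms256m"          # JVM: keep heap below the 512Mi limit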
Error: Failed Health Check (Liveness/Readiness)

Symptom:
Warning Unhealthy Liveness probe failed: HTTP probe failed with statuscode: 503
Warning Unhealthy Readiness probe failed: connection refused
Cause 1 - Probe starts before app is ready:
# Add startup probe for slow-starting apps
startupProbe:
  httpGet:
    path: /health
    port: 8080
  failureThreshold: 30
  periodSeconds: 10

# Adjust liveness probe
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 60   # Wait for startup
  periodSeconds: 10
  failureThreshold: 3

Cause 2 - Wrong port or path:
# Verify your app actually listens on this port/path
livenessProbe:
  httpGet:
    path: /healthz   # Must match your app's health endpoint
    port: 8080       # Must match container port
    scheme: HTTP     # or HTTPS
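To confirm the port and path before changing the probe, hit the endpoint directly. The exec variant assumes the image ships a shell and wget; the port-forward variant works from your workstation regardless:

# From inside the container
kubectl exec <pod-name> -n <namespace> -- wget -qO- http://localhost:8080/healthz

# Or from your machine
kubectl port-forward <pod-name> 8080:8080 -n <namespace> &
curl -i http://localhost:8080/healthz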
Cause 3 - App health endpoint not implemented:

// Node.js Express example
app.get('/health', (req, res) => {
  res.status(200).json({ status: 'healthy' });
});

app.get('/ready', (req, res) => {
  // Check dependencies (DB, cache, etc.)
  if (dbConnected && cacheConnected) {
    res.status(200).json({ status: 'ready' });
  } else {
    res.status(503).json({ status: 'not ready' });
  }
});
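Once the endpoints exist, point the probes at them. Path and port here assume the Express app above listens on 8080:

livenessProbe:
  httpGet:
    path: /health
    port: 8080
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  periodSeconds: 5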
Error: ImagePullBackOff / ErrImagePull

Symptom:
Warning Failed Failed to pull image "myregistry.com/app:latest"
Warning Failed Error: ErrImagePull
Normal BackOff Back-off pulling image
Cause 1 - Image doesn't exist:
# Verify image exists
docker pull myregistry.com/app:latest
# Check exact tag
kubectl get pod <pod-name> -o jsonpath='{.spec.containers[*].image}'

Cause 2 - Private registry authentication:
# Create docker registry secret
kubectl create secret docker-registry regcred \
  --docker-server=myregistry.com \
  --docker-username=user \
  --docker-password=pass \
  --docker-email=email@example.com \
  -n <namespace>

# Reference in pod spec
spec:
  imagePullSecrets:
    - name: regcred
  containers:
    - name: app
      image: myregistry.com/app:latest

Cause 3 - Service account doesn't have the image pull secret:
# Add to service account
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-service-account
imagePullSecrets:
  - name: regcred
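You can also patch the existing service account instead of reapplying its manifest. The name default is an assumption; use whatever serviceAccountName the pod runs under:

kubectl patch serviceaccount default -n <namespace> \
  -p '{"imagePullSecrets": [{"name": "regcred"}]}'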
Error: Missing ConfigMap or Secret

Symptom:
Warning Failed Error: configmap "app-config" not found
Warning Failed Error: secret "db-credentials" not found
Solution 1 - Create missing resources:
# Check if ConfigMap exists
kubectl get configmap app-config -n <namespace>
# Create ConfigMap
kubectl create configmap app-config \
  --from-file=config.yaml \
  -n <namespace>

# Create Secret
kubectl create secret generic db-credentials \
  --from-literal=username=admin \
  --from-literal=password=secret \
  -n <namespace>

Solution 2 - Make optional:
env:
  - name: CONFIG_VALUE
    valueFrom:
      configMapKeyRef:
        name: app-config
        key: some-key
        optional: true   # Don't fail if missing
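The same optional flag works when importing a whole ConfigMap with envFrom or mounting one as a volume; a minimal sketch:

envFrom:
  - configMapRef:
      name: app-config
      optional: true
volumes:
  - name: config
    configMap:
      name: app-config
      optional: true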
Error: Volume Mount Failures

Symptom:
Warning FailedMount Unable to attach or mount volumes
Warning FailedMount MountVolume.SetUp failed for volume "pvc-xxx"
Cause 1 - PVC not bound:
# Check PVC status
kubectl get pvc -n <namespace>
# Should show "Bound", not "Pending"
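If the claim is stuck in Pending, its events usually explain why, and it is worth listing the storage classes the cluster actually offers:

kubectl describe pvc <pvc-name> -n <namespace>
kubectl get storageclass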
Cause 2 - Storage class issues:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard   # Must exist in cluster
  resources:
    requests:
      storage: 10Gi

Cause 3 - Read-only filesystem:
securityContext:
  readOnlyRootFilesystem: true
volumeMounts:
  - name: tmp
    mountPath: /tmp   # App needs writable tmp
  - name: cache
    mountPath: /app/cache
volumes:
  - name: tmp
    emptyDir: {}
  - name: cache
    emptyDir: {}

Error: Container Command/Entrypoint Issues
Symptom:
Error: failed to start container: OCI runtime create failed
Error: container_linux.go:380: starting container process caused: exec: "myapp": executable file not found
Solution 1 - Verify command exists:
# Debug by running shell
kubectl run debug --rm -it --image=myimage -- /bin/sh
# Then check if binary exists
which myapp
ls -la /app/
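You can also check the image's ENTRYPOINT and CMD locally before redeploying (assumes the image is available to your Docker daemon):

docker pull myimage
docker inspect --format '{{.Config.Entrypoint}} {{.Config.Cmd}}' myimage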
Solution 2 - Check entrypoint override:

# If Dockerfile has ENTRYPOINT, and you override command:
containers:
  - name: app
    image: myimage
    command: ["/bin/sh"]            # Overrides ENTRYPOINT
    args: ["-c", "/app/start.sh"]   # Overrides CMD

Solution 3 - Shell script issues:
# Common issue: Windows line endings
# Fix in Dockerfile:
RUN sed -i 's/\r$//' /app/start.sh && chmod +x /app/start.sh
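To spot Windows line endings before rebuilding, file reports them directly (path assumed to match your repo layout):

file start.sh
# Look for "with CRLF line terminators" in the output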
Error: Exit Code Reference

| Exit Code | Meaning | Common Cause |
|-----------|---------|--------------|
| 0 | Success (but shouldn't exit) | Missing entrypoint, app exits immediately |
| 1 | General error | Application error, check logs |
| 126 | Permission denied | Script not executable |
| 127 | Command not found | Binary missing, wrong path |
| 137 | SIGKILL (often OOM) | Out of memory |
| 139 | SIGSEGV | Segmentation fault |
| 143 | SIGTERM | Graceful termination |
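To pull the last exit code straight from the pod status (first container assumed):

kubectl get pod <pod-name> -n <namespace> \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'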
Debug Container Pattern
When you can't access logs:
# Add debug container to running pod
kubectl debug <pod-name> -it --image=busybox --target=<container-name>
# Or create debug pod with same config
apiVersion: v1
kind: Pod
metadata:
  name: debug-pod
spec:
  containers:
    - name: debug
      image: busybox
      command: ["sleep", "86400"]   # Keep the pod alive; some busybox builds reject "sleep infinity"
      volumeMounts:
        # Same mounts as the failing pod
        - name: config
          mountPath: /config
  volumes:
    - name: config
      configMap:
        name: app-config   # e.g. the ConfigMap the failing pod mounts
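Instead of hand-writing that manifest, kubectl debug can clone the failing pod and drop you into a shell in the copy (assumes the image contains sh):

kubectl debug <pod-name> -n <namespace> -it \
  --copy-to=debug-pod --container=<container-name> -- sh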
Quick Debug Commands

# All-in-one diagnosis
kubectl get pod <pod> -o yaml | grep -A 20 "state:"
# Events for namespace
kubectl get events -n <namespace> --sort-by='.lastTimestamp'
# Resource usage
kubectl top pods -n <namespace>
# Execute into running container
kubectl exec -it <pod> -n <namespace> -- /bin/sh
# Port forward to test locally
kubectl port-forward <pod> 8080:8080 -n <namespace>

Prevention Checklist
Before deploying:
- Test container locally with same resource limits
- Implement proper health endpoints
- Add startup probes for slow-starting apps
- Use init containers for dependencies (see the sketch after this list)
- Set resource requests AND limits
- Test with minimal permissions (non-root)
- Verify all ConfigMaps/Secrets exist
- Check storage class availability
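For the init-container item above, a minimal sketch that blocks app startup until a db Service resolves in cluster DNS (service name, image tag, and app image are placeholders):

spec:
  initContainers:
    - name: wait-for-db
      image: busybox:1.28
      command: ["sh", "-c", "until nslookup db; do echo waiting for db; sleep 2; done"]
  containers:
    - name: app
      image: myregistry.com/app:latest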
Complex Kubernetes Issues?
Debugging production Kubernetes issues requires deep expertise. Our team offers:
- 24/7 Kubernetes incident response
- Cluster health assessments
- Resource optimization consulting
- Security hardening and compliance