DevSecOps

Kubernetes CrashLoopBackOff: A Complete Troubleshooting Guide

Nicu Constantin
7 min read
#kubernetes#crashloopbackoff#troubleshooting#containers#devops

CrashLoopBackOff is one of the most common Kubernetes pod status errors. This guide covers systematic troubleshooting approaches for each scenario.

Understanding CrashLoopBackOff

Pod Lifecycle with CrashLoopBackOff:

Start → Running → Crash → Restart → Running → Crash → ...
                    ↓
           Back-off delay increases:
           10s → 20s → 40s → 80s → 160s → 300s (max)

The pod keeps restarting because the container exits with an error; each crash increases the back-off delay, up to a cap of five minutes.
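
You can watch the loop and the growing restart count directly (standard kubectl; replace the placeholders with your own names):

```shell
# Watch pod status changes and the RESTARTS counter live
kubectl get pods -n <namespace> -w

# Read the restart count for a single-container pod
kubectl get pod <pod-name> -n <namespace> \
  -o jsonpath='{.status.containerStatuses[0].restartCount}'
```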

Step 1: Initial Diagnosis

# Get pod status
kubectl get pods -n <namespace>
 
# Describe pod for events and conditions
kubectl describe pod <pod-name> -n <namespace>
 
# Check logs (current container)
kubectl logs <pod-name> -n <namespace>
 
# Check logs (previous crashed container)
kubectl logs <pod-name> -n <namespace> --previous
 
# For multi-container pods
kubectl logs <pod-name> -n <namespace> -c <container-name> --previous

Error: OOMKilled (Exit Code 137)

Symptom:

State:          Terminated
Reason:         OOMKilled
Exit Code:      137

Cause: the container exceeded its memory limit.

Solution 1 - Increase the memory limits:

resources:
  requests:
    memory: "256Mi"
  limits:
    memory: "512Mi"  # Increase this

Solution 2 - Profile the actual memory usage:

# Check current usage
kubectl top pod <pod-name> -n <namespace>
 
# View resource requests/limits
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].resources}'

Solution 3 - Application-level fixes:

# For Node.js
NODE_OPTIONS="--max-old-space-size=384"
 
# For Java
JAVA_OPTS="-Xmx384m -Xms256m"
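
These variables go into the container spec's env. A minimal sketch, assuming the 512Mi limit from Solution 1, with the heap capped below the container limit to leave headroom:

```yaml
containers:
  - name: app
    image: myregistry.com/app:latest
    resources:
      limits:
        memory: "512Mi"
    env:
      # Cap the Node.js heap below the container limit to leave headroom
      - name: NODE_OPTIONS
        value: "--max-old-space-size=384"
```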

Error: Failed Health Checks (Liveness/Readiness)

Symptom:

Warning  Unhealthy  Liveness probe failed: HTTP probe failed with statuscode: 503
Warning  Unhealthy  Readiness probe failed: connection refused

Cause 1 - The probe starts before the application is ready. A startup probe holds off the other probes until it succeeds; with failureThreshold: 30 and periodSeconds: 10, the app gets up to 300 seconds to start:

# Add startup probe for slow-starting apps
startupProbe:
  httpGet:
    path: /health
    port: 8080
  failureThreshold: 30
  periodSeconds: 10
 
# Adjust liveness probe
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 60  # Wait for startup
  periodSeconds: 10
  failureThreshold: 3

Cause 2 - Wrong port or path:

# Verify your app actually listens on this port/path
livenessProbe:
  httpGet:
    path: /healthz      # Must match your app's health endpoint
    port: 8080          # Must match container port
    scheme: HTTP        # or HTTPS
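
A quick way to confirm the endpoint is to call it from inside the running container (a sketch; wget is present in many base images, otherwise fall back to port-forward):

```shell
# Call the health endpoint from inside the running container
kubectl exec -it <pod-name> -n <namespace> -- \
  wget -qO- http://localhost:8080/healthz

# Alternative: forward the port and test from your machine
kubectl port-forward <pod-name> 8080:8080 -n <namespace> &
curl -i http://localhost:8080/healthz
```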

Cause 3 - The health endpoint is not implemented:

// Node.js Express example
app.get('/health', (req, res) => {
  res.status(200).json({ status: 'healthy' });
});
 
app.get('/ready', (req, res) => {
  // Check dependencies (DB, cache, etc.)
  if (dbConnected && cacheConnected) {
    res.status(200).json({ status: 'ready' });
  } else {
    res.status(503).json({ status: 'not ready' });
  }
});

Error: ImagePullBackOff / ErrImagePull

Symptom:

Warning  Failed     Failed to pull image "myregistry.com/app:latest"
Warning  Failed     Error: ErrImagePull
Normal   BackOff    Back-off pulling image

Cause 1 - The image does not exist:

# Verify image exists
docker pull myregistry.com/app:latest
 
# Check exact tag
kubectl get pod <pod-name> -o jsonpath='{.spec.containers[*].image}'

Cause 2 - Private registry authentication:

# Create docker registry secret
kubectl create secret docker-registry regcred \
  --docker-server=myregistry.com \
  --docker-username=user \
  --docker-password=pass \
  --docker-email=email@example.com \
  -n <namespace>
 
# Reference the secret in the pod spec:
spec:
  imagePullSecrets:
    - name: regcred
  containers:
    - name: app
      image: myregistry.com/app:latest

Cause 3 - The service account is missing the image pull secret:

# Add to service account
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-service-account
imagePullSecrets:
  - name: regcred

Error: Missing ConfigMap or Secret

Symptom:

Warning  Failed     Error: configmap "app-config" not found
Warning  Failed     Error: secret "db-credentials" not found

Solution 1 - Create the missing resources:

# Check if ConfigMap exists
kubectl get configmap app-config -n <namespace>
 
# Create ConfigMap
kubectl create configmap app-config \
  --from-file=config.yaml \
  -n <namespace>
 
# Create Secret
kubectl create secret generic db-credentials \
  --from-literal=username=admin \
  --from-literal=password=secret \
  -n <namespace>

Solution 2 - Make them optional:

env:
  - name: CONFIG_VALUE
    valueFrom:
      configMapKeyRef:
        name: app-config
        key: some-key
        optional: true  # Don't fail if missing

Error: Volume Mount Failures

Symptom:

Warning  FailedMount  Unable to attach or mount volumes
Warning  FailedMount  MountVolume.SetUp failed for volume "pvc-xxx"

Cause 1 - The PVC is not bound:

# Check PVC status
kubectl get pvc -n <namespace>
 
# Should show "Bound", not "Pending"
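
If the claim is stuck in Pending, its events usually state the reason (no matching PV, unknown storage class, zone mismatch). Standard follow-up checks:

```shell
# Inspect the claim's events for the binding failure reason
kubectl describe pvc <pvc-name> -n <namespace>

# Confirm the requested storage class actually exists
kubectl get storageclass
```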

Cause 2 - Storage class problems:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard  # Must exist in cluster
  resources:
    requests:
      storage: 10Gi

Cause 3 - Read-only filesystem:

securityContext:
  readOnlyRootFilesystem: true
 
volumeMounts:
  - name: tmp
    mountPath: /tmp  # App needs writable tmp
  - name: cache
    mountPath: /app/cache
 
volumes:
  - name: tmp
    emptyDir: {}
  - name: cache
    emptyDir: {}

Error: Container Command/Entrypoint Problems

Symptom:

Error: failed to start container: OCI runtime create failed
Error: container_linux.go:380: starting container process caused: exec: "myapp": executable file not found

Solution 1 - Check that the command exists:

# Debug by running shell
kubectl run debug --rm -it --image=myimage -- /bin/sh
# Then check if binary exists
which myapp
ls -la /app/

Solution 2 - Check the entrypoint override:

# If Dockerfile has ENTRYPOINT, and you override command:
containers:
  - name: app
    image: myimage
    command: ["/bin/sh"]  # Overrides ENTRYPOINT
    args: ["-c", "/app/start.sh"]  # Overrides CMD

Solution 3 - Shell script problems:

# Common issue: Windows line endings
# Fix in Dockerfile:
RUN sed -i 's/\r$//' /app/start.sh && chmod +x /app/start.sh

Exit Code Reference

| Exit Code | Meaning | Common Cause |
|-----------|---------|--------------|
| 0 | Success (but it should not have exited) | Missing entrypoint, application exits immediately |
| 1 | General error | Application error; check the logs |
| 126 | Permission denied | Script is not executable |
| 127 | Command not found | Missing binary, wrong path |
| 137 | SIGKILL (OOM) | Out of memory |
| 139 | SIGSEGV | Segmentation fault |
| 143 | SIGTERM | Graceful termination |
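
The last exit code can be read straight off the pod status (standard kubectl jsonpath, assuming a single-container pod):

```shell
# Exit code of the most recently terminated container
kubectl get pod <pod-name> -n <namespace> \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'
```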

The Debug Container Pattern

When you can't get at the logs:

# Add debug container to running pod
kubectl debug <pod-name> -it --image=busybox --target=<container-name>
 
# Or create debug pod with same config
apiVersion: v1
kind: Pod
metadata:
  name: debug-pod
spec:
  containers:
    - name: debug
      image: busybox
      command: ["sleep", "infinity"]
      volumeMounts:
        # Same mounts as failing pod
        - name: config
          mountPath: /config

Quick Debugging Commands

# All-in-one diagnosis
kubectl get pod <pod> -o yaml | grep -A 20 "state:"
 
# Events for namespace
kubectl get events -n <namespace> --sort-by='.lastTimestamp'
 
# Resource usage
kubectl top pods -n <namespace>
 
# Execute into running container
kubectl exec -it <pod> -n <namespace> -- /bin/sh
 
# Port forward to test locally
kubectl port-forward <pod> 8080:8080 -n <namespace>

Prevention Checklist

Before deploying:

  1. Test the container locally with the same resource limits
  2. Implement proper health check endpoints
  3. Add startup probes for slow-starting applications
  4. Use init containers for dependencies
  5. Set resource requests AND limits
  6. Test with minimal permissions (non-root)
  7. Verify that all ConfigMaps/Secrets exist
  8. Verify storage class availability
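
Item 4 can be sketched as a manifest fragment; the wait-for-db name, db-service host, and port below are illustrative placeholders, not a fixed convention:

```yaml
spec:
  initContainers:
    # Delay the main container until the database accepts connections
    - name: wait-for-db
      image: busybox
      command: ["sh", "-c", "until nc -z db-service 5432; do sleep 2; done"]
  containers:
    - name: app
      image: myregistry.com/app:latest
```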

Complex Kubernetes problems?

Troubleshooting production Kubernetes issues requires deep expertise. Our team offers:

  • 24/7 Kubernetes incident response
  • Cluster health assessments
  • Resource optimization consulting
  • Security hardening and compliance

Get Kubernetes Support


