DevSecOps

Kubernetes CrashLoopBackOff: A Complete Troubleshooting Guide

Nicu Constantin
7 min read
#kubernetes#crashloopbackoff#troubleshooting#containers#devops

CrashLoopBackOff is one of the most common Kubernetes pod status errors. This guide covers systematic troubleshooting approaches for each scenario.

Understanding CrashLoopBackOff

Pod Lifecycle with CrashLoopBackOff:

Start → Running → Crash → Restart → Running → Crash → ...
                    ↓
           Back-off delay increases:
           10s → 20s → 40s → 80s → 160s → 300s (max)

The pod keeps restarting because the container exits with an error; each crash increases the back-off delay, up to a cap of five minutes.
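
You can watch the loop and the growing restart count directly (standard kubectl; replace the placeholders with your own names):

```shell
# Watch pod status changes and the RESTARTS counter live
kubectl get pods -n <namespace> -w

# Read the restart count for a single-container pod
kubectl get pod <pod-name> -n <namespace> \
  -o jsonpath='{.status.containerStatuses[0].restartCount}'
```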

Step 1: Initial Diagnosis

# Get pod status
kubectl get pods -n <namespace>
 
# Describe pod for events and conditions
kubectl describe pod <pod-name> -n <namespace>
 
# Check logs (current container)
kubectl logs <pod-name> -n <namespace>
 
# Check logs (previous crashed container)
kubectl logs <pod-name> -n <namespace> --previous
 
# For multi-container pods
kubectl logs <pod-name> -n <namespace> -c <container-name> --previous

Error: OOMKilled (Exit Code 137)

Symptom:

State:          Terminated
Reason:         OOMKilled
Exit Code:      137

Cause: the container exceeded its memory limit.

Solution 1 - Increase the memory limits:

resources:
  requests:
    memory: "256Mi"
  limits:
    memory: "512Mi"  # Increase this

Solution 2 - Profile the actual memory usage:

# Check current usage
kubectl top pod <pod-name> -n <namespace>
 
# View resource requests/limits
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].resources}'

Solution 3 - Application-level fixes:

# For Node.js
NODE_OPTIONS="--max-old-space-size=384"
 
# For Java
JAVA_OPTS="-Xmx384m -Xms256m"
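
These variables go into the container spec's env. A minimal sketch, assuming the 512Mi limit from Solution 1, with the heap capped below the container limit to leave headroom:

```yaml
containers:
  - name: app
    image: myregistry.com/app:latest
    resources:
      limits:
        memory: "512Mi"
    env:
      # Cap the Node.js heap below the container limit to leave headroom
      - name: NODE_OPTIONS
        value: "--max-old-space-size=384"
```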

Error: Failed Health Checks (Liveness/Readiness)

Symptom:

Warning  Unhealthy  Liveness probe failed: HTTP probe failed with statuscode: 503
Warning  Unhealthy  Readiness probe failed: connection refused

Cause 1 - The probe starts before the application is ready. A startup probe holds off the other probes until it succeeds; with failureThreshold: 30 and periodSeconds: 10, the app gets up to 300 seconds to start:

# Add startup probe for slow-starting apps
startupProbe:
  httpGet:
    path: /health
    port: 8080
  failureThreshold: 30
  periodSeconds: 10
 
# Adjust liveness probe
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 60  # Wait for startup
  periodSeconds: 10
  failureThreshold: 3

Cause 2 - Wrong port or path:

# Verify your app actually listens on this port/path
livenessProbe:
  httpGet:
    path: /healthz      # Must match your app's health endpoint
    port: 8080          # Must match container port
    scheme: HTTP        # or HTTPS
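
A quick way to confirm the endpoint is to call it from inside the running container (a sketch; wget is present in many base images, otherwise fall back to port-forward):

```shell
# Call the health endpoint from inside the running container
kubectl exec -it <pod-name> -n <namespace> -- \
  wget -qO- http://localhost:8080/healthz

# Alternative: forward the port and test from your machine
kubectl port-forward <pod-name> 8080:8080 -n <namespace> &
curl -i http://localhost:8080/healthz
```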

Cause 3 - The health endpoint is not implemented:

// Node.js Express example
app.get('/health', (req, res) => {
  res.status(200).json({ status: 'healthy' });
});
 
app.get('/ready', (req, res) => {
  // Check dependencies (DB, cache, etc.)
  if (dbConnected && cacheConnected) {
    res.status(200).json({ status: 'ready' });
  } else {
    res.status(503).json({ status: 'not ready' });
  }
});

Error: ImagePullBackOff / ErrImagePull

Symptom:

Warning  Failed     Failed to pull image "myregistry.com/app:latest"
Warning  Failed     Error: ErrImagePull
Normal   BackOff    Back-off pulling image

Cause 1 - The image does not exist:

# Verify image exists
docker pull myregistry.com/app:latest
 
# Check exact tag
kubectl get pod <pod-name> -o jsonpath='{.spec.containers[*].image}'

Cause 2 - Private registry authentication:

# Create docker registry secret
kubectl create secret docker-registry regcred \
  --docker-server=myregistry.com \
  --docker-username=user \
  --docker-password=pass \
  --docker-email=email@example.com \
  -n <namespace>
 
# Reference the secret in the pod spec:
spec:
  imagePullSecrets:
    - name: regcred
  containers:
    - name: app
      image: myregistry.com/app:latest

Cause 3 - The service account is missing the image pull secret:

# Add to service account
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-service-account
imagePullSecrets:
  - name: regcred

Error: Missing ConfigMap or Secret

Symptom:

Warning  Failed     Error: configmap "app-config" not found
Warning  Failed     Error: secret "db-credentials" not found

Solution 1 - Create the missing resources:

# Check if ConfigMap exists
kubectl get configmap app-config -n <namespace>
 
# Create ConfigMap
kubectl create configmap app-config \
  --from-file=config.yaml \
  -n <namespace>
 
# Create Secret
kubectl create secret generic db-credentials \
  --from-literal=username=admin \
  --from-literal=password=secret \
  -n <namespace>

Solution 2 - Make them optional:

env:
  - name: CONFIG_VALUE
    valueFrom:
      configMapKeyRef:
        name: app-config
        key: some-key
        optional: true  # Don't fail if missing

Error: Volume Mount Failures

Symptom:

Warning  FailedMount  Unable to attach or mount volumes
Warning  FailedMount  MountVolume.SetUp failed for volume "pvc-xxx"

Cause 1 - The PVC is not bound:

# Check PVC status
kubectl get pvc -n <namespace>
 
# Should show "Bound", not "Pending"
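
If the claim is stuck in Pending, its events usually state the reason (no matching PV, unknown storage class, zone mismatch). Standard follow-up checks:

```shell
# Inspect the claim's events for the binding failure reason
kubectl describe pvc <pvc-name> -n <namespace>

# Confirm the requested storage class actually exists
kubectl get storageclass
```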

Cause 2 - Storage class problems:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard  # Must exist in cluster
  resources:
    requests:
      storage: 10Gi

Cause 3 - Read-only filesystem:

securityContext:
  readOnlyRootFilesystem: true
 
volumeMounts:
  - name: tmp
    mountPath: /tmp  # App needs writable tmp
  - name: cache
    mountPath: /app/cache
 
volumes:
  - name: tmp
    emptyDir: {}
  - name: cache
    emptyDir: {}

Error: Container Command/Entrypoint Problems

Symptom:

Error: failed to start container: OCI runtime create failed
Error: container_linux.go:380: starting container process caused: exec: "myapp": executable file not found

Solution 1 - Check that the command exists:

# Debug by running shell
kubectl run debug --rm -it --image=myimage -- /bin/sh
# Then check if binary exists
which myapp
ls -la /app/

Solution 2 - Check the entrypoint override:

# If Dockerfile has ENTRYPOINT, and you override command:
containers:
  - name: app
    image: myimage
    command: ["/bin/sh"]  # Overrides ENTRYPOINT
    args: ["-c", "/app/start.sh"]  # Overrides CMD

Solution 3 - Shell script problems:

# Common issue: Windows line endings
# Fix in Dockerfile:
RUN sed -i 's/\r$//' /app/start.sh && chmod +x /app/start.sh

Exit Code Reference

| Exit Code | Meaning | Common Cause |
|-----------|---------|--------------|
| 0 | Success (but it should not have exited) | Missing entrypoint, application exits immediately |
| 1 | General error | Application error; check the logs |
| 126 | Permission denied | Script is not executable |
| 127 | Command not found | Missing binary, wrong path |
| 137 | SIGKILL (OOM) | Out of memory |
| 139 | SIGSEGV | Segmentation fault |
| 143 | SIGTERM | Graceful termination |
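
The last exit code can be read straight off the pod status (standard kubectl jsonpath, assuming a single-container pod):

```shell
# Exit code of the most recently terminated container
kubectl get pod <pod-name> -n <namespace> \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'
```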

The Debug Container Pattern

When you can't get at the logs:

# Add debug container to running pod
kubectl debug <pod-name> -it --image=busybox --target=<container-name>
 
# Or create debug pod with same config
apiVersion: v1
kind: Pod
metadata:
  name: debug-pod
spec:
  containers:
    - name: debug
      image: busybox
      command: ["sleep", "infinity"]
      volumeMounts:
        # Same mounts as failing pod
        - name: config
          mountPath: /config

Quick Debugging Commands

# All-in-one diagnosis
kubectl get pod <pod> -o yaml | grep -A 20 "state:"
 
# Events for namespace
kubectl get events -n <namespace> --sort-by='.lastTimestamp'
 
# Resource usage
kubectl top pods -n <namespace>
 
# Execute into running container
kubectl exec -it <pod> -n <namespace> -- /bin/sh
 
# Port forward to test locally
kubectl port-forward <pod> 8080:8080 -n <namespace>

Prevention Checklist

Before deploying:

  1. Test the container locally with the same resource limits
  2. Implement proper health check endpoints
  3. Add startup probes for slow-starting applications
  4. Use init containers for dependencies
  5. Set resource requests AND limits
  6. Test with minimal permissions (non-root)
  7. Verify that all ConfigMaps/Secrets exist
  8. Verify storage class availability
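
Item 4 can be sketched as a manifest fragment; the wait-for-db name, db-service host, and port below are illustrative placeholders, not a fixed convention:

```yaml
spec:
  initContainers:
    # Delay the main container until the database accepts connections
    - name: wait-for-db
      image: busybox
      command: ["sh", "-c", "until nc -z db-service 5432; do sleep 2; done"]
  containers:
    - name: app
      image: myregistry.com/app:latest
```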

Complex Kubernetes problems?

Troubleshooting production Kubernetes issues requires deep expertise. Our team offers:

  • 24/7 Kubernetes incident response
  • Cluster health assessments
  • Resource optimization consulting
  • Security hardening and compliance

Get Kubernetes Support


