CrashLoopBackOff is one of the most common Kubernetes pod status errors. This guide covers systematic debugging approaches for each scenario.
Understanding CrashLoopBackOff
Pod lifecycle with CrashLoopBackOff:
Start → Running → Crash → Restart → Running → Crash → ...
The back-off delay between restarts increases:
10s → 20s → 40s → 80s → 160s → 300s (max)
The pod keeps restarting because the container exits with an error.
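The kubelet doubles this delay after every crash, caps it at 300s, and resets it once the container has run cleanly for 10 minutes. The schedule above can be sketched in plain shell:

```shell
# Sketch of the CrashLoopBackOff delay schedule:
# doubles per restart, capped at 300 seconds.
delay=10
for restart in 1 2 3 4 5 6; do
  echo "restart $restart: back-off ${delay}s"
  delay=$((delay * 2))
  if [ "$delay" -gt 300 ]; then delay=300; fi
done
```

After six crashes the pod is already waiting the maximum five minutes between restart attempts, which is why a crashing pod looks "stuck".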
Step 1: Initial diagnosis
# Get pod status
kubectl get pods -n <namespace>
# Describe pod for events and conditions
kubectl describe pod <pod-name> -n <namespace>
# Check logs (current container)
kubectl logs <pod-name> -n <namespace>
# Check logs (previous crashed container)
kubectl logs <pod-name> -n <namespace> --previous
# For multi-container pods
kubectl logs <pod-name> -n <namespace> -c <container-name> --previous

Error: OOMKilled (Exit Code 137)
Symptom:
State: Terminated
Reason: OOMKilled
Exit Code: 137
Cause: The container exceeded its memory limit.
Solution 1 - Increase the memory limits:
resources:
  requests:
    memory: "256Mi"
  limits:
    memory: "512Mi"  # Increase this

Solution 2 - Profile actual memory usage:
# Check current usage
kubectl top pod <pod-name> -n <namespace>
# View resource requests/limits
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].resources}'

Solution 3 - Application-level fixes:
# For Node.js
NODE_OPTIONS="--max-old-space-size=384"
# For Java
JAVA_OPTS="-Xmx384m -Xms256m"

Error: Failed health check (Liveness/Readiness)
Symptom:
Warning Unhealthy Liveness probe failed: HTTP probe failed with statuscode: 503
Warning Unhealthy Readiness probe failed: connection refused
Cause 1 - The probe runs before the application is ready:
# Add startup probe for slow-starting apps
startupProbe:
  httpGet:
    path: /health
    port: 8080
  failureThreshold: 30
  periodSeconds: 10

# Adjust liveness probe
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 60  # Wait for startup
  periodSeconds: 10
  failureThreshold: 3

Cause 2 - Wrong port or path:
# Verify your app actually listens on this port/path
livenessProbe:
  httpGet:
    path: /healthz  # Must match your app's health endpoint
    port: 8080      # Must match container port
    scheme: HTTP    # or HTTPS

Cause 3 - Health endpoint not implemented:
// Node.js Express example
app.get('/health', (req, res) => {
  res.status(200).json({ status: 'healthy' });
});

app.get('/ready', (req, res) => {
  // Check dependencies (DB, cache, etc.)
  if (dbConnected && cacheConnected) {
    res.status(200).json({ status: 'ready' });
  } else {
    res.status(503).json({ status: 'not ready' });
  }
});

Error: ImagePullBackOff / ErrImagePull
Symptom:
Warning Failed Failed to pull image "myregistry.com/app:latest"
Warning Failed Error: ErrImagePull
Normal BackOff Back-off pulling image
Cause 1 - The image does not exist:
# Verify image exists
docker pull myregistry.com/app:latest
# Check exact tag
kubectl get pod <pod-name> -o jsonpath='{.spec.containers[*].image}'

Cause 2 - Private registry authentication:
# Create docker registry secret
kubectl create secret docker-registry regcred \
--docker-server=myregistry.com \
--docker-username=user \
--docker-password=pass \
--docker-email=email@example.com \
-n <namespace>

# Reference in pod spec
spec:
  imagePullSecrets:
    - name: regcred
  containers:
    - name: app
      image: myregistry.com/app:latest

Cause 3 - The service account is missing the image pull secret:
# Add to service account
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-service-account
imagePullSecrets:
  - name: regcred

Error: Missing ConfigMap or Secret
Symptom:
Warning Failed Error: configmap "app-config" not found
Warning Failed Error: secret "db-credentials" not found
Solution 1 - Create the missing resources:
# Check if ConfigMap exists
kubectl get configmap app-config -n <namespace>
# Create ConfigMap
kubectl create configmap app-config \
--from-file=config.yaml \
-n <namespace>
# Create Secret
kubectl create secret generic db-credentials \
--from-literal=username=admin \
--from-literal=password=secret \
-n <namespace>

Solution 2 - Make them optional:
env:
  - name: CONFIG_VALUE
    valueFrom:
      configMapKeyRef:
        name: app-config
        key: some-key
        optional: true  # Don't fail if missing

Error: Volume mount failures
Symptom:
Warning FailedMount Unable to attach or mount volumes
Warning FailedMount MountVolume.SetUp failed for volume "pvc-xxx"
Cause 1 - The PVC is not bound:
# Check PVC status
kubectl get pvc -n <namespace>
# Should show "Bound", not "Pending"

Cause 2 - Storage class problems:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard  # Must exist in cluster
  resources:
    requests:
      storage: 10Gi

Cause 3 - Read-only filesystem:
securityContext:
  readOnlyRootFilesystem: true
volumeMounts:
  - name: tmp
    mountPath: /tmp  # App needs writable tmp
  - name: cache
    mountPath: /app/cache
volumes:
  - name: tmp
    emptyDir: {}
  - name: cache
    emptyDir: {}

Error: Container command/entrypoint problems
Symptom:
Error: failed to start container: OCI runtime create failed
Error: container_linux.go:380: starting container process caused: exec: "myapp": executable file not found
Solution 1 - Check that the command exists:
# Debug by running shell
kubectl run debug --rm -it --image=myimage -- /bin/sh
# Then check if binary exists
which myapp
ls -la /app/

Solution 2 - Check entrypoint overrides:
# If Dockerfile has ENTRYPOINT, and you override command:
containers:
  - name: app
    image: myimage
    command: ["/bin/sh"]          # Overrides ENTRYPOINT
    args: ["-c", "/app/start.sh"] # Overrides CMD

Solution 3 - Shell script problems:
# Common issue: Windows line endings
# Fix in Dockerfile:
RUN sed -i 's/\r$//' /app/start.sh && chmod +x /app/start.sh

Exit code reference
| Exit code | Meaning | Common cause |
|-----------|---------|--------------|
| 0 | Success (but should not exit) | Missing entrypoint, app exits immediately |
| 1 | General error | Application error, check the logs |
| 126 | Permission denied | Script is not executable |
| 127 | Command not found | Missing binary, wrong path |
| 137 | SIGKILL (OOM) | Out of memory |
| 139 | SIGSEGV | Segmentation fault |
| 143 | SIGTERM | Graceful termination |
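The rule behind the bottom rows of the table is the POSIX convention that codes above 128 mean 128 + signal number (137 = 128 + 9, SIGKILL). A small shell helper (illustrative, not part of kubectl) to decode them:

```shell
# Decode a container exit code: values above 128 mean the process was
# killed by signal (code - 128); e.g. 137 = 128 + 9 (SIGKILL).
decode_exit() {
  if [ "$1" -gt 128 ]; then
    echo "killed by signal $(($1 - 128))"
  else
    echo "exited with status $1"
  fi
}

decode_exit 137   # killed by signal 9
decode_exit 143   # killed by signal 15
```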
The Debug Container pattern
When you cannot access the logs:
# Add debug container to running pod
kubectl debug <pod-name> -it --image=busybox --target=<container-name>
# Or create debug pod with same config
apiVersion: v1
kind: Pod
metadata:
  name: debug-pod
spec:
  containers:
    - name: debug
      image: busybox
      command: ["sleep", "3600"]  # busybox sleep does not accept "infinity"
      volumeMounts:
        # Same mounts as failing pod
        - name: config
          mountPath: /config

Quick debugging commands
# All-in-one diagnosis
kubectl get pod <pod> -o yaml | grep -A 20 "state:"
# Events for namespace
kubectl get events -n <namespace> --sort-by='.lastTimestamp'
# Resource usage
kubectl top pods -n <namespace>
# Execute into running container
kubectl exec -it <pod> -n <namespace> -- /bin/sh
# Port forward to test locally
kubectl port-forward <pod> 8080:8080 -n <namespace>

Prevention checklist
Before deployment:
- Test the container locally with the same resource limits
- Implement proper health endpoints
- Add startup probes for slow-starting applications
- Use init containers for dependencies
- Set resource requests AND limits
- Test with minimal permissions (non-root)
- Verify that all ConfigMaps/Secrets exist
- Verify storage class availability
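The init-container item above can be sketched like this (a minimal example; the image and the `db-service` name/port are illustrative) — the app container does not start until the dependency answers:

```yaml
spec:
  initContainers:
    - name: wait-for-db
      image: busybox
      # Block until the database service accepts TCP connections
      command: ["sh", "-c", "until nc -z db-service 5432; do echo waiting; sleep 2; done"]
  containers:
    - name: app
      image: myimage
```

This keeps "dependency not ready yet" failures out of the app container entirely, so they never show up as CrashLoopBackOff.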
Complex Kubernetes problems?
Debugging production Kubernetes issues takes deep expertise. Our team offers:
- 24/7 Kubernetes incident response
- Cluster health assessments
- Resource optimization consulting
- Security hardening and compliance