ML CI/CD: Continuous Integration and Deployment for Machine Learning
CI/CD for machine learning is fundamentally different from traditional software CI/CD. In software, you test code. In ML, you test code, data, model quality, and the serving infrastructure. This guide covers how to build ML CI/CD pipelines reliable enough for production.
Why ML CI/CD Is Different
| Aspect | Software CI/CD | ML CI/CD |
|--------|----------------|----------|
| What changes | Code | Code + data + model + configuration |
| Test types | Unit, integration, e2e | Data quality, model quality, integration, performance |
| Build artifact | Container/binary | Model artifact + serving configuration |
| Deployment trigger | Code push | Code push OR data refresh OR performance degradation |
| Rollback | Previous code version | Previous model version (may require different features) |
| Environment | Standard compute | GPU clusters for training, CPU/GPU for serving |
Arhitectura Pipeline ML CI/CD
Code Push / Data Refresh / Schedule
         │
         ▼
┌─────────────────┐
│ Data Validation │◄─── Schema checks, statistical tests, freshness
└────────┬────────┘
         │ Pass
         ▼
┌─────────────────┐
│ Feature Compute │◄─── Feature engineering, transformation
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│    Training     │◄─── Hyperparameter config, compute allocation
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Model Testing  │◄─── Quality gates, regression checks, bias tests
└────────┬────────┘
         │ Pass
         ▼
┌─────────────────┐
│    Registry     │◄─── Versioning, tagging, storage in the model registry
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│     Deploy      │◄─── Shadow → Canary → Production
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│     Monitor     │◄─── Drift, performance, latency
└─────────────────┘
GitHub Actions for ML CI/CD
Data Validation Pipeline
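The workflow below calls several validation scripts (`pipelines/validate_schema.py` and friends). As a hedged sketch of what such a schema gate might contain, here is one possibility using pandas; the column names and dtype contract are hypothetical, not taken from the article:

```python
"""Sketch of a schema-validation gate like pipelines/validate_schema.py
in the workflow below. The expected schema is an assumed example."""
import pandas as pd

# Assumed contract for the processed dataset (hypothetical columns)
EXPECTED_SCHEMA = {
    "customer_id": "int64",
    "tenure_months": "int64",
    "monthly_charges": "float64",
    "target": "int64",
}


def validate_schema(df: pd.DataFrame) -> list[str]:
    """Return a list of violations; an empty list means the gate passes."""
    errors = []
    for col, dtype in EXPECTED_SCHEMA.items():
        if col not in df.columns:
            errors.append(f"missing column: {col}")
        elif str(df[col].dtype) != dtype:
            errors.append(f"{col}: expected {dtype}, got {df[col].dtype}")
    if df.empty:
        errors.append("dataset is empty")
    return errors


# Example run on an in-memory frame (in CI this would be the pulled dataset,
# and a non-empty error list would translate to a non-zero exit code):
sample = pd.DataFrame({
    "customer_id": [1, 2],
    "tenure_months": [12, 3],
    "monthly_charges": [29.5, 70.1],
    "target": [0, 1],
})
print(validate_schema(sample))
```

In the CI step, the script would exit non-zero when the error list is non-empty, which is what makes the workflow job below fail and block the rest of the pipeline.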
name: Data Validation
on:
  schedule:
    - cron: '0 1 * * *'  # Daily at 1 AM UTC
  workflow_dispatch:
    inputs:
      data_version:
        description: 'Data version to validate'
        required: false
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
          cache: 'pip'
      - name: Install dependencies
        run: pip install -r requirements/validation.txt
      - name: Pull latest data
        run: dvc pull data/processed/
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
      - name: Run schema validation
        run: python pipelines/validate_schema.py
      - name: Run statistical tests
        run: python pipelines/validate_statistics.py
      - name: Run data quality checks
        run: python pipelines/validate_quality.py
      - name: Generate data profile
        run: python pipelines/generate_profile.py --output reports/data_profile.html
      - name: Upload validation report
        uses: actions/upload-artifact@v4
        with:
          name: data-validation-report
          path: reports/

Model Training and Testing Pipeline
name: ML Training Pipeline
on:
  push:
    paths:
      - 'src/models/**'
      - 'src/features/**'
      - 'configs/training/**'
  workflow_dispatch:
    inputs:
      experiment_name:
        description: 'MLflow experiment name'
        required: true
        default: 'production-training'
jobs:
  train:
    runs-on: [self-hosted, gpu]
    timeout-minutes: 120
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Install dependencies
        run: pip install -r requirements/training.txt
      - name: Pull training data
        run: dvc pull data/
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
      - name: Train model
        run: |
          python src/train.py \
            --config configs/training/production.yaml \
            --experiment ${{ inputs.experiment_name || 'production-training' }}
        env:
          MLFLOW_TRACKING_URI: ${{ secrets.MLFLOW_TRACKING_URI }}
      - name: Run model tests
        run: pytest tests/model/ -v --tb=short
      - name: Run bias and fairness tests
        run: python tests/fairness/check_bias.py
      - name: Upload model artifact
        uses: actions/upload-artifact@v4
        with:
          name: trained-model
          path: artifacts/model/
  quality-gate:
    needs: train
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Download model artifact
        uses: actions/download-artifact@v4
        with:
          name: trained-model
          path: artifacts/model/
      - name: Compare with production model
        run: |
          python pipelines/compare_models.py \
            --new-model artifacts/model/ \
            --production-model models:/churn-predictor/Production \
            --threshold 0.02
        env:
          MLFLOW_TRACKING_URI: ${{ secrets.MLFLOW_TRACKING_URI }}
      - name: Register model if improved
        if: success()
        run: |
          python pipelines/register_model.py \
            --model-path artifacts/model/ \
            --name churn-predictor \
            --stage staging
        env:
          MLFLOW_TRACKING_URI: ${{ secrets.MLFLOW_TRACKING_URI }}
  deploy:
    needs: quality-gate
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    environment: production
    steps:
      - uses: actions/checkout@v4
      - name: Deploy canary (10% traffic)
        run: |
          python pipelines/deploy.py \
            --model-name churn-predictor \
            --stage staging \
            --strategy canary \
            --traffic-split 10
        env:
          K8S_CLUSTER: ${{ secrets.K8S_CLUSTER }}
      - name: Wait and monitor canary
        run: python pipelines/monitor_canary.py --duration 30m --model churn-predictor
      - name: Promote to production
        run: |
          python pipelines/deploy.py \
            --model-name churn-predictor \
            --strategy promote

Model Testing Framework
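The quality-gate job above invokes `pipelines/compare_models.py` with a `--threshold` flag. As a minimal sketch of what such a comparison gate could look like (metric values are stubbed here, and "threshold = required minimum improvement" is one plausible reading of that flag; a real script would score both models on the same held-out set via the MLflow registry):

```python
"""Sketch of a model-comparison quality gate, in the spirit of the
compare_models.py step above. Metrics are stubbed; a real version
would evaluate both models on shared held-out data."""


def passes_gate(new_metric: float, prod_metric: float, threshold: float = 0.02) -> bool:
    """Candidate must improve on production by at least `threshold`
    (assumed interpretation of the --threshold flag)."""
    return (new_metric - prod_metric) >= threshold


def gate_report(new_metric: float, prod_metric: float, threshold: float = 0.02) -> str:
    delta = new_metric - prod_metric
    verdict = "PASS" if passes_gate(new_metric, prod_metric, threshold) else "FAIL"
    return f"{verdict}: new={new_metric:.3f} prod={prod_metric:.3f} delta={delta:+.3f}"


# Stubbed metrics; in CI a FAIL would map to a non-zero exit code so the
# quality-gate job blocks model registration.
print(gate_report(0.86, 0.82))  # → PASS: new=0.860 prod=0.820 delta=+0.040
print(gate_report(0.83, 0.82))  # → FAIL: new=0.830 prod=0.820 delta=+0.010
```

The key design point is that the gate compares against the *current production model*, not a fixed constant, so the bar rises automatically as better models ship.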
Multi-level Test Suite
"""
ML Model Test Suite, ruleaza in CI/CD dupa fiecare antrenare.
"""
import pytest
import numpy as np
import joblib
from pathlib import Path
MODEL_PATH = Path("artifacts/model/model.joblib")
TEST_DATA_PATH = Path("data/test/test.parquet")
@pytest.fixture(scope="session")
def model():
return joblib.load(MODEL_PATH)
@pytest.fixture(scope="session")
def test_data():
import pandas as pd
return pd.read_parquet(TEST_DATA_PATH)
class TestModelAccuracy:
"""Quality gates: modelul trebuie sa atinga praguri minime de performanta."""
def test_accuracy_above_threshold(self, model, test_data):
from sklearn.metrics import accuracy_score
X = test_data.drop("target", axis=1)
y = test_data["target"]
y_pred = model.predict(X)
accuracy = accuracy_score(y, y_pred)
assert accuracy >= 0.85, f"Accuracy {accuracy:.3f} below 0.85 threshold"
def test_no_regression_vs_baseline(self, model, test_data):
"""Modelul nou nu trebuie sa fie mai slab decat baseline-ul documentat."""
from sklearn.metrics import f1_score
X = test_data.drop("target", axis=1)
y = test_data["target"]
y_pred = model.predict(X)
f1 = f1_score(y, y_pred, average="weighted")
BASELINE_F1 = 0.82 # Documented baseline from last stable release
assert f1 >= BASELINE_F1 - 0.02, f"F1 {f1:.3f} regressed vs baseline {BASELINE_F1}"
class TestModelRobustness:
"""Verifica daca modelul gestioneaza corect cazurile limita."""
def test_handles_missing_values(self, model):
"""Modelul nu ar trebui sa se blocheze pe input-uri NaN."""
sample = np.full((1, model.n_features_in_), np.nan)
try:
model.predict(sample)
except ValueError:
pass # Expected for models that don't handle NaN
# Should not raise unexpected exceptions
def test_prediction_determinism(self, model, test_data):
"""Acelasi input trebuie sa produca acelasi output."""
X = test_data.drop("target", axis=1).head(10)
pred1 = model.predict(X)
pred2 = model.predict(X)
np.testing.assert_array_equal(pred1, pred2)
def test_prediction_latency(self, model, test_data):
"""O singura predictie trebuie sa fie suficient de rapida pentru SLA-ul de serving."""
import time
X_single = test_data.drop("target", axis=1).head(1)
times = []
for _ in range(100):
start = time.perf_counter()
model.predict(X_single)
times.append((time.perf_counter() - start) * 1000)
p99 = np.percentile(times, 99)
assert p99 < 50, f"P99 latency {p99:.1f}ms exceeds 50ms SLA"
class TestModelFairness:
"""Verifica bias-ul intre grupuri protejate."""
def test_equal_opportunity(self, model, test_data):
"""Rata de true positive trebuie sa fie similara intre grupuri."""
from sklearn.metrics import recall_score
X = test_data.drop("target", axis=1)
y = test_data["target"]
y_pred = model.predict(X)
if "demographic_group" not in test_data.columns:
pytest.skip("No demographic column available")
groups = test_data["demographic_group"].unique()
tpr_by_group = {}
for group in groups:
mask = test_data["demographic_group"] == group
if mask.sum() < 50:
continue
tpr = recall_score(y[mask], y_pred[mask], zero_division=0)
tpr_by_group[group] = tpr
if len(tpr_by_group) < 2:
pytest.skip("Not enough groups for comparison")
max_tpr = max(tpr_by_group.values())
min_tpr = min(tpr_by_group.values())
disparity = max_tpr - min_tpr
assert disparity < 0.15, f"TPR disparity {disparity:.3f} exceeds 0.15 threshold: {tpr_by_group}"Data Version Control in CI/CD
# dvc.yaml: defines reproducible pipeline stages
stages:
  preprocess:
    cmd: python src/preprocess.py --config configs/preprocess.yaml
    deps:
      - src/preprocess.py
      - data/raw/
      - configs/preprocess.yaml
    outs:
      - data/processed/
  train:
    cmd: python src/train.py --config configs/training/production.yaml
    deps:
      - src/train.py
      - data/processed/
      - configs/training/production.yaml
    outs:
      - artifacts/model/
    metrics:
      - metrics/training.json:
          cache: false
    params:
      - configs/training/production.yaml:
          - model.n_estimators
          - model.max_depth
          - model.learning_rate

# In CI: reproduce the pipeline and check what changed
dvc repro
dvc metrics diff  # Compare metrics with the previous run
dvc plots diff    # Generate a visual comparison

Deployment Strategies for ML Models
Blue-Green Deployment
def blue_green_deploy(model_name: str, new_version: str):
    """Atomically switch traffic between model versions."""
    # Deploy the new version to the "green" endpoint
    deploy_to_endpoint(model_name, new_version, endpoint="green")
    # Run smoke tests against green
    if smoke_test(endpoint="green"):
        # Switch traffic from blue to green
        switch_traffic(model_name, from_endpoint="blue", to_endpoint="green")
        # Keep blue around as a rollback target
    else:
        # Tear down the failed green deployment
        teardown_endpoint("green")
        raise DeploymentError(f"Smoke tests failed for {model_name}:{new_version}")

Progressive Rollout
ROLLOUT_STAGES = [
    {"traffic_pct": 5, "duration_minutes": 15, "error_threshold": 0.02},
    {"traffic_pct": 25, "duration_minutes": 30, "error_threshold": 0.015},
    {"traffic_pct": 50, "duration_minutes": 60, "error_threshold": 0.01},
    {"traffic_pct": 100, "duration_minutes": 0, "error_threshold": 0.01},
]

async def progressive_rollout(model_name: str, new_version: str):
    for stage in ROLLOUT_STAGES:
        set_traffic_split(model_name, new_version, stage["traffic_pct"])
        if stage["duration_minutes"] > 0:
            metrics = await monitor_for(minutes=stage["duration_minutes"])
            if metrics["error_rate"] > stage["error_threshold"]:
                rollback(model_name)
                raise RolloutError(
                    f"Error rate {metrics['error_rate']:.3f} exceeded threshold "
                    f"at {stage['traffic_pct']}%"
                )
    mark_as_production(model_name, new_version)

Conclusions
- Test data as rigorously as code: schema validation, statistical checks, and freshness monitoring
- Quality gates before deployment: models must beat baselines, not just pass unit tests
- Progressive deployment: never jump from 0% to 100% traffic instantly
- Version everything: code, data, model, and configuration must all be reproducible
- Automate retraining: scheduled or trigger-based, with human approval for production promotion
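The last point, trigger-based retraining with human approval, can be sketched as a simple decision function. The drift score source and the numeric thresholds here are assumptions; in practice the inputs would come from a monitoring system, and promotion would still pass through a protected CI environment gate:

```python
"""Sketch of a trigger-based retraining decision. The thresholds and the
drift metric are assumed placeholders, not values from this article."""

DRIFT_THRESHOLD = 0.25  # e.g. PSI on key features; assumed value
METRIC_FLOOR = 0.80     # minimum acceptable live metric; assumed value


def should_retrain(drift_score: float, live_metric: float) -> tuple[bool, str]:
    """Return (retrain?, reason). Either signal alone is enough to trigger."""
    if drift_score > DRIFT_THRESHOLD:
        return True, f"feature drift {drift_score:.2f} > {DRIFT_THRESHOLD}"
    if live_metric < METRIC_FLOOR:
        return True, f"live metric {live_metric:.2f} < {METRIC_FLOOR}"
    return False, "within bounds"


# Retraining kicks off automatically, but promotion still goes through
# the human-approved deploy job from the workflow above.
retrain, reason = should_retrain(drift_score=0.31, live_metric=0.84)
print(retrain, reason)  # drift alone triggers retraining here
```

A scheduled job can call this check daily and dispatch the training workflow when it returns true, keeping humans in the loop only for the final promotion step.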
Related Resources
- MLOps best practices for the complete production guide
- MLflow tutorial for experiment tracking in your CI/CD
- Model monitoring to close the loop after deployment
- CI/CD security with GitHub Actions for hardening ML pipelines
Building ML CI/CD? DeviDevs implements end-to-end MLOps pipelines with automated testing, progressive deployment, and monitoring. Get a free assessment ->
Is your AI system compliant with the EU AI Act? Free risk assessment - find out in 2 minutes →