
Petru Constantin
8 min read
#ML CI/CD #MLOps #model deployment #continuous integration #GitHub Actions #model testing

ML CI/CD: Continuous Integration and Deployment for Machine Learning

CI/CD for machine learning is fundamentally different from traditional software CI/CD. In software, you test code. In ML, you test code, data, model quality, and the serving infrastructure. This guide covers how to build ML CI/CD pipelines reliable enough for production.

Why ML CI/CD Is Different

| Aspect | Software CI/CD | ML CI/CD |
|--------|----------------|----------|
| What changes | Code | Code + data + model + configuration |
| Test types | Unit, integration, e2e | Data quality, model quality, integration, performance |
| Build artifact | Container/binary | Model artifact + serving configuration |
| Deployment trigger | Code push | Code push OR data refresh OR performance degradation |
| Rollback | Previous code version | Previous model version (may require different features) |
| Environment | Standard compute | GPU clusters for training, CPU/GPU for serving |
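
The rollback row deserves emphasis: rolling a model back is only safe if the serving layer still produces every feature the older model was trained on. A minimal sketch of such a compatibility check (the helper name and metadata shape are illustrative, not from a specific library):

def can_rollback(previous_model_features: set, serving_features: set) -> bool:
    """Rollback is only safe if serving still computes every feature
    the previous model version expects (illustrative check)."""
    missing = previous_model_features - serving_features
    if missing:
        print(f"Unsafe rollback, missing features: {sorted(missing)}")
        return False
    return True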

ML CI/CD Pipeline Architecture

Code Push / Data Refresh / Schedule
            │
            ▼
    ┌─────────────────┐
    │ Data Validation │◄─── Schema checks, statistical tests, freshness
    └────────┬────────┘
             │ Pass
             ▼
    ┌─────────────────┐
    │ Feature Compute │◄─── Feature engineering, transformation
    └────────┬────────┘
             │
             ▼
    ┌─────────────────┐
    │    Training     │◄─── Hyperparameter config, compute allocation
    └────────┬────────┘
             │
             ▼
    ┌─────────────────┐
    │  Model Testing  │◄─── Quality gates, regression checks, bias tests
    └────────┬────────┘
             │ Pass
             ▼
    ┌─────────────────┐
    │    Registry     │◄─── Versioning, tagging, storage in the model registry
    └────────┬────────┘
             │
             ▼
    ┌─────────────────┐
    │     Deploy      │◄─── Shadow → Canary → Production
    └────────┬────────┘
             │
             ▼
    ┌─────────────────┐
    │     Monitor     │◄─── Drift, performance, latency
    └─────────────────┘

GitHub Actions for ML CI/CD

Data Validation Pipeline

name: Data Validation
on:
  schedule:
    - cron: '0 1 * * *'  # Daily at 1 AM UTC
  workflow_dispatch:
    inputs:
      data_version:
        description: 'Data version to validate'
        required: false
 
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
 
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
          cache: 'pip'
 
      - name: Install dependencies
        run: pip install -r requirements/validation.txt
 
      - name: Pull latest data
        run: dvc pull data/processed/
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
 
      - name: Run schema validation
        run: python pipelines/validate_schema.py
 
      - name: Run statistical tests
        run: python pipelines/validate_statistics.py
 
      - name: Run data quality checks
        run: python pipelines/validate_quality.py
 
      - name: Generate data profile
        run: python pipelines/generate_profile.py --output reports/data_profile.html
 
      - name: Upload validation report
        uses: actions/upload-artifact@v4
        with:
          name: data-validation-report
          path: reports/
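
The workflow above assumes a few validation scripts live under pipelines/. The article does not show them, but as a minimal sketch, a schema validator might look like this (column names and the data path are illustrative assumptions):

"""Sketch of pipelines/validate_schema.py: fail the CI step on schema drift."""
import sys
import pandas as pd

# Hypothetical expected schema; in practice this would live in a config file
EXPECTED_COLUMNS = {"customer_id": "int64", "tenure_months": "int64", "target": "int64"}

def main() -> int:
    df = pd.read_parquet("data/processed/train.parquet")  # illustrative path
    errors = []
    for col, dtype in EXPECTED_COLUMNS.items():
        if col not in df.columns:
            errors.append(f"missing column: {col}")
        elif str(df[col].dtype) != dtype:
            errors.append(f"{col}: expected {dtype}, got {df[col].dtype}")
    if errors:
        print("\n".join(errors))
        return 1  # non-zero exit code fails the GitHub Actions step
    return 0

if __name__ == "__main__":
    sys.exit(main())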

Model Training and Testing Pipeline

name: ML Training Pipeline
on:
  push:
    paths:
      - 'src/models/**'
      - 'src/features/**'
      - 'configs/training/**'
  workflow_dispatch:
    inputs:
      experiment_name:
        description: 'MLflow experiment name'
        required: true
        default: 'production-training'
 
jobs:
  train:
    runs-on: [self-hosted, gpu]
    timeout-minutes: 120
    steps:
      - uses: actions/checkout@v4
 
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
 
      - name: Install dependencies
        run: pip install -r requirements/training.txt
 
      - name: Pull training data
        run: dvc pull data/
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
 
      - name: Train model
        run: |
          python src/train.py \
            --config configs/training/production.yaml \
            --experiment ${{ inputs.experiment_name || 'production-training' }}
        env:
          MLFLOW_TRACKING_URI: ${{ secrets.MLFLOW_TRACKING_URI }}
 
      - name: Run model tests
        run: pytest tests/model/ -v --tb=short
 
      - name: Run bias and fairness tests
        run: python tests/fairness/check_bias.py
 
      - name: Upload model artifact
        uses: actions/upload-artifact@v4
        with:
          name: trained-model
          path: artifacts/model/
 
  quality-gate:
    needs: train
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
 
      - name: Download model artifact
        uses: actions/download-artifact@v4
        with:
          name: trained-model
          path: artifacts/model/
 
      - name: Compare with production model
        run: |
          python pipelines/compare_models.py \
            --new-model artifacts/model/ \
            --production-model models:/churn-predictor/Production \
            --threshold 0.02
        env:
          MLFLOW_TRACKING_URI: ${{ secrets.MLFLOW_TRACKING_URI }}
 
      - name: Register model if improved
        if: success()
        run: |
          python pipelines/register_model.py \
            --model-path artifacts/model/ \
            --name churn-predictor \
            --stage staging
        env:
          MLFLOW_TRACKING_URI: ${{ secrets.MLFLOW_TRACKING_URI }}
 
  deploy:
    needs: quality-gate
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    environment: production
    steps:
      - uses: actions/checkout@v4
 
      - name: Deploy canary (10% traffic)
        run: |
          python pipelines/deploy.py \
            --model-name churn-predictor \
            --stage staging \
            --strategy canary \
            --traffic-split 10
        env:
          K8S_CLUSTER: ${{ secrets.K8S_CLUSTER }}
 
      - name: Wait and monitor canary
        run: python pipelines/monitor_canary.py --duration 30m --model churn-predictor
 
      - name: Promote to production
        run: |
          python pipelines/deploy.py \
            --model-name churn-predictor \
            --strategy promote
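
The quality-gate job leans on pipelines/compare_models.py, which the article does not show. A hedged sketch of what it might do, assuming the same held-out test set used by the model tests and reading --threshold as the minimum improvement required over the production model:

"""Sketch of pipelines/compare_models.py (illustrative, not the article's script)."""
import argparse
import sys
import joblib
import mlflow.sklearn
import pandas as pd
from sklearn.metrics import f1_score

def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--new-model", required=True)
    parser.add_argument("--production-model", required=True)  # MLflow URI, e.g. models:/churn-predictor/Production
    parser.add_argument("--threshold", type=float, default=0.02)
    args = parser.parse_args()

    test = pd.read_parquet("data/test/test.parquet")  # assumed test set path
    X, y = test.drop("target", axis=1), test["target"]

    new_model = joblib.load(f"{args.new_model}/model.joblib")
    prod_model = mlflow.sklearn.load_model(args.production_model)

    new_f1 = f1_score(y, new_model.predict(X), average="weighted")
    prod_f1 = f1_score(y, prod_model.predict(X), average="weighted")
    print(f"new={new_f1:.4f} production={prod_f1:.4f}")

    # Gate: the new model must beat production by at least the threshold
    sys.exit(0 if new_f1 >= prod_f1 + args.threshold else 1)

if __name__ == "__main__":
    main()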

Model Testing Framework

Multi-level Test Suite

"""
ML model test suite; runs in CI/CD after every training run.
"""
import pytest
import numpy as np
import joblib
from pathlib import Path
 
MODEL_PATH = Path("artifacts/model/model.joblib")
TEST_DATA_PATH = Path("data/test/test.parquet")
 
@pytest.fixture(scope="session")
def model():
    return joblib.load(MODEL_PATH)
 
@pytest.fixture(scope="session")
def test_data():
    import pandas as pd
    return pd.read_parquet(TEST_DATA_PATH)
 
 
class TestModelAccuracy:
    """Quality gates: modelul trebuie sa atinga praguri minime de performanta."""
 
    def test_accuracy_above_threshold(self, model, test_data):
        from sklearn.metrics import accuracy_score
        X = test_data.drop("target", axis=1)
        y = test_data["target"]
        y_pred = model.predict(X)
        accuracy = accuracy_score(y, y_pred)
        assert accuracy >= 0.85, f"Accuracy {accuracy:.3f} below 0.85 threshold"
 
    def test_no_regression_vs_baseline(self, model, test_data):
        """Modelul nou nu trebuie sa fie mai slab decat baseline-ul documentat."""
        from sklearn.metrics import f1_score
        X = test_data.drop("target", axis=1)
        y = test_data["target"]
        y_pred = model.predict(X)
        f1 = f1_score(y, y_pred, average="weighted")
        BASELINE_F1 = 0.82  # Documented baseline from last stable release
        assert f1 >= BASELINE_F1 - 0.02, f"F1 {f1:.3f} regressed vs baseline {BASELINE_F1}"
 
 
class TestModelRobustness:
    """Verifica daca modelul gestioneaza corect cazurile limita."""
 
    def test_handles_missing_values(self, model):
        """Modelul nu ar trebui sa se blocheze pe input-uri NaN."""
        sample = np.full((1, model.n_features_in_), np.nan)
        try:
            model.predict(sample)
        except ValueError:
            pass  # Expected for models that don't handle NaN
        # Should not raise unexpected exceptions
 
    def test_prediction_determinism(self, model, test_data):
        """Acelasi input trebuie sa produca acelasi output."""
        X = test_data.drop("target", axis=1).head(10)
        pred1 = model.predict(X)
        pred2 = model.predict(X)
        np.testing.assert_array_equal(pred1, pred2)
 
    def test_prediction_latency(self, model, test_data):
        """O singura predictie trebuie sa fie suficient de rapida pentru SLA-ul de serving."""
        import time
        X_single = test_data.drop("target", axis=1).head(1)
        times = []
        for _ in range(100):
            start = time.perf_counter()
            model.predict(X_single)
            times.append((time.perf_counter() - start) * 1000)
        p99 = np.percentile(times, 99)
        assert p99 < 50, f"P99 latency {p99:.1f}ms exceeds 50ms SLA"
 
 
class TestModelFairness:
    """Verifica bias-ul intre grupuri protejate."""
 
    def test_equal_opportunity(self, model, test_data):
        """The true positive rate should be similar across groups."""
        from sklearn.metrics import recall_score

        # Skip before predicting so the test exits fast when no group column exists
        if "demographic_group" not in test_data.columns:
            pytest.skip("No demographic column available")

        X = test_data.drop("target", axis=1)
        y = test_data["target"]
        y_pred = model.predict(X)
 
        groups = test_data["demographic_group"].unique()
        tpr_by_group = {}
        for group in groups:
            mask = test_data["demographic_group"] == group
            if mask.sum() < 50:
                continue
            tpr = recall_score(y[mask], y_pred[mask], zero_division=0)
            tpr_by_group[group] = tpr
 
        if len(tpr_by_group) < 2:
            pytest.skip("Not enough groups for comparison")
 
        max_tpr = max(tpr_by_group.values())
        min_tpr = min(tpr_by_group.values())
        disparity = max_tpr - min_tpr
        assert disparity < 0.15, f"TPR disparity {disparity:.3f} exceeds 0.15 threshold: {tpr_by_group}"

Data Version Control in CI/CD

# dvc.yaml: defines reproducible pipeline stages
stages:
  preprocess:
    cmd: python src/preprocess.py --config configs/preprocess.yaml
    deps:
      - src/preprocess.py
      - data/raw/
      - configs/preprocess.yaml
    outs:
      - data/processed/
 
  train:
    cmd: python src/train.py --config configs/training/production.yaml
    deps:
      - src/train.py
      - data/processed/
      - configs/training/production.yaml
    outs:
      - artifacts/model/
    metrics:
      - metrics/training.json:
          cache: false
    params:
      - configs/training/production.yaml:
          - model.n_estimators
          - model.max_depth
          - model.learning_rate
# In CI: reproduce the pipeline and check what changed
dvc repro
dvc metrics diff  # Compare metrics against the previous run
dvc plots diff    # Generate a visual comparison
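
For `dvc metrics diff` to have something to compare, the train stage has to write the metrics file declared in dvc.yaml. A minimal sketch of how src/train.py might emit it (metric names and values are illustrative):

import json
from pathlib import Path

# Write the metrics file that dvc.yaml tracks under `metrics:`
metrics = {"f1_weighted": 0.84, "accuracy": 0.88}  # illustrative evaluation results
Path("metrics").mkdir(exist_ok=True)
Path("metrics/training.json").write_text(json.dumps(metrics, indent=2))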

Deployment Strategies for ML Models

Blue-Green Deployment

def blue_green_deploy(model_name: str, new_version: str):
    """Switch traffic atomically between model versions."""
    # Deploy the new version to the "green" endpoint
    deploy_to_endpoint(model_name, new_version, endpoint="green")

    # Run smoke tests against green
    if smoke_test(endpoint="green"):
        # Switch traffic from blue to green
        switch_traffic(model_name, from_endpoint="blue", to_endpoint="green")
        # Keep blue around as a rollback target
    else:
        # Tear down the failed green deployment
        teardown_endpoint("green")
        raise DeploymentError(f"Smoke tests failed for {model_name}:{new_version}")
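
The helpers above are deliberately abstract. As one hedged example, on Kubernetes switch_traffic could repoint a Service selector between version-labeled Deployments; the labels and Service name here are assumptions, not the article's implementation:

import json
import subprocess

def switch_traffic(model_name: str, from_endpoint: str, to_endpoint: str) -> None:
    # Repointing the Service selector flips all traffic in one atomic update,
    # assuming blue/green Deployments labeled version=blue / version=green
    # behind a single Service named after the model (illustrative convention)
    patch = {"spec": {"selector": {"app": model_name, "version": to_endpoint}}}
    subprocess.run(
        ["kubectl", "patch", "service", model_name, "-p", json.dumps(patch)],
        check=True,
    )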

Progressive Rollout

ROLLOUT_STAGES = [
    {"traffic_pct": 5, "duration_minutes": 15, "error_threshold": 0.02},
    {"traffic_pct": 25, "duration_minutes": 30, "error_threshold": 0.015},
    {"traffic_pct": 50, "duration_minutes": 60, "error_threshold": 0.01},
    {"traffic_pct": 100, "duration_minutes": 0, "error_threshold": 0.01},
]
 
async def progressive_rollout(model_name: str, new_version: str):
    for stage in ROLLOUT_STAGES:
        set_traffic_split(model_name, new_version, stage["traffic_pct"])
 
        if stage["duration_minutes"] > 0:
            metrics = await monitor_for(minutes=stage["duration_minutes"])
            if metrics["error_rate"] > stage["error_threshold"]:
                rollback(model_name)
                raise RolloutError(f"Error rate {metrics['error_rate']:.3f} exceeded threshold at {stage['traffic_pct']}%")
 
    mark_as_production(model_name, new_version)
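
Wired into the deploy step, the rollout might be kicked off like this (the version string is illustrative):

import asyncio

# Run the staged rollout; a threshold breach at any stage raises after rollback
asyncio.run(progressive_rollout("churn-predictor", new_version="v42"))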

Conclusions

  1. Test data as rigorously as code: schema validation, statistical checks, and freshness monitoring
  2. Quality gates before deployment: models must beat baselines, not just pass unit tests
  3. Progressive deployment: never jump from 0% to 100% traffic instantly
  4. Version everything: code, data, model, and configuration must all be reproducible
  5. Automate retraining: scheduled or trigger-based, with human approval for promotion to production

Building ML CI/CD? DeviDevs implements end-to-end MLOps pipelines with automated testing, progressive deployment, and monitoring. Get a free assessment ->

