
Model Governance: ML Lifecycle Management

Petru Constantin
7 min read
#model governance#MLOps#ML lifecycle#model cards#audit trail#compliance

Model Governance: Managing ML Models from Development to Retirement

Model governance is the set of policies, processes, and tools that ensure ML models are developed, deployed, and maintained responsibly. As organizations scale from a handful of models to hundreds, governance prevents the chaos of untracked models making critical decisions without oversight.

Why Model Governance Matters

Without governance, organizations face:

  • Shadow models: models deployed without approval or documentation
  • Regulatory risk: non-compliance with the EU AI Act, GDPR, or sector-specific regulations
  • Operational risk: nobody knows which model version serves which endpoint
  • Reproducibility failures: a model's behavior cannot be recreated for audit or debugging
  • Stale models: models that keep running in production long after they have outlived their usefulness

The Model Governance Framework

┌───────────────────────────────────────────────────────────────┐
│                  Model Governance Framework                   │
├───────────────────────────────────────────────────────────────┤
│                                                               │
│  Development        Review             Production             │
│  ┌────────────┐     ┌────────────┐     ┌──────────────┐       │
│  │ Experiment │────▶│ Model      │────▶│ Deployment   │       │
│  │ Tracking   │     │ Review     │     │ + Monitoring │       │
│  └────────────┘     └────────────┘     └──────┬───────┘       │
│        │                  │                   │               │
│        ▼                  ▼                   ▼               │
│  ┌────────────┐     ┌────────────┐     ┌──────────────┐       │
│  │ Model      │     │ Approval   │     │ Retirement   │       │
│  │ Card       │     │ Workflow   │     │ Planning     │       │
│  └────────────┘     └────────────┘     └──────────────┘       │
│                                                               │
│  Cross-cutting: Audit Trail • Risk Assessment • Documentation │
└───────────────────────────────────────────────────────────────┘

Model Cards

A model card is a standardized document that accompanies each model and provides transparency about its purpose, performance, limitations, and ethical considerations.

from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
 
class RiskLevel(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"
 
@dataclass
class ModelCard:
    """Standardized model documentation following Google's Model Cards framework."""
    # Identity
    model_name: str
    model_version: str
    model_id: str
    owner: str
    team: str
 
    # Purpose
    intended_use: str
    out_of_scope_uses: list[str]
    primary_users: list[str]
 
    # Training
    training_data_description: str
    training_data_version: str
    training_date: datetime
    training_pipeline: str
    hyperparameters: dict
 
    # Performance
    evaluation_metrics: dict[str, float]
    performance_by_group: dict[str, dict[str, float]] = field(default_factory=dict)
    known_limitations: list[str] = field(default_factory=list)
 
    # Risk & Ethics
    risk_level: RiskLevel = RiskLevel.MEDIUM
    ethical_considerations: list[str] = field(default_factory=list)
    fairness_assessment: dict = field(default_factory=dict)
 
    # Governance
    approved_by: str | None = None
    approval_date: datetime | None = None
    review_schedule: str = "quarterly"
    retirement_criteria: list[str] = field(default_factory=list)
 
    def to_markdown(self) -> str:
        """Generate human-readable model card."""
        sections = [
            f"# Model Card: {self.model_name} v{self.model_version}",
            f"\n## Overview",
            f"- **Owner**: {self.owner} ({self.team})",
            f"- **Risk Level**: {self.risk_level.value.upper()}",
            f"- **Approved By**: {self.approved_by or 'PENDING'}",
            f"- **Review Schedule**: {self.review_schedule}",
            f"\n## Intended Use",
            self.intended_use,
            f"\n### Out of Scope",
            *[f"- {use}" for use in self.out_of_scope_uses],
            f"\n## Performance Metrics",
            *[f"- **{k}**: {v:.4f}" for k, v in self.evaluation_metrics.items()],
            f"\n## Known Limitations",
            *[f"- {lim}" for lim in self.known_limitations],
            f"\n## Training Details",
            f"- **Data Version**: {self.training_data_version}",
            f"- **Trained**: {self.training_date.strftime('%Y-%m-%d')}",
            f"- **Pipeline**: {self.training_pipeline}",
        ]
        return "\n".join(sections)
 
# Example usage
card = ModelCard(
    model_name="customer-churn-predictor",
    model_version="2.4.1",
    model_id="churn-prod-v241",
    owner="ML Team",
    team="ml-platform",
    intended_use="Predict customer churn probability for proactive retention campaigns",
    out_of_scope_uses=[
        "Credit scoring or lending decisions",
        "Individual customer discrimination",
        "Automated account termination without human review",
    ],
    primary_users=["Customer Success Team", "Marketing Automation"],
    training_data_description="12 months of customer behavior data (purchases, support tickets, page views)",
    training_data_version="dvc:data/v2.4",
    training_date=datetime(2026, 2, 1),
    training_pipeline="kubeflow:nightly-churn-v2",
    hyperparameters={"n_estimators": 300, "max_depth": 12, "learning_rate": 0.05},
    evaluation_metrics={"accuracy": 0.931, "f1_weighted": 0.905, "auc_roc": 0.972},
    risk_level=RiskLevel.MEDIUM,
    known_limitations=[
        "Lower accuracy for customers with < 30 days history",
        "Not validated for B2B enterprise customers",
        "Performance may degrade during holiday seasons",
    ],
    retirement_criteria=[
        "Accuracy drops below 0.88 for 7 consecutive days",
        "Training data becomes older than 90 days",
        "Replacement model approved and deployed",
    ],
)
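
Before a card like this goes to review, it can help to check it mechanically for gaps. The sketch below is not part of the `ModelCard` class above: the helper name `find_card_gaps`, the `REQUIRED_FIELDS` tuple, and the specific rules are illustrative assumptions, and the function works on a plain dict so the example stays self-contained.

```python
from typing import Any

# Hypothetical rule set: fields a reviewer would expect to be non-empty.
REQUIRED_FIELDS = ("model_name", "model_version", "owner", "intended_use",
                   "training_data_version", "evaluation_metrics")


def find_card_gaps(card: dict[str, Any]) -> list[str]:
    """Return human-readable problems found in a model card dict."""
    gaps = [f"missing or empty field: {name}"
            for name in REQUIRED_FIELDS if not card.get(name)]

    # Assumption: all reported metrics (accuracy, AUC, F1) lie in [0, 1].
    for metric, value in card.get("evaluation_metrics", {}).items():
        if not 0.0 <= value <= 1.0:
            gaps.append(f"metric out of range: {metric}={value}")

    # Assumed policy: higher-risk models must list known limitations.
    if card.get("risk_level") in ("high", "critical") and not card.get("known_limitations"):
        gaps.append("high-risk model has no documented limitations")

    return gaps


draft = {
    "model_name": "customer-churn-predictor",
    "model_version": "2.4.1",
    "owner": "ML Team",
    "intended_use": "",
    "training_data_version": "dvc:data/v2.4",
    "evaluation_metrics": {"accuracy": 0.931, "auc_roc": 1.2},
    "risk_level": "high",
}
for problem in find_card_gaps(draft):
    print(problem)
```

With the dataclass above, `dataclasses.asdict(card)` would produce a suitable dict, although `risk_level` would then be a `RiskLevel` member rather than a string, so that comparison would need adjusting.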

Approval Workflows

from datetime import datetime, timezone
from enum import Enum

# RiskLevel is the enum defined in the Model Cards section above.


class ApprovalStatus(Enum):
    PENDING = "pending"
    IN_REVIEW = "in_review"
    APPROVED = "approved"
    REJECTED = "rejected"
    REVOKED = "revoked"


class ModelApprovalWorkflow:
    """Multi-stage approval workflow for model promotion."""

    # Roles that must approve before a model can enter each stage.
    REQUIRED_APPROVALS = {
        "staging": ["ml_engineer"],
        "production": ["ml_lead", "risk_officer"],
        "high_risk_production": ["ml_lead", "risk_officer", "legal"],
    }

    def __init__(self, model_name: str, model_version: str, risk_level: RiskLevel):
        self.model_name = model_name
        self.model_version = model_version
        self.risk_level = risk_level
        self.approvals: list[dict] = []
        self.status = ApprovalStatus.PENDING

    def get_required_roles(self, target_stage: str) -> list[str]:
        # High-risk and critical models need legal sign-off for production.
        if self.risk_level in (RiskLevel.HIGH, RiskLevel.CRITICAL) and target_stage == "production":
            return self.REQUIRED_APPROVALS["high_risk_production"]
        return self.REQUIRED_APPROVALS.get(target_stage, ["ml_engineer"])

    def submit_approval(self, approver: str, role: str, decision: str, notes: str = ""):
        self.approvals.append({
            "approver": approver,
            "role": role,
            "decision": decision,
            "notes": notes,
            # Timezone-aware UTC timestamp (datetime.utcnow() is deprecated since Python 3.12).
            "timestamp": datetime.now(timezone.utc).isoformat(),
        })

    def check_approval_status(self, target_stage: str) -> tuple[ApprovalStatus, str]:
        required = self.get_required_roles(target_stage)
        approved_roles = {a["role"] for a in self.approvals if a["decision"] == "approved"}
        rejected = any(a["decision"] == "rejected" for a in self.approvals)
        missing = set(required) - approved_roles

        # A single rejection blocks promotion, regardless of other approvals.
        if rejected:
            self.status, message = ApprovalStatus.REJECTED, "Model was rejected by a reviewer"
        elif not missing:
            self.status, message = ApprovalStatus.APPROVED, "All required approvals received"
        else:
            # Sort so the message is deterministic (set order is not).
            self.status = ApprovalStatus.IN_REVIEW
            message = f"Awaiting approval from: {', '.join(sorted(missing))}"
        return self.status, message
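
One gap the workflow above leaves open is that a single person could submit approvals under several roles. The sketch below shows one possible way to enforce separation of duties over the same approval-record shape. The helper `distinct_approver_roles` and its "first approval per person counts" policy are assumptions for illustration, not part of the class.

```python
def distinct_approver_roles(approvals: list[dict], required: list[str]) -> set[str]:
    """Return the required roles satisfied by approvals, counting each person once.

    Assumed policy: a person's first "approved" record for a required role
    counts; later approvals by the same person are ignored.
    """
    satisfied: set[str] = set()
    seen_approvers: set[str] = set()
    for record in approvals:
        if record["decision"] != "approved" or record["role"] not in required:
            continue
        if record["approver"] in seen_approvers:
            continue  # the same person cannot cover a second role
        seen_approvers.add(record["approver"])
        satisfied.add(record["role"])
    return satisfied


approvals = [
    {"approver": "ana", "role": "ml_lead", "decision": "approved"},
    {"approver": "ana", "role": "risk_officer", "decision": "approved"},
    {"approver": "dan", "role": "legal", "decision": "approved"},
]
required = ["ml_lead", "risk_officer", "legal"]
missing = sorted(set(required) - distinct_approver_roles(approvals, required))
print(missing)  # ['risk_officer']
```

Here the second approval from `ana` is ignored, so `risk_officer` still counts as missing even though a record for that role exists.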

Audit Trail

Every action taken on a model should be logged, both for regulatory compliance and for debugging:

from datetime import datetime, timezone


class ModelAuditLog:
    """Audit trail for all model lifecycle events.

    Immutability depends on the storage backend being append-only.
    """

    def __init__(self, storage_backend):
        # The backend must provide append(event) and query(model_name=...).
        self.storage = storage_backend

    def log_event(self, event_type: str, model_name: str, model_version: str,
                  actor: str, details: dict):
        event = {
            "event_id": self._generate_id(),
            # Timezone-aware UTC timestamp (datetime.utcnow() is deprecated since Python 3.12).
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "event_type": event_type,
            "model_name": model_name,
            "model_version": model_version,
            "actor": actor,
            "details": details,
        }
        self.storage.append(event)
        return event

 
    def log_training(self, model_name: str, version: str, actor: str, **kwargs):
        return self.log_event("model.trained", model_name, version, actor, {
            "data_version": kwargs.get("data_version"),
            "metrics": kwargs.get("metrics"),
            "pipeline": kwargs.get("pipeline"),
        })
 
    def log_approval(self, model_name: str, version: str, approver: str, decision: str, notes: str = ""):
        return self.log_event("model.approval", model_name, version, approver, {
            "decision": decision,
            "notes": notes,
        })
 
    def log_deployment(self, model_name: str, version: str, actor: str, environment: str):
        return self.log_event("model.deployed", model_name, version, actor, {
            "environment": environment,
        })
 
    def log_retirement(self, model_name: str, version: str, actor: str, reason: str):
        return self.log_event("model.retired", model_name, version, actor, {
            "reason": reason,
        })
 
    def get_model_history(self, model_name: str) -> list[dict]:
        return self.storage.query(model_name=model_name)
 
    def _generate_id(self) -> str:
        import uuid
        return str(uuid.uuid4())
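
The class above leaves `storage_backend` unspecified; it only needs `append(event)` and `query(model_name=...)`. As a minimal in-memory sketch of what "immutable" could mean in practice, the store below chains each record to the previous one with a SHA-256 hash so that tampering becomes detectable. The class name `HashChainedStore` and its `verify` method are assumptions for illustration; a production system would more likely rely on write-once storage or a database with append-only permissions.

```python
import hashlib
import json


class HashChainedStore:
    """In-memory, append-only event store with a tamper-evident hash chain."""

    def __init__(self):
        self._records: list[dict] = []

    @staticmethod
    def _digest(event: dict, prev_hash: str) -> str:
        # sort_keys gives a stable serialization so the hash is reproducible.
        payload = json.dumps(event, sort_keys=True, default=str) + prev_hash
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def append(self, event: dict) -> None:
        prev_hash = self._records[-1]["hash"] if self._records else ""
        # Store a copy so later mutation of the caller's dict cannot alter the log.
        stored = json.loads(json.dumps(event, default=str))
        self._records.append({"event": stored, "prev_hash": prev_hash,
                              "hash": self._digest(stored, prev_hash)})

    def query(self, model_name: str) -> list[dict]:
        return [r["event"] for r in self._records
                if r["event"].get("model_name") == model_name]

    def verify(self) -> bool:
        """Recompute every hash; False means a record was altered or reordered."""
        prev_hash = ""
        for record in self._records:
            if record["prev_hash"] != prev_hash:
                return False
            if record["hash"] != self._digest(record["event"], prev_hash):
                return False
            prev_hash = record["hash"]
        return True


store = HashChainedStore()
store.append({"event_type": "model.trained", "model_name": "churn", "model_version": "2.4.1"})
store.append({"event_type": "model.deployed", "model_name": "churn", "model_version": "2.4.1"})
print(store.verify())  # True
store._records[0]["event"]["model_version"] = "9.9.9"  # simulate tampering
print(store.verify())  # False
```

An instance of this store could be passed as `storage_backend` to `ModelAuditLog`, since it exposes the two methods that class calls.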

Model Lifecycle Management

from datetime import datetime


class ModelLifecycleManager:
    """Manage models through their complete lifecycle."""
 
    LIFECYCLE_STAGES = ["development", "staging", "production", "deprecated", "retired"]
 
    def __init__(self, registry, monitor, audit_log):
        self.registry = registry
        self.monitor = monitor
        self.audit = audit_log
 
    def check_retirement_criteria(self, model_name: str) -> list[str]:
        """Check if a production model should be retired."""
        triggers = []
        model = self.registry.get_production_model(model_name)
 
        # Check age
        days_since_training = (datetime.utcnow() - model.training_date).days
        if days_since_training > 90:
            triggers.append(f"Model is {days_since_training} days old (limit: 90)")
 
        # Check performance
        current_metrics = self.monitor.get_current_metrics(model_name)
        if current_metrics.get("accuracy", 1.0) < model.retirement_threshold:
            triggers.append(
                f"Accuracy {current_metrics['accuracy']:.3f} below threshold {model.retirement_threshold}"
            )
 
        # Check drift
        drift_score = self.monitor.get_drift_score(model_name)
        if drift_score > 0.3:
            triggers.append(f"High data drift detected (score: {drift_score:.3f})")
 
        return triggers
 
    def retire_model(self, model_name: str, version: str, reason: str, actor: str):
        """Gracefully retire a model version."""
        # Move to deprecated first (allows rollback)
        self.registry.transition_stage(model_name, version, "deprecated")
        self.audit.log_event("model.deprecated", model_name, version, actor, {"reason": reason})
 
        # After grace period, fully retire
        # This is typically scheduled, not immediate
        return {"status": "deprecated", "reason": reason, "full_retirement": "scheduled_30d"}
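
The `LIFECYCLE_STAGES` list implies an order, but it does not say which moves between stages are legal. One possible transition table is sketched below. The allowed moves are an assumption of this sketch, chosen to match the comment in `retire_model` that deprecation still allows rollback; `ALLOWED_TRANSITIONS` and `validate_transition` are hypothetical names.

```python
# Assumed transition rules between the lifecycle stages.
ALLOWED_TRANSITIONS: dict[str, set[str]] = {
    "development": {"staging"},
    "staging": {"production", "development"},
    "production": {"deprecated"},
    "deprecated": {"production", "retired"},  # rollback or final retirement
    "retired": set(),                          # terminal stage
}


def validate_transition(current: str, target: str) -> None:
    """Raise ValueError if moving from `current` to `target` is not allowed."""
    if current not in ALLOWED_TRANSITIONS:
        raise ValueError(f"Unknown stage: {current!r}")
    if target not in ALLOWED_TRANSITIONS[current]:
        allowed = sorted(ALLOWED_TRANSITIONS[current]) or ["<none>"]
        raise ValueError(
            f"Cannot move from {current!r} to {target!r}; allowed: {', '.join(allowed)}"
        )


validate_transition("production", "deprecated")  # passes silently
try:
    validate_transition("production", "retired")  # skips the grace period
except ValueError as exc:
    print(exc)
```

A registry could call a check like this inside `transition_stage` so that a production model cannot be retired without first passing through the deprecated stage.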

Alignment with the EU AI Act

Model governance directly supports compliance with the EU AI Act:

| EU AI Act Requirement | Governance Implementation |
|----------------------|--------------------------|
| Risk classification (Art. 6) | Risk levels in model cards |
| Technical documentation (Art. 11) | Model cards + audit trail |
| Record-keeping (Art. 12) | Immutable audit log |
| Transparency (Art. 13) | Model cards with intended use |
| Human oversight (Art. 14) | Approval workflows |
| Accuracy and robustness (Art. 15) | Performance monitoring + retirement criteria |
