MLOps

Model Governance: Managing ML Models from Development to Retirement

DeviDevs Team
7 min read
#model governance · #MLOps · #ML lifecycle · #model cards · #audit trail · #compliance

Model governance is the set of policies, processes, and tools that ensure ML models are developed, deployed, and maintained responsibly. As organizations scale from a handful of models to hundreds, governance prevents the chaos of untracked models making critical decisions without oversight.

Why Model Governance Matters

Without governance, organizations face:

  • Shadow models — Models deployed without approval or documentation
  • Regulatory risk — Non-compliance with EU AI Act, GDPR, or industry regulations
  • Operational risk — No one knows which model version is serving which endpoint
  • Reproducibility failure — Can't recreate a model's behavior for audit or debugging
  • Stale models — Models running in production long past their useful life

The Model Governance Framework

┌───────────────────────────────────────────────────────────────┐
│                  Model Governance Framework                   │
├───────────────────────────────────────────────────────────────┤
│                                                               │
│  Development          Review              Production          │
│  ┌─────────────┐     ┌─────────────┐     ┌─────────────┐      │
│  │ Experiment  │────▶│ Model       │────▶│ Deployment  │      │
│  │ Tracking    │     │ Review      │     │ + Monitoring│      │
│  └──────┬──────┘     └──────┬──────┘     └──────┬──────┘      │
│         ▼                   ▼                   ▼             │
│  ┌─────────────┐     ┌─────────────┐     ┌─────────────┐      │
│  │ Model       │     │ Approval    │     │ Retirement  │      │
│  │ Card        │     │ Workflow    │     │ Planning    │      │
│  └─────────────┘     └─────────────┘     └─────────────┘      │
│                                                               │
│  Cross-cutting: Audit Trail • Risk Assessment • Documentation │
└───────────────────────────────────────────────────────────────┘

Model Cards

A model card is a standardized document that accompanies every model, providing transparency about its purpose, performance, limitations, and ethical considerations.

from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
 
class RiskLevel(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"
 
@dataclass
class ModelCard:
    """Standardized model documentation following Google's Model Cards framework."""
    # Identity
    model_name: str
    model_version: str
    model_id: str
    owner: str
    team: str
 
    # Purpose
    intended_use: str
    out_of_scope_uses: list[str]
    primary_users: list[str]
 
    # Training
    training_data_description: str
    training_data_version: str
    training_date: datetime
    training_pipeline: str
    hyperparameters: dict
 
    # Performance
    evaluation_metrics: dict[str, float]
    performance_by_group: dict[str, dict[str, float]] = field(default_factory=dict)
    known_limitations: list[str] = field(default_factory=list)
 
    # Risk & Ethics
    risk_level: RiskLevel = RiskLevel.MEDIUM
    ethical_considerations: list[str] = field(default_factory=list)
    fairness_assessment: dict = field(default_factory=dict)
 
    # Governance
    approved_by: str | None = None
    approval_date: datetime | None = None
    review_schedule: str = "quarterly"
    retirement_criteria: list[str] = field(default_factory=list)
 
    def to_markdown(self) -> str:
        """Generate human-readable model card."""
        sections = [
            f"# Model Card: {self.model_name} v{self.model_version}",
            f"\n## Overview",
            f"- **Owner**: {self.owner} ({self.team})",
            f"- **Risk Level**: {self.risk_level.value.upper()}",
            f"- **Approved By**: {self.approved_by or 'PENDING'}",
            f"- **Review Schedule**: {self.review_schedule}",
            f"\n## Intended Use",
            self.intended_use,
            f"\n### Out of Scope",
            *[f"- {use}" for use in self.out_of_scope_uses],
            f"\n## Performance Metrics",
            *[f"- **{k}**: {v:.4f}" for k, v in self.evaluation_metrics.items()],
            f"\n## Known Limitations",
            *[f"- {lim}" for lim in self.known_limitations],
            f"\n## Training Details",
            f"- **Data Version**: {self.training_data_version}",
            f"- **Trained**: {self.training_date.strftime('%Y-%m-%d')}",
            f"- **Pipeline**: {self.training_pipeline}",
        ]
        return "\n".join(sections)
 
# Example usage
card = ModelCard(
    model_name="customer-churn-predictor",
    model_version="2.4.1",
    model_id="churn-prod-v241",
    owner="ML Team",
    team="ml-platform",
    intended_use="Predict customer churn probability for proactive retention campaigns",
    out_of_scope_uses=[
        "Credit scoring or lending decisions",
        "Individual customer discrimination",
        "Automated account termination without human review",
    ],
    primary_users=["Customer Success Team", "Marketing Automation"],
    training_data_description="12 months of customer behavior data (purchases, support tickets, page views)",
    training_data_version="dvc:data/v2.4",
    training_date=datetime(2026, 2, 1),
    training_pipeline="kubeflow:nightly-churn-v2",
    hyperparameters={"n_estimators": 300, "max_depth": 12, "learning_rate": 0.05},
    evaluation_metrics={"accuracy": 0.931, "f1_weighted": 0.905, "auc_roc": 0.972},
    risk_level=RiskLevel.MEDIUM,
    known_limitations=[
        "Lower accuracy for customers with < 30 days history",
        "Not validated for B2B enterprise customers",
        "Performance may degrade during holiday seasons",
    ],
    retirement_criteria=[
        "Accuracy drops below 0.88 for 7 consecutive days",
        "Training data becomes older than 90 days",
        "Replacement model approved and deployed",
    ],
)
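A card is only useful if it travels with the model. One option is to serialize the dataclass to JSON and store it next to the model artifact in the registry. A minimal sketch (the `card_to_json` helper is illustrative, not part of any framework; it assumes a dataclass like the `ModelCard` above):

```python
import json
from dataclasses import asdict, is_dataclass
from datetime import datetime
from enum import Enum

def card_to_json(card) -> str:
    """Serialize a model-card dataclass to JSON, converting datetimes and enums."""
    if not is_dataclass(card):
        raise TypeError("expected a dataclass instance")

    def default(obj):
        # json.dumps calls this for values it cannot serialize natively
        if isinstance(obj, datetime):
            return obj.isoformat()
        if isinstance(obj, Enum):
            return obj.value
        raise TypeError(f"cannot serialize {type(obj).__name__}")

    return json.dumps(asdict(card), default=default, indent=2)
```

Storing the card as data (rather than only rendered markdown) keeps it queryable, e.g. "list all HIGH-risk models overdue for review."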

Approval Workflows

from enum import Enum
from datetime import datetime
 
class ApprovalStatus(Enum):
    PENDING = "pending"
    IN_REVIEW = "in_review"
    APPROVED = "approved"
    REJECTED = "rejected"
    REVOKED = "revoked"
 
class ModelApprovalWorkflow:
    """Multi-stage approval workflow for model promotion."""
 
    REQUIRED_APPROVALS = {
        "staging": ["ml_engineer"],
        "production": ["ml_lead", "risk_officer"],
        "high_risk_production": ["ml_lead", "risk_officer", "legal"],
    }
 
    def __init__(self, model_name: str, model_version: str, risk_level: RiskLevel):
        self.model_name = model_name
        self.model_version = model_version
        self.risk_level = risk_level
        self.approvals: list[dict] = []
        self.status = ApprovalStatus.PENDING
 
    def get_required_roles(self, target_stage: str) -> list[str]:
        if self.risk_level in (RiskLevel.HIGH, RiskLevel.CRITICAL) and target_stage == "production":
            return self.REQUIRED_APPROVALS["high_risk_production"]
        return self.REQUIRED_APPROVALS.get(target_stage, ["ml_engineer"])
 
    def submit_approval(self, approver: str, role: str, decision: str, notes: str = ""):
        self.approvals.append({
            "approver": approver,
            "role": role,
            "decision": decision,
            "notes": notes,
            "timestamp": datetime.utcnow().isoformat(),
        })
 
    def check_approval_status(self, target_stage: str) -> tuple[ApprovalStatus, str]:
        required = self.get_required_roles(target_stage)
        approved_roles = {a["role"] for a in self.approvals if a["decision"] == "approved"}
        rejected = any(a["decision"] == "rejected" for a in self.approvals)
 
        if rejected:
            return ApprovalStatus.REJECTED, "Model was rejected by a reviewer"
 
        missing = set(required) - approved_roles
        if not missing:
            return ApprovalStatus.APPROVED, "All required approvals received"
 
        return ApprovalStatus.IN_REVIEW, f"Awaiting approval from: {', '.join(missing)}"
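In practice the workflow is enforced as a gate in the deployment pipeline: promotion is blocked until every required role has signed off. A minimal sketch of such a gate (the `can_promote` function and the role names are illustrative):

```python
def can_promote(approvals: list[dict], required_roles: list[str]) -> tuple[bool, str]:
    """Return (allowed, reason). Any rejection blocks promotion; otherwise
    every required role must have an 'approved' decision on record."""
    if any(a["decision"] == "rejected" for a in approvals):
        return False, "blocked: a reviewer rejected this version"
    approved = {a["role"] for a in approvals if a["decision"] == "approved"}
    missing = set(required_roles) - approved
    if missing:
        return False, f"blocked: awaiting {', '.join(sorted(missing))}"
    return True, "all required approvals present"

# Example: a high-risk model heading to production still needs legal sign-off
approvals = [
    {"role": "ml_lead", "decision": "approved"},
    {"role": "risk_officer", "decision": "approved"},
]
ok, reason = can_promote(approvals, ["ml_lead", "risk_officer", "legal"])
# ok is False until the "legal" role approves
```

Wiring this into CI means a model cannot reach production by accident: the pipeline fails fast with a reason that names the missing reviewers.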

Audit Trail

Every model action must be logged for regulatory compliance and debugging:

from datetime import datetime
 
class ModelAuditLog:
    """Immutable audit trail for all model lifecycle events."""
 
    def __init__(self, storage_backend):
        self.storage = storage_backend
 
    def log_event(self, event_type: str, model_name: str, model_version: str,
                  actor: str, details: dict):
        event = {
            "event_id": self._generate_id(),
            "timestamp": datetime.utcnow().isoformat(),
            "event_type": event_type,
            "model_name": model_name,
            "model_version": model_version,
            "actor": actor,
            "details": details,
        }
        self.storage.append(event)
        return event
 
    def log_training(self, model_name: str, version: str, actor: str, **kwargs):
        return self.log_event("model.trained", model_name, version, actor, {
            "data_version": kwargs.get("data_version"),
            "metrics": kwargs.get("metrics"),
            "pipeline": kwargs.get("pipeline"),
        })
 
    def log_approval(self, model_name: str, version: str, approver: str, decision: str, notes: str = ""):
        return self.log_event("model.approval", model_name, version, approver, {
            "decision": decision,
            "notes": notes,
        })
 
    def log_deployment(self, model_name: str, version: str, actor: str, environment: str):
        return self.log_event("model.deployed", model_name, version, actor, {
            "environment": environment,
        })
 
    def log_retirement(self, model_name: str, version: str, actor: str, reason: str):
        return self.log_event("model.retired", model_name, version, actor, {
            "reason": reason,
        })
 
    def get_model_history(self, model_name: str) -> list[dict]:
        return self.storage.query(model_name=model_name)
 
    def _generate_id(self) -> str:
        import uuid
        return str(uuid.uuid4())
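The `storage_backend` above is left abstract. One common way to get the "immutable" property (sketched here; not tied to any particular product) is a hash chain: each record commits to the hash of its predecessor, so any retroactive edit breaks verification. Note this sketch only implements `append` and `verify`; a real backend would also need the `query` method used by `get_model_history`:

```python
import hashlib
import json

class HashChainedLog:
    """Append-only event list where each entry commits to the previous one."""

    def __init__(self):
        self._events: list[dict] = []
        self._last_hash = "0" * 64  # genesis value for the first record

    def append(self, event: dict) -> dict:
        record = {**event, "prev_hash": self._last_hash}
        payload = json.dumps(record, sort_keys=True).encode()
        record["hash"] = hashlib.sha256(payload).hexdigest()
        self._last_hash = record["hash"]
        self._events.append(record)
        return record

    def verify(self) -> bool:
        """Recompute every hash; returns False if any record was altered."""
        prev = "0" * 64
        for record in self._events:
            if record["prev_hash"] != prev:
                return False
            body = {k: v for k, v in record.items() if k != "hash"}
            payload = json.dumps(body, sort_keys=True).encode()
            if hashlib.sha256(payload).hexdigest() != record["hash"]:
                return False
            prev = record["hash"]
        return True
```

Running `verify()` periodically (or on every audit export) turns silent tampering into a detectable event.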

Model Lifecycle Management

from datetime import datetime

class ModelLifecycleManager:
    """Manage models through their complete lifecycle."""
 
    LIFECYCLE_STAGES = ["development", "staging", "production", "deprecated", "retired"]
 
    def __init__(self, registry, monitor, audit_log):
        self.registry = registry
        self.monitor = monitor
        self.audit = audit_log
 
    def check_retirement_criteria(self, model_name: str) -> list[str]:
        """Check if a production model should be retired."""
        triggers = []
        model = self.registry.get_production_model(model_name)
 
        # Check age
        days_since_training = (datetime.utcnow() - model.training_date).days
        if days_since_training > 90:
            triggers.append(f"Model is {days_since_training} days old (limit: 90)")
 
        # Check performance
        current_metrics = self.monitor.get_current_metrics(model_name)
        if current_metrics.get("accuracy", 1.0) < model.retirement_threshold:
            triggers.append(
                f"Accuracy {current_metrics['accuracy']:.3f} below threshold {model.retirement_threshold}"
            )
 
        # Check drift
        drift_score = self.monitor.get_drift_score(model_name)
        if drift_score > 0.3:
            triggers.append(f"High data drift detected (score: {drift_score:.3f})")
 
        return triggers
 
    def retire_model(self, model_name: str, version: str, reason: str, actor: str):
        """Gracefully retire a model version."""
        # Move to deprecated first (allows rollback)
        self.registry.transition_stage(model_name, version, "deprecated")
        self.audit.log_event("model.deprecated", model_name, version, actor, {"reason": reason})
 
        # After grace period, fully retire
        # This is typically scheduled, not immediate
        return {"status": "deprecated", "reason": reason, "full_retirement": "scheduled_30d"}

EU AI Act Alignment

Model governance directly supports EU AI Act compliance:

| EU AI Act Requirement | Governance Implementation |
|-----------------------|---------------------------|
| Risk classification (Art. 6) | Risk levels in model cards |
| Technical documentation (Art. 11) | Model cards + audit trail |
| Record keeping (Art. 12) | Immutable audit log |
| Transparency (Art. 13) | Model cards with intended use |
| Human oversight (Art. 14) | Approval workflows |
| Accuracy and robustness (Art. 15) | Performance monitoring + retirement criteria |


Need model governance for your ML systems? DeviDevs implements governance frameworks aligned with EU AI Act requirements. Get a free assessment →
