Model Governance: Managing ML Models from Development to Retirement
Model governance is the set of policies, processes, and tools that ensure ML models are developed, deployed, and maintained responsibly. As organizations scale from a handful of models to hundreds, governance prevents the chaos of untracked models making critical decisions without oversight.
Why Model Governance Matters
Without governance, organizations face:
- Shadow models — Models deployed without approval or documentation
- Regulatory risk — Non-compliance with EU AI Act, GDPR, or industry regulations
- Operational risk — No one knows which model version is serving which endpoint
- Reproducibility failure — Can't recreate a model's behavior for audit or debugging
- Stale models — Models running in production long past their useful life
The Model Governance Framework
┌──────────────────────────────────────────────────────────────┐
│                  Model Governance Framework                  │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│   Development           Review            Production         │
│  ┌────────────┐      ┌──────────┐      ┌──────────────┐      │
│  │ Experiment │─────▶│  Model   │─────▶│  Deployment  │      │
│  │  Tracking  │      │  Review  │      │ + Monitoring │      │
│  └─────┬──────┘      └────┬─────┘      └──────┬───────┘      │
│        │                  │                   │              │
│        ▼                  ▼                   ▼              │
│  ┌────────────┐      ┌──────────┐      ┌──────────────┐      │
│  │   Model    │      │ Approval │      │  Retirement  │      │
│  │    Card    │      │ Workflow │      │   Planning   │      │
│  └────────────┘      └──────────┘      └──────────────┘      │
│                                                              │
│ Cross-cutting: Audit Trail • Risk Assessment • Documentation │
└──────────────────────────────────────────────────────────────┘
Model Cards
A model card is a standardized document that accompanies every model, providing transparency about its purpose, performance, limitations, and ethical considerations.
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum


class RiskLevel(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"
@dataclass
class ModelCard:
    """Standardized model documentation following Google's Model Cards framework."""

    # Identity
    model_name: str
    model_version: str
    model_id: str
    owner: str
    team: str

    # Purpose
    intended_use: str
    out_of_scope_uses: list[str]
    primary_users: list[str]

    # Training
    training_data_description: str
    training_data_version: str
    training_date: datetime
    training_pipeline: str
    hyperparameters: dict

    # Performance
    evaluation_metrics: dict[str, float]
    performance_by_group: dict[str, dict[str, float]] = field(default_factory=dict)
    known_limitations: list[str] = field(default_factory=list)

    # Risk & Ethics
    risk_level: RiskLevel = RiskLevel.MEDIUM
    ethical_considerations: list[str] = field(default_factory=list)
    fairness_assessment: dict = field(default_factory=dict)

    # Governance
    approved_by: str | None = None
    approval_date: datetime | None = None
    review_schedule: str = "quarterly"
    retirement_criteria: list[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Generate a human-readable model card."""
        sections = [
            f"# Model Card: {self.model_name} v{self.model_version}",
            "\n## Overview",
            f"- **Owner**: {self.owner} ({self.team})",
            f"- **Risk Level**: {self.risk_level.value.upper()}",
            f"- **Approved By**: {self.approved_by or 'PENDING'}",
            f"- **Review Schedule**: {self.review_schedule}",
            "\n## Intended Use",
            self.intended_use,
            "\n### Out of Scope",
            *[f"- {use}" for use in self.out_of_scope_uses],
            "\n## Performance Metrics",
            *[f"- **{k}**: {v:.4f}" for k, v in self.evaluation_metrics.items()],
            "\n## Known Limitations",
            *[f"- {lim}" for lim in self.known_limitations],
            "\n## Training Details",
            f"- **Data Version**: {self.training_data_version}",
            f"- **Trained**: {self.training_date.strftime('%Y-%m-%d')}",
            f"- **Pipeline**: {self.training_pipeline}",
        ]
        return "\n".join(sections)
# Example usage
card = ModelCard(
    model_name="customer-churn-predictor",
    model_version="2.4.1",
    model_id="churn-prod-v241",
    owner="ML Team",
    team="ml-platform",
    intended_use="Predict customer churn probability for proactive retention campaigns",
    out_of_scope_uses=[
        "Credit scoring or lending decisions",
        "Individual customer discrimination",
        "Automated account termination without human review",
    ],
    primary_users=["Customer Success Team", "Marketing Automation"],
    training_data_description="12 months of customer behavior data (purchases, support tickets, page views)",
    training_data_version="dvc:data/v2.4",
    training_date=datetime(2026, 2, 1),
    training_pipeline="kubeflow:nightly-churn-v2",
    hyperparameters={"n_estimators": 300, "max_depth": 12, "learning_rate": 0.05},
    evaluation_metrics={"accuracy": 0.931, "f1_weighted": 0.905, "auc_roc": 0.972},
    risk_level=RiskLevel.MEDIUM,
    known_limitations=[
        "Lower accuracy for customers with < 30 days history",
        "Not validated for B2B enterprise customers",
        "Performance may degrade during holiday seasons",
    ],
    retirement_criteria=[
        "Accuracy drops below 0.88 for 7 consecutive days",
        "Training data becomes older than 90 days",
        "Replacement model approved and deployed",
    ],
)
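The card can then be rendered and versioned alongside the model artifact. A minimal sketch (the docs/ path is illustrative, not a convention from this framework):

# Write the human-readable card next to the model artifacts for reviewers
with open("docs/customer-churn-predictor-v2.4.1.md", "w") as f:
    f.write(card.to_markdown())

Approval Workflows
Promotion to each stage requires sign-off from specific roles, with stricter requirements for higher-risk models: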
from enum import Enum
from datetime import datetime


class ApprovalStatus(Enum):
    PENDING = "pending"
    IN_REVIEW = "in_review"
    APPROVED = "approved"
    REJECTED = "rejected"
    REVOKED = "revoked"


class ModelApprovalWorkflow:
    """Multi-stage approval workflow for model promotion."""

    REQUIRED_APPROVALS = {
        "staging": ["ml_engineer"],
        "production": ["ml_lead", "risk_officer"],
        "high_risk_production": ["ml_lead", "risk_officer", "legal"],
    }

    def __init__(self, model_name: str, model_version: str, risk_level: RiskLevel):
        self.model_name = model_name
        self.model_version = model_version
        self.risk_level = risk_level
        self.approvals: list[dict] = []
        self.status = ApprovalStatus.PENDING

    def get_required_roles(self, target_stage: str) -> list[str]:
        if self.risk_level in (RiskLevel.HIGH, RiskLevel.CRITICAL) and target_stage == "production":
            return self.REQUIRED_APPROVALS["high_risk_production"]
        return self.REQUIRED_APPROVALS.get(target_stage, ["ml_engineer"])

    def submit_approval(self, approver: str, role: str, decision: str, notes: str = ""):
        self.approvals.append({
            "approver": approver,
            "role": role,
            "decision": decision,
            "notes": notes,
            "timestamp": datetime.utcnow().isoformat(),
        })

    def check_approval_status(self, target_stage: str) -> tuple[ApprovalStatus, str]:
        required = self.get_required_roles(target_stage)
        approved_roles = {a["role"] for a in self.approvals if a["decision"] == "approved"}
        rejected = any(a["decision"] == "rejected" for a in self.approvals)
        if rejected:
            return ApprovalStatus.REJECTED, "Model was rejected by a reviewer"
        missing = set(required) - approved_roles
        if not missing:
            return ApprovalStatus.APPROVED, "All required approvals received"
        return ApprovalStatus.IN_REVIEW, f"Awaiting approval from: {', '.join(missing)}"
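A quick sketch of the workflow in action, using a hypothetical high-risk fraud-detector model:

# High risk + production target requires ml_lead, risk_officer, and legal
workflow = ModelApprovalWorkflow("fraud-detector", "1.2.0", RiskLevel.HIGH)
workflow.submit_approval("alice", "ml_lead", "approved", "Metrics meet the bar")
workflow.submit_approval("bob", "risk_officer", "approved")

status, message = workflow.check_approval_status("production")
print(status, message)  # ApprovalStatus.IN_REVIEW, "Awaiting approval from: legal"

Audit Trail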
Every model action must be logged for regulatory compliance and debugging:
import uuid
from datetime import datetime


class ModelAuditLog:
    """Immutable audit trail for all model lifecycle events."""

    def __init__(self, storage_backend):
        self.storage = storage_backend

    def log_event(self, event_type: str, model_name: str, model_version: str,
                  actor: str, details: dict):
        event = {
            "event_id": self._generate_id(),
            "timestamp": datetime.utcnow().isoformat(),
            "event_type": event_type,
            "model_name": model_name,
            "model_version": model_version,
            "actor": actor,
            "details": details,
        }
        self.storage.append(event)
        return event

    def log_training(self, model_name: str, version: str, actor: str, **kwargs):
        return self.log_event("model.trained", model_name, version, actor, {
            "data_version": kwargs.get("data_version"),
            "metrics": kwargs.get("metrics"),
            "pipeline": kwargs.get("pipeline"),
        })

    def log_approval(self, model_name: str, version: str, approver: str, decision: str, notes: str = ""):
        return self.log_event("model.approval", model_name, version, approver, {
            "decision": decision,
            "notes": notes,
        })

    def log_deployment(self, model_name: str, version: str, actor: str, environment: str):
        return self.log_event("model.deployed", model_name, version, actor, {
            "environment": environment,
        })

    def log_retirement(self, model_name: str, version: str, actor: str, reason: str):
        return self.log_event("model.retired", model_name, version, actor, {
            "reason": reason,
        })

    def get_model_history(self, model_name: str) -> list[dict]:
        return self.storage.query(model_name=model_name)

    def _generate_id(self) -> str:
        return str(uuid.uuid4())
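The class leaves persistence to an injected storage_backend exposing append() and query(). A minimal in-memory sketch for local testing (a real deployment would use an append-only store, such as WORM object storage, to keep the trail immutable):

class InMemoryAuditStorage:
    """Toy backend illustrating the append/query interface the audit log expects."""

    def __init__(self):
        self._events: list[dict] = []

    def append(self, event: dict):
        self._events.append(dict(event))  # copy so callers can't mutate history

    def query(self, **filters) -> list[dict]:
        return [e for e in self._events
                if all(e.get(k) == v for k, v in filters.items())]

audit = ModelAuditLog(InMemoryAuditStorage())
audit.log_deployment("customer-churn-predictor", "2.4.1", "ci-bot", "production")
print(audit.get_model_history("customer-churn-predictor"))

Model Lifecycle Management
The final piece ties registry, monitoring, and the audit trail together so models move through defined stages and exit production deliberately: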
from datetime import datetime


class ModelLifecycleManager:
    """Manage models through their complete lifecycle."""

    LIFECYCLE_STAGES = ["development", "staging", "production", "deprecated", "retired"]

    def __init__(self, registry, monitor, audit_log):
        self.registry = registry
        self.monitor = monitor
        self.audit = audit_log

    def check_retirement_criteria(self, model_name: str) -> list[str]:
        """Check whether a production model should be retired."""
        triggers = []
        model = self.registry.get_production_model(model_name)

        # Check age
        days_since_training = (datetime.utcnow() - model.training_date).days
        if days_since_training > 90:
            triggers.append(f"Model is {days_since_training} days old (limit: 90)")

        # Check performance
        current_metrics = self.monitor.get_current_metrics(model_name)
        if current_metrics.get("accuracy", 1.0) < model.retirement_threshold:
            triggers.append(
                f"Accuracy {current_metrics['accuracy']:.3f} below threshold {model.retirement_threshold}"
            )

        # Check drift
        drift_score = self.monitor.get_drift_score(model_name)
        if drift_score > 0.3:
            triggers.append(f"High data drift detected (score: {drift_score:.3f})")

        return triggers

    def retire_model(self, model_name: str, version: str, reason: str, actor: str):
        """Gracefully retire a model version."""
        # Move to deprecated first (allows rollback)
        self.registry.transition_stage(model_name, version, "deprecated")
        self.audit.log_event("model.deprecated", model_name, version, actor, {"reason": reason})
        # After a grace period, fully retire; this is typically scheduled, not immediate
        return {"status": "deprecated", "reason": reason, "full_retirement": "scheduled_30d"}
Model governance directly supports EU AI Act compliance:
| EU AI Act Requirement | Governance Implementation |
|-----------------------|---------------------------|
| Risk classification (Art. 6) | Risk levels in model cards |
| Technical documentation (Art. 11) | Model cards + audit trail |
| Record keeping (Art. 12) | Immutable audit log |
| Transparency (Art. 13) | Model cards with intended use |
| Human oversight (Art. 14) | Approval workflows |
| Accuracy and robustness (Art. 15) | Performance monitoring + retirement criteria |
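These mappings can also be enforced mechanically. A hedged sketch of a pre-deployment gate over the ModelCard above; the specific fields checked are an illustrative assumption, not an official Art. 11 checklist:

def check_documentation_completeness(card: ModelCard) -> list[str]:
    """Return a list of governance gaps that should block promotion."""
    gaps = []
    if not card.intended_use:
        gaps.append("Missing intended use (transparency, Art. 13)")
    if not card.known_limitations:
        gaps.append("No documented limitations (technical documentation, Art. 11)")
    if card.approved_by is None:
        gaps.append("No human approval recorded (human oversight, Art. 14)")
    if not card.retirement_criteria:
        gaps.append("No retirement criteria (accuracy and robustness, Art. 15)")
    return gaps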
Related Resources
- What is MLOps? — Governance in the MLOps context
- MLflow model registry — Technical implementation of model versioning
- EU AI Act compliance — Full regulatory requirements
- Model monitoring — Monitoring feeds into governance decisions
- AI model versioning security — Security aspects of model management
Need model governance for your ML systems? DeviDevs implements governance frameworks aligned with EU AI Act requirements. Get a free assessment →