# ML Experiment Tracking: Best Practices for Reproducible Machine Learning
Experiment tracking is the highest-impact MLOps practice you can adopt. It costs almost nothing to implement, immediately improves reproducibility, and forms the foundation for every other MLOps capability.
## What to Track

### The Minimum Viable Tracking Set

Every training run should capture:
```python
import platform
import sys

import mlflow

def track_experiment(model, params, metrics, data_info, tags=None):
    """Complete experiment tracking template."""
    with mlflow.start_run():
        # 1. Parameters (what you configured)
        mlflow.log_params(params)

        # 2. Metrics (what you measured)
        mlflow.log_metrics(metrics)

        # 3. Environment (where it ran)
        mlflow.set_tag("python_version", sys.version)
        mlflow.set_tag("platform", platform.platform())
        mlflow.set_tag(
            "gpu_available",
            str(sys.modules["torch"].cuda.is_available()) if "torch" in sys.modules else "N/A",
        )

        # 4. Data (what it was trained on)
        mlflow.set_tag("data_version", data_info["version"])
        mlflow.set_tag("data_rows", str(data_info["rows"]))
        mlflow.set_tag("data_features", str(data_info["features"]))

        # 5. Custom tags
        if tags:
            for key, value in tags.items():
                mlflow.set_tag(key, value)

        # 6. Model artifact
        mlflow.sklearn.log_model(model, "model")

# Usage
track_experiment(
    model=trained_model,
    params={"n_estimators": 200, "max_depth": 12, "learning_rate": 0.05},
    metrics={"accuracy": 0.931, "f1": 0.905, "auc_roc": 0.972, "train_time_seconds": 45.2},
    data_info={"version": "v2.4", "rows": 50000, "features": 24},
    tags={"author": "petru", "experiment_type": "hyperparameter_sweep"},
)
```

### What People Forget to Log
| Frequently omitted | Why it matters |
|--------------------|----------------|
| Random seed | Reproducibility |
| Data preprocessing steps | Feature consistency |
| Train/test split method | Fair comparison |
| Evaluation dataset version | Metric comparability |
| Training duration | Cost estimation |
| GPU utilization | Infrastructure sizing |
| Failed runs | Learning from mistakes |
| Null-handling strategy | Understanding data quality |
## Organizing Experiments

### Naming Conventions

```python
# Bad: unnamed or generic experiments
mlflow.set_experiment("test")
mlflow.set_experiment("experiment_1")

# Good: structured naming
mlflow.set_experiment("churn-prediction/feature-expansion-v2")
mlflow.set_experiment("pricing-model/gpu-optimization")

# Pattern: {model-name}/{experiment-purpose}
```

### Tagging Strategy
```python
# Standard tags for every run
STANDARD_TAGS = {
    "team": "ml-platform",
    "use_case": "customer-churn",
    "stage": "development",  # development | staging | production
    "data_source": "data-warehouse",
    "trigger": "manual",  # manual | scheduled | drift-triggered
}

# Run-specific tags
mlflow.set_tags(STANDARD_TAGS)
mlflow.set_tag("hypothesis", "Adding recency features improves churn prediction")
mlflow.set_tag("outcome", "confirmed: +2.1% F1 improvement")
```

### Nested Runs for Hyperparameter Sweeps
```python
with mlflow.start_run(run_name="hp-sweep-2026-02-18"):
    mlflow.set_tag("sweep_method", "bayesian")
    mlflow.log_param("total_trials", 50)

    for trial in optimizer.get_trials(50):
        with mlflow.start_run(run_name=f"trial-{trial.number}", nested=True):
            mlflow.log_params(trial.params)
            model = train_with_params(trial.params)
            metrics = evaluate(model)
            mlflow.log_metrics(metrics)

    # Log the best result at the parent level
    best = optimizer.best_trial
    mlflow.log_params({f"best_{k}": v for k, v in best.params.items()})
    mlflow.log_metrics({f"best_{k}": v for k, v in best.metrics.items()})
```

## Comparison Workflows
### A/B Model Comparison
```python
import mlflow

def compare_models(run_ids: list[str], metric: str = "f1_weighted") -> dict:
    """Compare multiple experiment runs on a specific metric."""
    client = mlflow.tracking.MlflowClient()
    results = []
    for run_id in run_ids:
        run = client.get_run(run_id)
        results.append({
            "run_id": run_id,
            "run_name": run.info.run_name,
            "params": run.data.params,
            "metrics": run.data.metrics,
            "primary_metric": run.data.metrics.get(metric, 0),
        })
    results.sort(key=lambda x: x["primary_metric"], reverse=True)
    return {
        "best_run": results[0],
        "all_runs": results,
        "metric_used": metric,
        "improvement": results[0]["primary_metric"] - results[-1]["primary_metric"],
    }
```

## Team Collaboration Patterns
### Experiment Review Checklist

Before promoting a model from experiment to staging, verify:
```markdown
## Experiment Review Checklist

### Data
- [ ] Training data version documented
- [ ] Data validation passed
- [ ] No data leakage between train/test
- [ ] Class distribution is representative

### Model
- [ ] Hyperparameters documented
- [ ] Performance meets minimum thresholds
- [ ] No regression against current production
- [ ] Bias/fairness checks passed
- [ ] Model size within limits

### Reproducibility
- [ ] Random seed set and logged
- [ ] Environment captured (Python, packages)
- [ ] Pipeline can reproduce results from scratch
- [ ] Code committed and tagged
```

## Advanced: Logging Custom Metrics
### Logging Curves and Plots
```python
import matplotlib.pyplot as plt
import mlflow
from sklearn.metrics import confusion_matrix, roc_curve

def log_evaluation_plots(y_true, y_pred, y_proba):
    """Log evaluation plots as MLflow artifacts."""
    # ROC curve
    fpr, tpr, _ = roc_curve(y_true, y_proba)
    fig, ax = plt.subplots()
    ax.plot(fpr, tpr)
    ax.set_xlabel("False Positive Rate")
    ax.set_ylabel("True Positive Rate")
    ax.set_title("ROC Curve")
    mlflow.log_figure(fig, "plots/roc_curve.png")
    plt.close(fig)

    # Confusion matrix
    cm = confusion_matrix(y_true, y_pred)
    fig, ax = plt.subplots()
    ax.imshow(cm, cmap="Blues")
    for i in range(cm.shape[0]):
        for j in range(cm.shape[1]):
            ax.text(j, i, str(cm[i, j]), ha="center", va="center")
    ax.set_xlabel("Predicted")
    ax.set_ylabel("Actual")
    ax.set_title("Confusion Matrix")
    mlflow.log_figure(fig, "plots/confusion_matrix.png")
    plt.close(fig)

    # Feature importance (if available)
    # mlflow.log_artifact("feature_importance.json")
```

### Step-Level Metrics for Training Curves
```python
for epoch in range(num_epochs):
    train_loss = train_one_epoch(model, train_loader)
    val_loss, val_acc = evaluate(model, val_loader)

    # Logging at each step creates training curves in the MLflow UI
    mlflow.log_metric("train_loss", train_loss, step=epoch)
    mlflow.log_metric("val_loss", val_loss, step=epoch)
    mlflow.log_metric("val_accuracy", val_acc, step=epoch)
    mlflow.log_metric("learning_rate", optimizer.param_groups[0]["lr"], step=epoch)
```

## Experiment Tracking Anti-Patterns
| Anti-pattern | Problem | Better approach |
|--------------|---------|-----------------|
| Tracking only successful runs | You can't learn from failures | Track everything; tag the failures |
| Logging 100+ metrics per run | Noise drowns out the signal | Focus on 5-10 key metrics |
| No naming convention | Experiments can't be found | Use structured `{model-name}/{purpose}` names |
| Logging to local files | Not shared, easily lost | Use a tracking server (MLflow/W&B) |
| Post-hoc logging | Metadata is wrong/incomplete | Log during training, not after |
## Related Resources

- MLflow tutorial: setup and practical usage
- MLOps best practices: where tracking fits in the workflow
- What is MLOps?: fundamental concepts
- MLOps tool comparison: choosing the right tracking tool

Need to set up experiment tracking for your team? DeviDevs implements production MLOps platforms with complete tracking and collaboration. Get a free assessment ->

Is your AI system compliant with the EU AI Act? Free risk assessment - find out in 2 minutes →