
MLOps for Small Teams: Building ML Infrastructure Without the Complexity

DeviDevs Team
6 min read
Tags: MLOps, small teams, ML infrastructure, startup ML, practical MLOps, lean MLOps

You don't need Kubeflow, a feature store, and a dedicated ML platform team to do MLOps well. Small teams with 1-5 ML engineers can build production-grade ML systems with a fraction of the infrastructure — if they choose the right tools and priorities.

The Minimal Viable MLOps Stack

Cost: $0-200/month for infrastructure only; every tool in the stack is open source, so there are no license fees.

┌─────────────────────────────────────────────────────┐
│            Minimal Viable MLOps Stack                  │
├─────────────────────────────────────────────────────┤
│                                                       │
│  Git (code) + DVC (data) → MLflow (tracking)         │
│       │                         │                     │
│       ▼                         ▼                     │
│  GitHub Actions (CI/CD)    Model Registry (MLflow)    │
│       │                         │                     │
│       ▼                         ▼                     │
│  pytest (testing)          FastAPI (serving)          │
│                                 │                     │
│                                 ▼                     │
│                          Evidently (monitoring)       │
│                                                       │
│  Storage: S3 or GCS ($5-50/month)                    │
│  Compute: GitHub Actions (free tier) + cloud GPU     │
└─────────────────────────────────────────────────────┘

Week 1: Foundation (Days 1-5)

Day 1: Experiment Tracking

The single highest-impact practice. Set up MLflow in 10 minutes:

pip install mlflow
mlflow server --backend-store-uri sqlite:///mlflow.db --default-artifact-root ./artifacts --port 5000

Add autologging to your training code:

import mlflow
from sklearn.ensemble import RandomForestClassifier
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("my-first-experiment")
mlflow.autolog()
 
# Your existing training code works unchanged — autolog captures everything
model = RandomForestClassifier(n_estimators=100)
model.fit(X_train, y_train)
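
Autologging captures parameters, metrics, and the fitted model for supported frameworks, but you can still log anything custom next to it. A minimal sketch, reusing X_train/y_train from above and assuming a held-out validation split (X_val, y_val); the parameter and metric names are illustrative:

import mlflow
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

mlflow.autolog()

with mlflow.start_run():
    mlflow.log_param("feature_set", "v1")  # custom parameter autolog would not infer
    model = RandomForestClassifier(n_estimators=100)
    model.fit(X_train, y_train)  # autolog still records the fit
    mlflow.log_metric("holdout_f1", f1_score(y_val, model.predict(X_val)))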

Days 2-3: Data Versioning

pip install "dvc[s3]"
dvc init
dvc remote add -d storage s3://my-ml-data/dvc
 
# Track your training data
dvc add data/training.parquet
git add data/training.parquet.dvc .gitignore
git commit -m "Track training data v1"
dvc push
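
Once the data is tracked and pushed, any job can pull back the exact version pinned at a given git revision. A minimal sketch using the DVC Python API (the tag name is hypothetical):

# Read the data exactly as it was at a given git revision
import dvc.api
import pandas as pd

with dvc.api.open("data/training.parquet", rev="data-v1", mode="rb") as f:
    training_df = pd.read_parquet(f)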

Days 4-5: Basic CI/CD

# .github/workflows/ml.yml
name: ML Pipeline
on:
  push:
    branches: [main]
    paths: ['src/**', 'configs/**']
 
jobs:
  test-and-train:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: '3.11', cache: 'pip' }
      - run: pip install -r requirements.txt
      - run: pytest tests/ -v
      - run: python src/train.py --config configs/production.yaml
        env:
          MLFLOW_TRACKING_URI: ${{ secrets.MLFLOW_TRACKING_URI }}
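
The pytest step assumes you have at least a basic test suite. A minimal smoke test that trains on a tiny synthetic dataset keeps CI fast; the file name and accuracy threshold are illustrative:

# tests/test_training.py
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def test_model_beats_baseline():
    # Small synthetic dataset so the test runs in seconds and stays deterministic
    rng = np.random.default_rng(42)
    X = rng.normal(size=(200, 5))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    model = RandomForestClassifier(n_estimators=50, random_state=42)
    model.fit(X_train, y_train)

    assert accuracy_score(y_test, model.predict(X_test)) >= 0.8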

Result after Week 1: Every experiment is tracked, data is versioned, and training runs automatically on push. Total cost: $0.

Month 1: Production Path (Weeks 2-4)

Add Model Serving

# serve.py — Simple FastAPI server
from fastapi import FastAPI
import mlflow.pyfunc
import pandas as pd
 
app = FastAPI()
model = mlflow.pyfunc.load_model("models:/my-model/Production")
 
@app.post("/predict")
async def predict(features: dict):
    df = pd.DataFrame([features])
    return {"prediction": float(model.predict(df)[0])}

Deployment can be as simple as a git push or a couple of commands:

# Option A: Railway/Render (simplest)
# Push code → auto-deploys, free tier available
 
# Option B: Docker
docker build -t model-api .
docker run -p 8080:8080 model-api

Add Basic Monitoring

# Weekly monitoring script (run via cron/GitHub Actions)
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset
import pandas as pd
 
reference = pd.read_parquet("data/training.parquet")
current = pd.read_parquet("data/production_last_week.parquet")
 
report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=reference, current_data=current)
report.save_html("reports/drift_report.html")
 
# Check if retraining needed
results = report.as_dict()
if results["metrics"][0]["result"]["dataset_drift"]:
    print("DRIFT DETECTED — consider retraining")
    # Send Slack notification
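
The Slack notification itself can be a single webhook call. A minimal sketch, assuming the webhook URL lives in an environment variable:

import os
import requests

def notify_slack(message: str) -> None:
    # Incoming-webhook URLs are created in the Slack app settings
    requests.post(os.environ["SLACK_WEBHOOK_URL"], json={"text": message}, timeout=10)

notify_slack("Drift detected in production features: consider retraining")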

Add Model Registry Promotion

# promote.py — Simple model promotion script
import mlflow
from mlflow.tracking import MlflowClient
 
client = MlflowClient()
 
def promote_best_model(experiment_name: str, metric: str = "f1", min_threshold: float = 0.85):
    experiment = client.get_experiment_by_name(experiment_name)
    runs = client.search_runs(experiment.experiment_id, order_by=[f"metrics.{metric} DESC"], max_results=1)
 
    if not runs:
        print("No runs found")
        return
 
    best_run = runs[0]
    best_metric = best_run.data.metrics.get(metric, 0)
 
    if best_metric < min_threshold:
        print(f"Best {metric}={best_metric:.3f} below threshold {min_threshold}")
        return
 
    model_uri = f"runs:/{best_run.info.run_id}/model"
    mv = mlflow.register_model(model_uri, "my-model")
    client.transition_model_version_stage("my-model", mv.version, "Production", archive_existing_versions=True)
    print(f"Promoted model v{mv.version} ({metric}={best_metric:.3f}) to Production")

if __name__ == "__main__":
    # Lets CI run it directly as `python src/promote.py`
    promote_best_model("my-first-experiment")

Result after Month 1: Model served via API, basic monitoring, automated promotion. Total cost: ~$50/month (hosting).

Month 3: Scaling Up

When you outgrow the basics, add these incrementally:

When to Add a Feature Store

You need one when: multiple models share the same features, or training-serving skew is hurting you.
You don't need one when: you have a single model and its features are computed directly from the raw input.

# Simple alternative: Feature computation module (shared code)
# src/features.py — used by both training AND serving
def compute_features(raw_data: dict) -> dict:
    return {
        "recency_score": 1.0 / (1.0 + raw_data["days_since_last_purchase"] / 30),
        "frequency_score": min(raw_data["purchases_last_30d"] / 10, 1.0),
        "monetary_score": min(raw_data["avg_order_value"] / 100, 1.0),
    }
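
Because both the training pipeline and the serving code import this one module, the feature logic cannot silently drift apart. A quick usage sketch with illustrative input values:

from src.features import compute_features

raw_event = {"days_since_last_purchase": 12, "purchases_last_30d": 3, "avg_order_value": 42.5}
print(compute_features(raw_event))
# {'recency_score': 0.714..., 'frequency_score': 0.3, 'monetary_score': 0.425}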

When to Add Kubeflow/Airflow

You need it when: pipeline stages have different dependencies, you need GPU scheduling, or the DAG is genuinely complex.
You don't need it when: a single training script handles everything end-to-end.

# Simple alternative: Makefile pipeline
# Makefile
.PHONY: pipeline
 
pipeline: validate features train evaluate promote
 
validate:
	python src/validate_data.py
 
features:
	python src/compute_features.py
 
train:
	python src/train.py --config configs/production.yaml
 
evaluate:
	python src/evaluate.py
 
promote:
	python src/promote.py
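
The validate step refers to src/validate_data.py, which is not shown above. It can start as a handful of sanity checks; a minimal sketch, assuming the parquet file from Week 1 and illustrative thresholds:

# src/validate_data.py
import sys
import pandas as pd

def main() -> int:
    df = pd.read_parquet("data/training.parquet")
    errors = []

    if df.empty:
        errors.append("training data is empty")
    if df.isnull().mean().max() > 0.05:
        errors.append("a column has more than 5% missing values")

    for msg in errors:
        print(f"VALIDATION FAILED: {msg}")
    return 1 if errors else 0  # non-zero exit stops the Makefile pipeline

if __name__ == "__main__":
    sys.exit(main())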

When to Add Auto-Retraining

You need it when: model performance degrades month over month, or the underlying data changes frequently.
You don't need it when: the model is retrained quarterly and performance stays stable.

# GitHub Actions scheduled retraining
name: Weekly Retrain
on:
  schedule:
    - cron: '0 3 * * 0'  # Sunday 3 AM
jobs:
  retrain:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: '3.11', cache: 'pip' }
      - run: pip install -r requirements.txt
      - run: make pipeline
      - name: Notify
        if: failure()
        run: curl -X POST "$SLACK_WEBHOOK" -d '{"text":"Retraining failed!"}'
        env:
          SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}

Cost Comparison: Small Team vs. Enterprise

| Component | Small Team | Enterprise |
|-----------|-----------|-----------|
| Experiment tracking | MLflow (self-hosted) — $0 | W&B — $50/user/mo |
| Orchestration | GitHub Actions — $0 | Kubeflow + K8s — $500+/mo |
| Feature store | Shared Python module — $0 | Tecton — $1000+/mo |
| Serving | Railway/Render — $7-25/mo | KServe + K8s — $200+/mo |
| Monitoring | Evidently + cron — $0 | Arize — $500+/mo |
| Storage | S3 — $5-50/mo | S3 + data lake — $100+/mo |
| Total | $12-75/mo | $2,350+/mo |

Common Mistakes for Small Teams

  1. Starting with Kubernetes — You probably don't need it. Railway, Render, or a single VM works fine for most serving needs.

  2. Building a "platform" — You don't need a platform team until you have 5+ models in production.

  3. Choosing tools based on blog posts — Netflix built their ML platform for Netflix-scale problems. You have different problems. Start with what fits your team.

  4. Skipping experiment tracking — This is the one thing you should never skip, even as a solo ML engineer. mlflow.autolog() takes one line.

  5. Premature feature store — A shared Python module that both training and serving import is a perfectly valid "feature store" for a small team.

The Small Team MLOps Roadmap

Month 1:   MLflow + DVC + GitHub Actions + FastAPI = Production
Month 2:   Add Evidently monitoring + model promotion
Month 3:   Add scheduled retraining + alerting
Month 6:   Evaluate: Do we need Kubeflow/Feature store?
Month 12:  Scale only what's bottlenecking you

Building ML with a small team? DeviDevs helps teams of all sizes build right-sized MLOps infrastructure. No over-engineering, just what you need. Get a free assessment →
