MLOps

Feature Store Architecture: Design Patterns for ML Feature Engineering

DeviDevs Team
8 min read
#feature store#feature engineering#MLOps#ML infrastructure#Feast#ML pipeline

Feature stores solve one of the most common problems in production ML: training-serving skew. When features are computed differently during training (batch, in notebooks) and serving (real-time, in production), model performance silently degrades. A feature store ensures consistency by providing a single source of truth for feature computation and storage.

The Training-Serving Skew Problem

Consider a churn prediction model that uses "average order value over last 30 days":

# During training (data scientist in notebook)
# (rolling("30D") assumes df is indexed by the order timestamp)
df["avg_order_value_30d"] = df.groupby("customer_id")["order_value"].transform(
    lambda x: x.rolling("30D").mean()
)
 
# During serving (engineer in production)
# SELECT AVG(order_value) FROM orders
# WHERE customer_id = ? AND created_at > NOW() - INTERVAL '30 days'

These two implementations will produce different results due to:

  • Different handling of edge cases (nulls, first 30 days)
  • Different timestamp semantics (end-of-day vs. point-in-time)
  • Different data sources (data lake snapshot vs. live database)

A feature store eliminates this by computing features once and serving them consistently.
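
A minimal sketch (hypothetical data) of how these edge cases bite: pandas emits a value from a partial window on day one by default, while the SQL above returns an empty aggregate for a customer with no orders in the window. Same feature, two answers.

import pandas as pd
 
# Two orders, 60 days apart (hypothetical values)
orders = pd.DataFrame(
    {"order_value": [100.0, 200.0]},
    index=pd.to_datetime(["2026-01-01", "2026-03-01"]),
)
 
# Notebook semantics: min_periods defaults to 1, so partial windows produce values
print(orders["order_value"].rolling("30D").mean())
# 2026-01-01    100.0   <- a "30-day average" from a single day of history
# 2026-03-01    200.0   <- the January order has aged out of the window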

Feature Store Architecture

A feature store has four core components:

┌─────────────────────────────────────────────────────────────┐
│                 Feature Store Architecture                  │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Feature Definitions (Code)                                 │
│  ┌──────────────────────────────────────────────────────┐   │
│  │  feature_views.py — Declarative feature definitions  │   │
│  │  entities.py — Entity definitions (customer, product)│   │
│  │  sources.py — Data source connections                │   │
│  └──────────────────────────────────────────────────────┘   │
│                        │                                    │
│            ┌───────────┴───────────┐                        │
│            ▼                       ▼                        │
│  ┌───────────────────┐   ┌───────────────────┐              │
│  │   Offline Store   │   │   Online Store    │              │
│  │  (Training Data)  │   │  (Serving Data)   │              │
│  │                   │   │                   │              │
│  │  - BigQuery/S3    │   │  - Redis/DynamoDB │              │
│  │  - Full history   │   │  - Latest values  │              │
│  │  - Batch reads    │   │  - Low latency    │              │
│  │  - Point-in-time  │   │  - < 10ms reads   │              │
│  └───────────────────┘   └───────────────────┘              │
│            │                       │                        │
│            ▼                       ▼                        │
│  ┌───────────────────┐   ┌───────────────────┐              │
│  │ Training Pipeline │   │ Serving Endpoint  │              │
│  │(Kubeflow/Airflow) │   │  (FastAPI/gRPC)   │              │
│  └───────────────────┘   └───────────────────┘              │
└─────────────────────────────────────────────────────────────┘

Offline Store

  • Stores the full history of feature values
  • Used for training data generation with point-in-time correctness
  • Backed by data warehouses (BigQuery, Snowflake) or object storage (S3/Parquet)
  • Batch reads, high throughput, higher latency acceptable

Online Store

  • Stores the latest feature values for each entity
  • Used for real-time serving (model inference)
  • Backed by low-latency key-value stores (Redis, DynamoDB)
  • Single-digit millisecond reads required

Implementing with Feast

Feast is one of the most widely used open-source feature stores. Here's a complete implementation:

Define Entities and Sources

# features/entities.py
from feast import Entity, ValueType
 
customer = Entity(
    name="customer_id",
    value_type=ValueType.INT64,
    description="Unique customer identifier",
)
 
product = Entity(
    name="product_id",
    value_type=ValueType.STRING,
    description="Product SKU",
)

# features/sources.py
from feast import FileSource, BigQuerySource
from datetime import timedelta
 
# For development: file-based source
customer_source_file = FileSource(
    path="data/customer_features.parquet",
    timestamp_field="event_timestamp",
    created_timestamp_column="created_timestamp",
)
 
# For production: BigQuery source
customer_source_bq = BigQuerySource(
    table="ml_features.customer_daily_features",
    timestamp_field="event_timestamp",
    created_timestamp_column="created_timestamp",
)
 
# Real-time source for streaming features
transaction_source = FileSource(
    path="data/transaction_features.parquet",
    timestamp_field="event_timestamp",
)

Define Feature Views

# features/feature_views.py
from feast import FeatureView, Feature, ValueType
from datetime import timedelta
from features.entities import customer, product
from features.sources import customer_source_bq, transaction_source
 
customer_profile_features = FeatureView(
    name="customer_profile",
    entities=[customer],
    ttl=timedelta(days=1),
    features=[
        Feature(name="account_age_days", dtype=ValueType.INT64),
        Feature(name="total_orders", dtype=ValueType.INT64),
        Feature(name="lifetime_value", dtype=ValueType.DOUBLE),
        Feature(name="preferred_category", dtype=ValueType.STRING),
        Feature(name="email_verified", dtype=ValueType.BOOL),
    ],
    online=True,
    source=customer_source_bq,
    tags={"team": "ml-platform", "domain": "customer"},
)
 
customer_behavioral_features = FeatureView(
    name="customer_behavior",
    entities=[customer],
    ttl=timedelta(hours=6),  # More frequently updated
    features=[
        Feature(name="purchases_last_7d", dtype=ValueType.INT64),
        Feature(name="purchases_last_30d", dtype=ValueType.INT64),
        Feature(name="avg_order_value_30d", dtype=ValueType.DOUBLE),
        Feature(name="days_since_last_purchase", dtype=ValueType.INT64),
        Feature(name="cart_abandonment_rate_30d", dtype=ValueType.DOUBLE),
        Feature(name="support_tickets_last_30d", dtype=ValueType.INT64),
        Feature(name="page_views_last_7d", dtype=ValueType.INT64),
    ],
    online=True,
    source=customer_source_bq,
    tags={"team": "ml-platform", "domain": "customer"},
)

Apply and Materialize

# Apply feature definitions
feast apply
 
# Materialize features to the online store
feast materialize 2026-01-01T00:00:00 2026-02-05T00:00:00
 
# Incremental materialization (run daily)
feast materialize-incremental $(date -u +"%Y-%m-%dT%H:%M:%S")

Training: Point-in-Time Feature Retrieval

from feast import FeatureStore
import pandas as pd
 
store = FeatureStore(repo_path="features/")
 
# Training entity DataFrame with timestamps
entity_df = pd.DataFrame({
    "customer_id": [1001, 1002, 1003, 1001, 1002],
    "event_timestamp": pd.to_datetime([
        "2026-01-15", "2026-01-15", "2026-01-15",
        "2026-02-01", "2026-02-01",
    ]),
    "label": [1, 0, 0, 1, 1],  # churn labels
})
 
# Get historical features with point-in-time correctness
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "customer_profile:account_age_days",
        "customer_profile:total_orders",
        "customer_profile:lifetime_value",
        "customer_behavior:purchases_last_30d",
        "customer_behavior:avg_order_value_30d",
        "customer_behavior:days_since_last_purchase",
        "customer_behavior:cart_abandonment_rate_30d",
    ],
).to_df()
 
print(training_df.head())
# customer_id  event_timestamp  label  account_age_days  total_orders  ...
# 1001         2026-01-15       1      365               23            ...
# 1002         2026-01-15       0      180               8             ...

Point-in-time correctness means: for customer 1001 on 2026-01-15, the feature store returns the feature values as they were on that date — not the latest values. This prevents data leakage (using future information to predict past events).
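
Under the hood this is an as-of join. A minimal sketch of the idea with pandas (hypothetical values; Feast performs the same join at scale in the offline store):

import pandas as pd
 
features = pd.DataFrame({
    "customer_id": [1001, 1001],
    "event_timestamp": pd.to_datetime(["2026-01-10", "2026-01-28"]),
    "total_orders": [23, 27],
})
labels = pd.DataFrame({
    "customer_id": [1001],
    "event_timestamp": pd.to_datetime(["2026-01-15"]),
})
 
# For each label row, take the most recent feature row at or before its timestamp
joined = pd.merge_asof(
    labels.sort_values("event_timestamp"),
    features.sort_values("event_timestamp"),
    on="event_timestamp",
    by="customer_id",
    direction="backward",  # never look into the future
)
print(joined["total_orders"].item())  # 23, not the later (leaky) value 27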

Serving: Online Feature Retrieval

from feast import FeatureStore
 
store = FeatureStore(repo_path="features/")
 
# Get latest features for real-time prediction
online_features = store.get_online_features(
    features=[
        "customer_profile:account_age_days",
        "customer_profile:total_orders",
        "customer_profile:lifetime_value",
        "customer_behavior:purchases_last_30d",
        "customer_behavior:avg_order_value_30d",
        "customer_behavior:days_since_last_purchase",
    ],
    entity_rows=[{"customer_id": 1001}],
).to_dict()
 
print(online_features)
# {
#   "customer_id": [1001],
#   "account_age_days": [380],
#   "total_orders": [27],
#   "lifetime_value": [4523.50],
#   "purchases_last_30d": [3],
#   "avg_order_value_30d": [167.88],
#   "days_since_last_purchase": [5],
# }

Integration with Model Serving

from fastapi import FastAPI
from feast import FeatureStore
import mlflow.pyfunc
 
app = FastAPI()
feature_store = FeatureStore(repo_path="features/")
model = mlflow.pyfunc.load_model("models:/churn-predictor/Production")
 
FEATURE_LIST = [
    "customer_profile:account_age_days",
    "customer_profile:total_orders",
    "customer_profile:lifetime_value",
    "customer_behavior:purchases_last_30d",
    "customer_behavior:avg_order_value_30d",
    "customer_behavior:days_since_last_purchase",
    "customer_behavior:cart_abandonment_rate_30d",
]
 
@app.post("/predict/churn")
async def predict_churn(customer_id: int):
    # Fetch features from online store
    features = feature_store.get_online_features(
        features=FEATURE_LIST,
        entity_rows=[{"customer_id": customer_id}],
    ).to_df()
 
    # Run prediction
    prediction = model.predict(features.drop(columns=["customer_id"]))
 
    return {
        "customer_id": customer_id,
        "churn_probability": float(prediction[0]),
        "features_used": features.to_dict(orient="records")[0],
    }
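
Calling the endpoint (a sketch; assumes the service is running locally on uvicorn's default port):

import requests
 
resp = requests.post(
    "http://localhost:8000/predict/churn",
    params={"customer_id": 1001},  # scalar params arrive as query parameters
)
print(resp.json())
# e.g. {"customer_id": 1001, "churn_probability": 0.37, "features_used": {...}}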

Design Pattern: On-Demand Features

Some features need to be computed at request time (e.g., "time since last page view" changes every second). Use on-demand feature views:

import pandas as pd
 
from feast import on_demand_feature_view, Field
from feast.types import Float64
from features.feature_views import customer_behavioral_features
 
@on_demand_feature_view(
    sources=[customer_behavioral_features],
    schema=[
        Field(name="recency_score", dtype=Float64),
        Field(name="activity_score", dtype=Float64),
    ],
)
def customer_scores(inputs: pd.DataFrame) -> pd.DataFrame:
    """Compute derived features on demand."""
    df = pd.DataFrame()
    # Recency score: 0-1, higher = more recent
    df["recency_score"] = 1.0 / (1.0 + inputs["days_since_last_purchase"] / 30.0)
    # Activity score: combines multiple signals
    df["activity_score"] = (
        inputs["purchases_last_7d"] * 0.4
        + inputs["page_views_last_7d"] * 0.01
        + (1 - inputs["cart_abandonment_rate_30d"]) * 0.3
    )
    return df
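
On-demand views are requested like any other feature; Feast pulls the source features from the online store and runs the transformation per request. A sketch, reusing the store object from earlier:

scores = store.get_online_features(
    features=[
        "customer_scores:recency_score",
        "customer_scores:activity_score",
    ],
    entity_rows=[{"customer_id": 1001}],
).to_dict()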

Design Pattern: Streaming Features

For real-time features that update with every event (e.g., running count of purchases today):

# Kafka → Flink/Spark Streaming → Online Store
 
# Flink job to compute streaming features
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.table import StreamTableEnvironment
 
# StreamTableEnvironment.create() requires an execution environment
env = StreamExecutionEnvironment.get_execution_environment()
t_env = StreamTableEnvironment.create(env)
 
# Read from Kafka
t_env.execute_sql("""
    CREATE TABLE transactions (
        customer_id BIGINT,
        order_value DOUBLE,
        event_time TIMESTAMP(3),
        WATERMARK FOR event_time AS event_time - INTERVAL '5' SECOND
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'transactions',
        -- bootstrap servers are required; placeholder address
        'properties.bootstrap.servers' = 'kafka:9092',
        'format' = 'json'
    )
""")
 
# Compute streaming features
t_env.execute_sql("""
    INSERT INTO feature_store_sink
    SELECT
        customer_id,
        COUNT(*) OVER w AS purchases_today,
        SUM(order_value) OVER w AS spend_today,
        AVG(order_value) OVER w AS avg_order_today
    FROM transactions
    WINDOW w AS (
        PARTITION BY customer_id
        ORDER BY event_time
        RANGE BETWEEN INTERVAL '1' DAY PRECEDING AND CURRENT ROW
    )
""")

Custom Feature Store Implementation

For teams that need more control or have simpler requirements, here's a minimal custom feature store:

from typing import Any
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
import redis
 
class SimpleFeatureStore:
    """Minimal feature store with offline (Parquet) + online (Redis) stores."""
 
    def __init__(self, offline_path: str, redis_url: str):
        self.offline_path = offline_path
        self.redis = redis.from_url(redis_url)
 
    def materialize(self, feature_view: str, df: pd.DataFrame):
        """Write features to both offline and online stores."""
        # Offline: append to the Parquet dataset, partitioned by date
        path = f"{self.offline_path}/{feature_view}"
        pq.write_to_dataset(pa.Table.from_pandas(df), path, partition_cols=["date"])
 
        # Online: write latest values to Redis
        for _, row in df.iterrows():
            entity_key = f"{feature_view}:{row['entity_id']}"
            features = row.drop(["entity_id", "event_timestamp", "date"]).to_dict()
            self.redis.hset(entity_key, mapping={k: str(v) for k, v in features.items()})
            self.redis.expire(entity_key, 86400 * 7)  # 7-day TTL
 
    def get_online_features(self, feature_view: str, entity_id: str) -> dict[str, Any]:
        """Fetch latest features from Redis."""
        key = f"{feature_view}:{entity_id}"
        raw = self.redis.hgetall(key)
        return {k.decode(): float(v.decode()) for k, v in raw.items()}
 
    def get_historical_features(self, feature_view: str, entity_df: pd.DataFrame) -> pd.DataFrame:
        """Point-in-time join from offline store."""
        features_df = pd.read_parquet(f"{self.offline_path}/{feature_view}")
        # Point-in-time join: for each entity+timestamp, get the most recent feature values
        merged = pd.merge_asof(
            entity_df.sort_values("event_timestamp"),
            features_df.sort_values("event_timestamp"),
            on="event_timestamp",
            by="entity_id",
            direction="backward",
        )
        return merged
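
A quick usage sketch (hypothetical data; assumes a local Redis at the default port and the imports above):

store = SimpleFeatureStore(offline_path="data/features", redis_url="redis://localhost:6379")
 
df = pd.DataFrame({
    "entity_id": ["1001", "1002"],
    "event_timestamp": pd.to_datetime(["2026-02-01", "2026-02-01"]),
    "date": ["2026-02-01", "2026-02-01"],
    "purchases_last_7d": [3, 0],
    "avg_order_value_30d": [167.88, 0.0],
})
store.materialize("customer_behavior", df)
 
print(store.get_online_features("customer_behavior", "1001"))
# {'purchases_last_7d': 3.0, 'avg_order_value_30d': 167.88}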

Feature Store Comparison

| Feature Store | Type        | Online Store    | Offline Store | Streaming    | Best For               |
|---------------|-------------|-----------------|---------------|--------------|------------------------|
| Feast         | Open source | Redis, DynamoDB | BigQuery, S3  | Kafka → push | Self-hosted, any cloud |
| Tecton        | Managed     | DynamoDB        | S3/Snowflake  | Native       | Enterprise, real-time  |
| Hopsworks     | Open source | RonDB           | Hudi          | Flink        | On-prem, all-in-one    |
| Databricks    | Managed     | DynamoDB        | Delta Lake    | Spark        | Databricks users       |
| Vertex AI     | Managed     | Bigtable        | BigQuery      | Dataflow     | GCP users              |

When Do You Need a Feature Store?

You need one if:

  • Multiple models share the same features
  • Training and serving use different data sources
  • You need point-in-time correctness for training
  • Feature computation is expensive and should be cached
  • You need real-time features with < 50ms latency

You probably don't need one if:

  • You have a single model in production
  • Features are simple transformations of raw input
  • All models are batch (no real-time serving)
  • Your team has fewer than 3 ML engineers

Next Steps

  • MLflow — Track experiments and manage models alongside your feature store
  • Kubeflow Pipelines — Orchestrate feature computation and training pipelines
  • MLOps best practices — Integrate feature stores into your ML workflow
  • What is MLOps? — Understand where feature stores fit in the MLOps lifecycle

Building a feature store for your ML platform? DeviDevs designs and implements production feature stores with Feast, custom solutions, or managed platforms. Get a free assessment →
