# Feature Store: The Missing Infrastructure for ML Engineering
If you've ever trained a model in a notebook and then struggled to serve the same features in production, you've felt the pain that feature stores exist to solve.
A feature store is a centralized platform for managing, storing, and serving machine learning features. It bridges the gap between data engineering and model serving, ensuring that the features used during training are identical to those used during inference.
## What Problem Does a Feature Store Solve?
Without a feature store, ML teams typically face:
- Training-serving skew — features computed differently in batch training vs. real-time serving.
- Duplicated feature logic — multiple teams rewriting the same transformations.
- No discoverability — no catalog of available features across the organization.
- Inconsistent data — different models consuming different versions of the same signal.
A feature store addresses all of these by providing a single source of truth for feature definitions, storage, and retrieval.
## Online Store vs. Offline Store
Feature stores typically have two layers:
### Offline Store
The offline store holds historical feature values used for training. It's backed by columnar storage like Parquet files, BigQuery, Redshift, or a data lake. Queries against the offline store return large datasets spanning weeks or months of data.
### Online Store
The online store serves the latest feature values at low latency for real-time inference. It's backed by key-value stores like Redis, DynamoDB, or Bigtable. When your model needs to score a request in under 50ms, the online store is what makes that possible.
The feature store keeps both layers synchronized — features are computed in batch or streaming pipelines and materialized to both stores.
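That write path can be sketched with plain dicts standing in for the two stores (the class and method names here are illustrative, not any particular product's API): each materialized value is appended to the offline history, and overwrites the online value only if it is the freshest seen for that entity.

```python
from collections import defaultdict

class DualStoreSketch:
    """Toy write path: every materialized value is appended to the
    offline store (full history) and, if it is the newest value for
    that entity, also overwrites the online store."""

    def __init__(self):
        self.offline = defaultdict(list)  # entity -> [(ts, value), ...]
        self.online = {}                  # entity -> latest value
        self._latest_ts = {}              # entity -> newest ts seen

    def materialize(self, entity, ts, value):
        self.offline[entity].append((ts, value))
        # Guard against out-of-order writes clobbering a fresher value.
        if ts >= self._latest_ts.get(entity, float("-inf")):
            self.online[entity] = value
            self._latest_ts[entity] = ts

store = DualStoreSketch()
store.materialize("user_42", ts=100, value=3.5)
store.materialize("user_42", ts=200, value=4.1)
store.materialize("user_42", ts=150, value=9.9)  # late arrival: history only
```

Note the late-arrival guard: a real streaming pipeline sees out-of-order events, and the online store must never regress to a stale value.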
## Feature Engineering Pipelines
Feature stores don't replace your feature engineering — they organize it. A typical pipeline looks like this:
- Raw data arrives in a warehouse or stream.
- Transformation logic computes features (aggregations, joins, encodings).
- Materialization writes computed features to the online and offline stores.
- Serving retrieves features by entity key at inference time.
Pipelines can be batch (scheduled Spark or SQL jobs) or streaming (Flink, Spark Structured Streaming) depending on freshness requirements.
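As a deliberately simplified sketch of the transformation step, here is a batch job computing a 30-day average purchase amount per user from raw events; the field names, window length, and integer day indices are all assumptions for illustration.

```python
from collections import defaultdict

WINDOW_DAYS = 30

def compute_avg_purchase(events, as_of_day):
    """Batch transformation: average purchase amount per user over the
    trailing WINDOW_DAYS, relative to as_of_day (days as integers)."""
    totals = defaultdict(lambda: [0.0, 0])  # user_id -> [sum, count]
    for e in events:
        if as_of_day - WINDOW_DAYS < e["day"] <= as_of_day:
            acc = totals[e["user_id"]]
            acc[0] += e["amount"]
            acc[1] += 1
    return {uid: s / n for uid, (s, n) in totals.items()}

events = [
    {"user_id": "u1", "day": 95, "amount": 20.0},
    {"user_id": "u1", "day": 99, "amount": 40.0},
    {"user_id": "u1", "day": 60, "amount": 999.0},  # outside the window
    {"user_id": "u2", "day": 98, "amount": 10.0},
]
features = compute_avg_purchase(events, as_of_day=100)  # {"u1": 30.0, "u2": 10.0}
```

In production this same logic would typically be a scheduled SQL or Spark job, but the shape is identical: filter to the window, aggregate per entity, emit one value per entity key.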
## Point-in-Time Correctness
This is one of the most critical and underappreciated concepts in ML engineering.
When you join features to training labels, you must ensure that feature values reflect only information available at the time the label was observed. Using future data — even accidentally — introduces data leakage and produces models that perform well in backtests but fail in production.
Feature stores handle point-in-time joins automatically. You provide a set of entities with timestamps, and the store returns the most recent feature values as of each timestamp — never looking ahead.
Without this guarantee, debugging model performance degradation becomes nearly impossible.
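The semantics can be sketched in a few lines: for each (entity, label timestamp) pair, take the most recent feature value at or before that timestamp, and never one after it. This is a toy implementation of the idea, not a real store's join engine.

```python
import bisect

def point_in_time_join(feature_history, label_rows):
    """feature_history: entity -> list of (timestamp, value), sorted by timestamp.
    label_rows: list of (entity, label_timestamp).
    Returns the feature value as of each label timestamp, or None."""
    out = []
    for entity, ts in label_rows:
        rows = feature_history.get(entity, [])
        times = [t for t, _ in rows]
        # Index of the rightmost feature row with timestamp <= label timestamp.
        i = bisect.bisect_right(times, ts)
        out.append(rows[i - 1][1] if i > 0 else None)
    return out

history = {"u1": [(10, 1.0), (20, 2.0), (30, 3.0)]}
# A label observed at t=25 must see the value from t=20, never t=30.
values = point_in_time_join(history, [("u1", 25), ("u1", 5)])  # [2.0, None]
```

The `None` for the second row is the honest answer: no feature value existed yet at t=5, and a correct training set must reflect that rather than borrow a later value.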
## Feature Serving Latency
For online inference, latency matters. A well-configured online store serves features in single-digit milliseconds. Key factors that affect latency:
- Storage backend — in-memory stores like Redis and Memorystore typically deliver lower p99 latency than DynamoDB.
- Feature vector size — fewer features per lookup means faster responses.
- Batch retrieval — fetching features for multiple entities in one call reduces round trips.
- Caching — application-level caching can further reduce store hits.
If your model scoring budget is 100ms end-to-end, you typically want feature retrieval under 10ms.
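The payoff of batch retrieval is easy to see with a toy cost model that charges a fixed per-call overhead; the numbers below are made up for illustration, but the shape of the math is what matters: one batched call amortizes the round-trip cost across all entities.

```python
PER_CALL_OVERHEAD_MS = 2.0  # assumed network round-trip cost per call
PER_KEY_COST_MS = 0.1       # assumed lookup cost per entity key

def cost_single_lookups(n_entities):
    # One round trip per entity: overhead is paid n times.
    return n_entities * (PER_CALL_OVERHEAD_MS + PER_KEY_COST_MS)

def cost_batched_lookup(n_entities):
    # One round trip for the whole batch: overhead is paid once.
    return PER_CALL_OVERHEAD_MS + n_entities * PER_KEY_COST_MS

# Scoring 50 entities under these assumptions:
# unbatched ~105 ms, batched ~7 ms -- the round trip dominates.
unbatched = cost_single_lookups(50)
batched = cost_batched_lookup(50)
```

Real latencies depend on the backend and network, but the lesson holds: if the per-call overhead dominates the per-key cost, batching is the single biggest lever.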
## Feature Reuse Across Models
One of the biggest ROI drivers of a feature store is feature reuse. Once a team defines and materializes `user_avg_purchase_last_30d`, any model in the organization can consume it without rewriting the transformation.
This creates a flywheel effect: as more features are registered, new models become cheaper to build because they can compose existing features rather than starting from raw data.
A good feature store includes a feature catalog with metadata — descriptions, owners, data types, freshness SLAs, and lineage information.
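A catalog entry is essentially structured metadata; the exact fields below are an assumption, chosen to mirror the attributes listed above rather than any specific catalog's schema.

```python
from dataclasses import dataclass, field

@dataclass
class FeatureCatalogEntry:
    """Illustrative catalog record for one registered feature."""
    name: str
    description: str
    owner: str
    dtype: str
    freshness_sla: str                                    # e.g. "daily by 06:00 UTC"
    upstream_sources: list = field(default_factory=list)  # lineage

entry = FeatureCatalogEntry(
    name="user_avg_purchase_last_30d",
    description="Average purchase amount per user over the trailing 30 days",
    owner="growth-ml-team",
    dtype="float",
    freshness_sla="daily by 06:00 UTC",
    upstream_sources=["warehouse.purchases"],
)
```

The point of recording owner and lineage is operational: when an upstream table breaks, you can find every downstream model before it silently degrades.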
## Schema Management
Features evolve over time. Schema management ensures that changes don't silently break downstream models:
- Type enforcement — a feature defined as `float` should reject string values.
- Schema versioning — track changes to feature definitions over time.
- Compatibility checks — warn when a feature transformation changes in a way that could affect model behavior.
- Deprecation policies — mark features as deprecated and notify consumers before removal.
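Type enforcement in particular is easy to sketch: validate each incoming value against the registered schema before it is written. This is a minimal illustration, not a production validator; the schema contents are hypothetical.

```python
# Hypothetical registered schema: feature name -> expected Python type.
SCHEMA = {"user_avg_purchase_last_30d": float, "user_order_count": int}

def validate(feature_name, value):
    """Reject writes whose type does not match the registered schema."""
    expected = SCHEMA.get(feature_name)
    if expected is None:
        raise KeyError(f"unregistered feature: {feature_name}")
    # bool is a subclass of int in Python; reject it for int features.
    if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
        raise TypeError(
            f"{feature_name} expects {expected.__name__}, "
            f"got {type(value).__name__}"
        )
    return value

validate("user_avg_purchase_last_30d", 3.5)       # accepted
# validate("user_avg_purchase_last_30d", "3.5")   # raises TypeError
```

Catching the bad write at materialization time is the whole point: a string that sneaks into a float column fails loudly here instead of silently corrupting every downstream model.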
## Tools and Platforms

### Feast
The most popular open-source feature store. Feast provides a Python SDK, supports multiple backends (BigQuery, Redshift, Snowflake for offline; Redis, DynamoDB for online), and integrates with orchestrators like Airflow. It's lightweight and works well for teams getting started.
### Tecton
A managed feature platform built by the creators of Uber's Michelangelo. Tecton handles real-time feature engineering with built-in streaming support, automatic backfills, and monitoring. Best suited for organizations with demanding real-time requirements.
### Hopsworks
An open-source platform that combines a feature store with ML pipelines. Hopsworks emphasizes the feature engineering workflow and includes a UI for feature exploration and monitoring.
### Vertex AI Feature Store
Google Cloud's managed offering. Tightly integrated with BigQuery and Vertex AI pipelines. A natural fit for teams already on GCP who want minimal operational overhead.
## Comparison at a Glance
| Capability | Feast | Tecton | Hopsworks | Vertex AI |
|---|---|---|---|---|
| Open source | Yes | No | Yes | No |
| Streaming features | Limited | Native | Yes | Yes |
| Managed option | No | Yes | Yes | Yes |
| Point-in-time joins | Yes | Yes | Yes | Yes |
## When Do You Need a Feature Store?
Not every ML team needs one on day one. Consider a feature store when:
- Multiple models share the same input signals.
- Training-serving skew is causing production issues.
- Feature engineering is being duplicated across teams.
- You need sub-100ms feature serving for real-time models.
- Compliance requires feature lineage and auditability.
If you're running a single model with batch predictions, a well-organized data pipeline may suffice. But as your ML footprint grows, a feature store becomes essential infrastructure.
## Key Takeaways
- A feature store is a centralized system for storing, managing, and serving ML features.
- The offline store supports training; the online store supports low-latency inference.
- Point-in-time correctness prevents data leakage in training datasets.
- Feature reuse across models accelerates development and reduces duplication.
- Tools like Feast, Tecton, Hopsworks, and Vertex AI Feature Store cover a range of needs from open-source to fully managed.
Feature stores aren't glamorous, but they're the kind of infrastructure that separates ML experiments from ML systems.
This is post #171 in the Codelit engineering blog series.