Recommendation Engine Architecture: From Collaborative Filtering to Deep Learning#
Recommendations drive an estimated 35% of Amazon's revenue and 80% of the hours streamed on Netflix. Behind every "you might also like" is a system balancing relevance, diversity, freshness, and latency.
Why Recommendations Matter#
Without recommendations:
User searches → browses → maybe finds something → high bounce rate
With recommendations:
User arrives → personalized feed → discovers items → longer sessions
Result: 10-30% increase in engagement and conversion
The difference between a mediocre and great recommendation engine is architecture, not just algorithms.
Collaborative Filtering#
The most intuitive approach: users who agreed in the past will likely agree in the future, so recommend what similar users liked.
User-Based Collaborative Filtering#
Find users similar to the target user, then recommend what those similar users liked.
```python
# Simplified user-based CF
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# User-item interaction matrix (ratings); 0 = not yet rated
# Rows = users, Columns = items
ratings = np.array([
    [5, 3, 0, 1],  # User A
    [4, 0, 0, 1],  # User B
    [1, 1, 0, 5],  # User C
    [0, 0, 5, 4],  # User D
])

# Compute user-user similarity
user_sim = cosine_similarity(ratings)

# For User B (row 1), the most similar other user is User A
neighbor = np.argsort(user_sim[1])[-2]  # [-1] is User B itself
# Recommend items the neighbor rated highly that User B hasn't seen
# → Recommend Item 2 (rating 3 from User A)
```
Pros: No item metadata needed, captures complex patterns. Cons: Cold start for new users, doesn't scale well with millions of users.
Item-Based Collaborative Filtering#
Find items similar to what the user already liked. Amazon popularized this approach.
```python
# Item-based CF — compute item similarity
item_sim = cosine_similarity(ratings.T)

# User liked Item 1 → find items most similar to Item 1
# Similarity scores tell us which items co-occur in ratings
# More stable than user-based (items change less than user behavior)
```
Why item-based often wins: Item relationships are more stable than user relationships. A user's taste shifts; the similarity between two movies doesn't.
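Because item relationships are stable, the Amazon-style variant precomputes a short neighbor list per item offline, so serving is just a table lookup. A minimal numpy sketch (function names are illustrative, not from any library):

```python
import numpy as np

def cosine_item_sim(ratings):
    """Item-item cosine similarity from a user x item ratings matrix."""
    gram = ratings.T @ ratings
    norms = np.sqrt(np.diag(gram))
    return gram / np.outer(norms, norms)

def build_neighbor_table(ratings, k=2):
    """Offline job: top-k most similar items for each item."""
    sim = cosine_item_sim(ratings)
    np.fill_diagonal(sim, -1.0)  # exclude the item itself
    return {
        item: list(np.argsort(sim[item])[::-1][:k])
        for item in range(sim.shape[0])
    }

ratings = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 0, 5, 4],
], dtype=float)

neighbors = build_neighbor_table(ratings)
# Serving "users who liked X also liked..." is now a dict lookup per item
```

The expensive similarity computation runs in a batch job; the request path never touches the ratings matrix.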
Content-Based Filtering#
Recommend items with features similar to what the user previously liked.
```python
# Content-based: use item features
from sklearn.feature_extraction import DictVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# (year omitted here: numeric features need scaling before cosine similarity)
item_features = {
    "movie_1": {"genre": "sci-fi", "director": "Nolan"},
    "movie_2": {"genre": "sci-fi", "director": "Villeneuve"},
    "movie_3": {"genre": "comedy", "director": "Gerwig"},
}

# One-hot encode the categorical features
vec = DictVectorizer()
X = vec.fit_transform(item_features.values())

# User watched movie_1 and liked it → compare it to every item
sim = cosine_similarity(X[0], X)
# movie_2 shares the genre → highest score among the other items
# Recommend movie_2
```
Pros: No cold start for items (features available immediately), transparent reasoning. Cons: Over-specialization (filter bubble), can't discover surprising recommendations.
Hybrid Approaches#
Production systems combine multiple strategies:
```
┌─────────────────────────────────────────────┐
│             Hybrid Recommender              │
├─────────────────────────────────────────────┤
│                                             │
│  Collaborative ──┐                          │
│  Filtering       ├──→ Weighted ──→ Final    │
│                  │    Combination   List    │
│  Content-Based ──┤                          │
│  Filtering       │                          │
│                  │                          │
│  Popularity  ────┘                          │
│  Baseline                                   │
│                                             │
└─────────────────────────────────────────────┘
```
Common hybrid strategies:
- Weighted: Score = 0.6 * CF_score + 0.3 * content_score + 0.1 * popularity
- Switching: Use content-based for new users, CF once enough data exists
- Cascade: CF generates candidates, content-based re-ranks them
- Feature augmentation: CF embeddings become features for a content model
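The weighted strategy from the list above is a few lines in practice. A minimal sketch, assuming each component score has already been normalized to [0, 1] (the item names and scores are hypothetical):

```python
def hybrid_score(cf, content, popularity, weights=(0.6, 0.3, 0.1)):
    """Weighted blend of per-item scores from each recommender."""
    return weights[0] * cf + weights[1] * content + weights[2] * popularity

# Hypothetical component scores for two candidate items
candidates = {
    "item_a": hybrid_score(cf=0.9, content=0.2, popularity=0.5),
    "item_b": hybrid_score(cf=0.4, content=0.8, popularity=0.9),
}
ranked = sorted(candidates, key=candidates.get, reverse=True)
# item_a ranks first: its strong collaborative signal outweighs
# item_b's popularity and content match
```

In production the weights themselves are usually tuned via A/B tests rather than set by hand.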
Matrix Factorization#
The breakthrough behind the Netflix Prize. Decompose the sparse user-item matrix into dense latent factors.
```python
# Matrix Factorization with the Surprise library
import pandas as pd
from surprise import SVD, Dataset, Reader

# Load ratings (placeholder path; needs user_id, item_id, rating columns)
df = pd.read_csv("ratings.csv")

reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(
    df[['user_id', 'item_id', 'rating']], reader
)

# SVD — learns latent factors for users and items
algo = SVD(n_factors=100, n_epochs=20, lr_all=0.005)
trainset = data.build_full_trainset()
algo.fit(trainset)

# Predict: user 42's rating for item 314
prediction = algo.predict(uid=42, iid=314)
# prediction.est holds the predicted rating, e.g. 4.2
```
Latent factors capture hidden dimensions — a movie might score high on "cerebral sci-fi" and low on "family friendly" without anyone labeling those dimensions.
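To make "learning latent factors" concrete, here is a toy SGD matrix-factorization loop in plain numpy. The data, dimensions, and hyperparameters are illustrative, not tuned:

```python
import numpy as np

# (user, item, rating) triples for the observed entries only
ratings = [
    (0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
    (2, 3, 5.0), (3, 2, 5.0), (3, 3, 4.0),
]
n_users, n_items, k = 4, 4, 8

rng = np.random.default_rng(0)
P = rng.normal(0, 0.1, (n_users, k))  # user latent factors
Q = rng.normal(0, 0.1, (n_items, k))  # item latent factors

lr, reg = 0.05, 0.02
for _ in range(300):
    for u, i, r in ratings:
        err = r - P[u] @ Q[i]                  # prediction error
        P[u] += lr * (err * Q[i] - reg * P[u])  # gradient step + L2
        Q[i] += lr * (err * P[u] - reg * Q[i])

# A predicted rating is just a dot product of the learned factors
pred = P[0] @ Q[0]  # approaches the observed rating of 5.0
```

Only the observed entries drive the updates; predictions for unseen user-item pairs fall out of the learned factor geometry.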
Deep Learning Recommendations#
Neural networks handle complex patterns, sequences, and multimodal data.
```python
# Two-tower model with TensorFlow Recommenders
import tensorflow as tf
import tensorflow_recommenders as tfrs

# user_ids / item_ids: arrays of unique IDs; items: a tf.data.Dataset
# of item IDs — all prepared elsewhere

# User tower — learns user embeddings
user_model = tf.keras.Sequential([
    tf.keras.layers.StringLookup(vocabulary=user_ids),
    tf.keras.layers.Embedding(len(user_ids) + 1, 64),
])

# Item tower — learns item embeddings
item_model = tf.keras.Sequential([
    tf.keras.layers.StringLookup(vocabulary=item_ids),
    tf.keras.layers.Embedding(len(item_ids) + 1, 64),
])

# Retrieval task — trains the towers so a user's embedding lands
# near the embeddings of items they interacted with
task = tfrs.tasks.Retrieval(
    metrics=tfrs.metrics.FactorizedTopK(
        candidates=items.batch(128).map(item_model)
    )
)
```
When to use deep learning: Large datasets (millions of interactions), sequential patterns (session-based), multimodal features (text + image + behavior).
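Whatever framework trains the towers, serving reduces to nearest-neighbor search over item embeddings. A framework-agnostic numpy sketch (random vectors stand in for trained embeddings; at catalog scale you would use an approximate index such as Faiss or ScaNN instead of brute force):

```python
import numpy as np

rng = np.random.default_rng(1)

# Pretend these came out of the trained item tower, L2-normalized
item_emb = rng.normal(size=(1000, 64))
item_emb /= np.linalg.norm(item_emb, axis=1, keepdims=True)

def retrieve(user_vec, k=10):
    """Brute-force top-k retrieval in embedding space."""
    user_vec = user_vec / np.linalg.norm(user_vec)
    scores = item_emb @ user_vec          # cosine similarity per item
    return np.argsort(scores)[::-1][:k]   # indices of the k best items

# Pretend this came out of the user tower for the current user
user_vec = rng.normal(size=64)
top_items = retrieve(user_vec)
```

The dot-product scoring is why two-tower models scale: item embeddings are precomputed, and each request costs one matrix-vector product (or one ANN lookup).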
The Cold Start Problem#
The hardest challenge: recommending for new users or new items with no interaction history.
| Strategy | New Users | New Items |
|---|---|---|
| Popularity baseline | Show trending items | N/A |
| Content features | Ask preferences on signup | Use item metadata |
| Demographic matching | Match similar demographics | N/A |
| Exploration bonus | Boost diverse items | Boost new items |
| Bandit algorithms | Explore vs exploit balance | Explore vs exploit |
```python
# Multi-armed bandit for cold start (Thompson Sampling)
import numpy as np

class ThompsonSampling:
    def __init__(self, n_items):
        self.alpha = np.ones(n_items)  # successes (e.g. clicks)
        self.beta = np.ones(n_items)   # failures (e.g. skips)

    def select_item(self):
        # Sample a plausible CTR per item, show the best draw
        samples = np.random.beta(self.alpha, self.beta)
        return int(np.argmax(samples))

    def update(self, item_idx, reward):
        if reward:
            self.alpha[item_idx] += 1
        else:
            self.beta[item_idx] += 1

# Usage: show an item, observe a click (1) or skip (0), update
bandit = ThompsonSampling(n_items=50)
item = bandit.select_item()
bandit.update(item, reward=1)
```
A/B Testing Recommendations#
You can't improve what you can't measure. Every recommendation change needs rigorous testing.
```
Control group (50%):   Current algorithm
Treatment group (50%): New algorithm

Metrics to track:
├── Engagement: CTR, time-on-site, pages-per-session
├── Conversion: purchases, signups, completions
├── Diversity:  unique items shown, category spread
├── Novelty:    how "surprising" recommendations are
└── Coverage:   % of catalog that gets recommended
```
Pitfall: Optimizing only for CTR creates clickbait. Track downstream metrics (purchases, retention) alongside clicks.
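Deciding whether a CTR lift is real or noise is a standard two-proportion z-test. A stdlib-only sketch (the traffic numbers are made up):

```python
from math import sqrt, erf

def ab_test_ctr(clicks_a, views_a, clicks_b, views_b):
    """Two-proportion z-test: is the treatment CTR significantly different?"""
    p_a, p_b = clicks_a / views_a, clicks_b / views_b
    p_pool = (clicks_a + clicks_b) / (views_a + views_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / views_a + 1 / views_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the normal CDF
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Control: 4.0% CTR, Treatment: 4.6% CTR, 50k views per arm
z, p = ab_test_ctr(2000, 50000, 2300, 50000)
# p < 0.05 → the lift is statistically significant at this traffic level
```

The same test applies to any binary downstream metric (purchase, signup); run it on those alongside CTR to catch clickbait regressions.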
Real-Time vs Batch Processing#
| Aspect | Batch | Real-Time | Hybrid |
|---|---|---|---|
| Latency | Hours | Milliseconds | Minutes |
| Freshness | Stale | Immediate | Near-real-time |
| Cost | Low | High | Medium |
| Complexity | Simple | Complex | Moderate |
Most production systems use hybrid: batch-compute candidate sets, real-time re-rank based on session context.
```
Batch pipeline (nightly):
  User history → Train model → Generate top-1000 candidates per user → Store in Redis

Real-time layer (per request):
  Session context → Re-rank candidates → Apply business rules → Return top-20
```
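The serving path above can be sketched in a few lines. The dict stands in for the Redis candidate store, and the item names, categories, and boost weight are all illustrative:

```python
# Batch layer output: per-user candidate lists with precomputed scores
candidate_store = {
    "user_42": [("item_1", 0.91), ("item_2", 0.85), ("item_3", 0.80)],
}
item_categories = {"item_1": "sci-fi", "item_2": "comedy", "item_3": "comedy"}

def rerank(user_id, session_categories, top_n=2):
    """Request time: blend batch score with current-session context."""
    rescored = []
    for item, batch_score in candidate_store.get(user_id, []):
        # Boost items matching what the user browsed this session
        boost = 0.2 if item_categories.get(item) in session_categories else 0.0
        rescored.append((item, batch_score + boost))
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in rescored[:top_n]]

feed = rerank("user_42", session_categories={"comedy"})
# → ["item_2", "item_3"]: session context reorders the batch ranking
```

The split keeps request latency low: the expensive scoring happened overnight, and the per-request work is a lookup plus a small sort.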
Tools and Frameworks#
| Tool | Best For | Scale |
|---|---|---|
| Surprise | Prototyping, research | Small-medium |
| TensorFlow Recommenders | Deep learning recs | Large |
| Apache Mahout | Hadoop-based CF | Very large |
| LensKit | Academic research | Small |
| Merlin (NVIDIA) | GPU-accelerated training | Enterprise |
| Feast | Feature store for ML | Any |
| Milvus / Pinecone | Vector similarity search | Large |
Production Architecture#
```
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│ Event Stream │────→│   Feature    │────→│    Model     │
│   (Kafka)    │     │    Store     │     │   Training   │
└──────────────┘     │   (Feast)    │     │  (nightly)   │
                     └──────────────┘     └──────┬───────┘
                                                 │
┌──────────────┐     ┌──────────────┐     ┌──────▼───────┐
│ API Gateway  │←────│  Re-Ranker   │←────│  Candidate   │
│  (response)  │     │ (real-time)  │     │ Store (Redis)│
└──────────────┘     └──────────────┘     └──────────────┘
```
Key Takeaways#
- Start simple — popularity and item-based CF beat complex models with small data
- Hybrid always wins — combine multiple signals for robustness
- Solve cold start explicitly — bandits and content features fill the gap
- Measure everything — A/B test with downstream metrics, not just CTR
- Batch + real-time — precompute candidates, re-rank in real time
Build smarter systems with codelit.io — your visual architecture companion.
Article 191 of the Codelit engineering blog series.