Design a Search System — From Inverted Indexes to Ranked Results
Search is deceptively complex#
Type a query, get results. Simple for users, brutally complex to build. A search system combines information retrieval, natural language processing, distributed systems, and machine learning.
The search pipeline#
1. Data ingestion#
Content enters the search index through:
- Web crawlers — Discover and download pages (Googlebot)
- Database CDC — Change data capture streams updates from your DB
- API feeds — Partners push content via API
- User uploads — Documents, images with extracted text (OCR)
2. Processing and indexing#
Before content is searchable, process it:
Tokenization: Split text into tokens. "New York City" → ["new", "york", "city"]
Normalization: Lowercase, remove accents, stem words. "Running" → "run"
Stop word removal: Remove common words. "the", "a", "is" — unless they matter (search for "The Who")
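The three processing steps above can be sketched in a few lines of Python. This is a minimal toy analyzer, not a production one — real engines use proper stemmers (Porter, Snowball) and per-language stop word lists; the tiny `stem` rule here only handles the "-ing" case from the example:

```python
import re

STOP_WORDS = {"the", "a", "is", "of", "and"}

def stem(token):
    # Naive stemmer: strip "-ing" and a doubled final consonant ("running" -> "run")
    if token.endswith("ing") and len(token) > 5:
        token = token[:-3]
        if len(token) > 2 and token[-1] == token[-2]:
            token = token[:-1]
    return token

def analyze(text, keep_stop_words=False):
    # Tokenize: lowercase and split on non-alphanumeric characters
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Stop word removal — skipped when stop words matter ("The Who")
    if not keep_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return [stem(t) for t in tokens]

analyze("New York City")   # ['new', 'york', 'city']
analyze("Running")         # ['run']
```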
Build inverted index:
"kubernetes" → [doc_42, doc_187, doc_2901]
"scaling" → [doc_42, doc_88, doc_5002]
"database" → [doc_88, doc_187, doc_5002]
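Building the mapping above is straightforward: invert the document → terms relationship into term → documents. A minimal sketch (the doc IDs are the illustrative ones from the example, and real indexes also store positions and frequencies per posting):

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    # Sorted posting lists make intersection and merging efficient
    return {term: sorted(ids) for term, ids in index.items()}

docs = {
    42: "kubernetes scaling",
    88: "scaling database",
    187: "kubernetes database",
}
index = build_inverted_index(docs)
# index["kubernetes"] == [42, 187]
```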
3. Query processing#
When a user searches:
- Parse query — Tokenize, normalize, expand synonyms
- Spell correction — "kuberntes" → "kubernetes"
- Query expansion — "k8s" → also search "kubernetes"
- Intent detection — "pizza near me" → local search, not web search
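Spell correction and synonym expansion can be approximated with edit-distance matching against the index vocabulary. A sketch, assuming a hand-built synonym table and a tiny vocabulary (production systems use query logs and learned models instead):

```python
import difflib

SYNONYMS = {"k8s": ["kubernetes"]}                   # hypothetical synonym table
VOCABULARY = ["kubernetes", "scaling", "database"]   # terms present in the index

def process_query(query):
    expanded = []
    for term in query.lower().split():
        # Spell correction: snap unknown terms to the closest indexed term
        if term not in VOCABULARY:
            matches = difflib.get_close_matches(term, VOCABULARY, n=1, cutoff=0.8)
            if matches:
                term = matches[0]
        expanded.append(term)
        # Query expansion: add synonyms as extra OR-terms
        expanded.extend(SYNONYMS.get(term, []))
    return expanded

process_query("kuberntes")  # ['kubernetes']
process_query("k8s")        # ['k8s', 'kubernetes']
```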
4. Retrieval#
Find candidate documents matching the query:
Boolean retrieval: AND/OR operations on inverted index. Fast, but no ranking.
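An AND query is just a set intersection over posting lists. A sketch, intersecting the smallest list first so the candidate set shrinks quickly:

```python
def boolean_and(index, terms):
    """AND query: intersect posting lists, smallest first for speed."""
    postings = sorted((set(index.get(t, [])) for t in terms), key=len)
    if not postings:
        return set()
    result = postings[0]
    for p in postings[1:]:
        result &= p
    return result

index = {
    "kubernetes": [42, 187, 2901],
    "scaling": [42, 88, 5002],
}
boolean_and(index, ["kubernetes", "scaling"])  # {42}
```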
Vector retrieval: Encode query and documents as embeddings. Find nearest neighbors. Better for semantic search ("best laptop for coding" matches reviews about programming laptops).
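At its core, vector retrieval scores documents by cosine similarity between embeddings. The brute-force version below makes the idea concrete with made-up 3-dimensional vectors; real systems use high-dimensional embeddings and approximate nearest-neighbor indexes (HNSW, IVF) instead of scanning every document:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def nearest(query_vec, doc_vecs, k=2):
    """Brute-force k-nearest-neighbor search over document embeddings."""
    scored = sorted(doc_vecs.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

doc_vecs = {
    "doc_1": [0.9, 0.1, 0.0],
    "doc_2": [0.1, 0.8, 0.3],
    "doc_3": [0.0, 0.2, 0.9],
}
nearest([1.0, 0.0, 0.1], doc_vecs, k=1)  # ['doc_1']
```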
5. Ranking#
Score and sort candidates:
BM25 (term-based):
- Term frequency — how often the term appears in the document
- Inverse document frequency — how rare the term is across all documents
- Length normalization — a match in a shorter document (or field) counts for more than one in a long document
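The three BM25 components combine into a single per-term score. A sketch using the standard formula with the usual default parameters (k1=1.2, b=0.75); the example numbers are illustrative:

```python
import math

def bm25_score(tf, df, doc_len, avg_doc_len, n_docs, k1=1.2, b=0.75):
    """BM25 score of one term in one document.

    tf: term frequency in the document
    df: number of documents containing the term (drives IDF)
    doc_len / avg_doc_len: document length normalization
    """
    idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
    norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
    return idf * norm

# A rare term outscores a common one, all else equal:
rare = bm25_score(tf=3, df=5, doc_len=100, avg_doc_len=200, n_docs=10_000)
common = bm25_score(tf=3, df=5_000, doc_len=100, avg_doc_len=200, n_docs=10_000)
# rare > common
```

A document's score for a multi-term query is the sum of its per-term scores.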
Semantic ranking (ML-based):
- BERT/transformer models understand query intent
- "apple fruit nutrition" vs "apple macbook price" — same word, different intent
- Cross-encoder models compare query-document pairs for fine-grained relevance
Learning to Rank (LTR): Combine BM25, semantic, and click-through signals into a single ranking model trained on user behavior.
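In its simplest form, the combined model is a weighted sum of the signals. The weights below are purely illustrative — in practice they are learned from click logs, typically with gradient-boosted trees (e.g. LambdaMART) rather than a hand-tuned linear model:

```python
def ltr_score(features, weights):
    """Linear Learning-to-Rank model: weighted sum of relevance signals."""
    return sum(weights[name] * value for name, value in features.items())

weights = {"bm25": 0.5, "semantic": 0.3, "ctr": 0.2}  # hypothetical learned weights
candidates = {
    "doc_42": {"bm25": 8.1, "semantic": 0.92, "ctr": 0.12},
    "doc_88": {"bm25": 9.4, "semantic": 0.41, "ctr": 0.03},
}
ranked = sorted(candidates,
                key=lambda d: ltr_score(candidates[d], weights),
                reverse=True)
# ranked == ['doc_88', 'doc_42']
```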
6. Result presentation#
- Snippets — Highlight matching terms in context
- Facets — Filter by category, price, date, rating
- Autocomplete — Suggest queries as user types
- Did you mean — Spell correction suggestions
- Knowledge panel — Direct answers for factual queries
Architecture#
Query path:
Query → Query Processor → Retrieval (inverted index) → Ranking (BM25 + ML) → Result Assembly → Response with snippets + facets
Indexing path:
Content → Processor → Tokenizer → Index Builder → Distributed Index
Scaling search#
| Challenge | Solution |
|---|---|
| Large index | Shard across nodes (by document ID or term range) |
| High query volume | Replicate shards, load balance queries |
| Index freshness | Near-real-time indexing (1-second refresh) |
| Global latency | Replicate index to multiple regions |
| Relevance tuning | A/B test ranking changes, measure click-through |
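Document-based sharding from the table above amounts to a stable hash of the document ID. A sketch (shard count and IDs are illustrative; every query then fans out to all shards and the per-shard top results are merged):

```python
import hashlib

def shard_for(doc_id, num_shards=8):
    """Route a document to a shard by hashing its ID (document-based sharding)."""
    digest = hashlib.md5(str(doc_id).encode()).hexdigest()
    return int(digest, 16) % num_shards

shard_for("doc_42")  # stable shard assignment in [0, 8)
```

Hashing keeps shards evenly loaded regardless of ID patterns; the trade-off versus term-range sharding is that every query must touch every shard.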
Choosing a search engine#
| Engine | Best for |
|---|---|
| Elasticsearch | General-purpose, ELK stack, JSON documents |
| Typesense | Simple setup, typo tolerance, instant search |
| Meilisearch | Developer-friendly, fast, easy to deploy |
| Algolia | Hosted, instant search, great DX (expensive) |
| PostgreSQL FTS | Already using Postgres, moderate search needs |
Visualize your search architecture#
See how indexing, query processing, and ranking connect — try Codelit to generate an interactive diagram of your search system.
Key takeaways#
- Inverted indexes are the foundation — O(1) per term lookup
- BM25 + semantic ranking gives the best results
- Query processing matters — spell correction, synonyms, intent detection
- Shard by document for horizontal scaling
- Near-real-time indexing keeps search fresh (1-second delay)
- Start with Typesense or PostgreSQL FTS — add Elasticsearch when you need scale