Design a Search System — From Inverted Indexes to Ranked Results
Search is deceptively complex#
Type a query, get results. Simple for users, brutally complex to build. A search system combines information retrieval, natural language processing, distributed systems, and machine learning.
The search pipeline#
1. Data ingestion#
Content enters the search index through:
- Web crawlers — Discover and download pages (Googlebot)
- Database CDC — Change data capture streams updates from your DB
- API feeds — Partners push content via API
- User uploads — Documents, images with extracted text (OCR)
2. Processing and indexing#
Before content is searchable, process it:
Tokenization: Split text into tokens. "New York City" → ["new", "york", "city"]
Normalization: Lowercase, remove accents, stem words. "Running" → "run"
Stop word removal: Remove common words. "the", "a", "is" — unless they matter (search for "The Who")
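The three processing steps above can be sketched in a few lines of Python. This is a minimal toy analyzer, not a production one — real engines use proper stemmers (Porter, Snowball) and per-language stop word lists; the tiny `stem` rule here only handles the "-ing" case from the example:

```python
import re

STOP_WORDS = {"the", "a", "is", "of", "and"}

def stem(token):
    # Naive stemmer: strip "-ing" and a doubled final consonant ("running" -> "run")
    if token.endswith("ing") and len(token) > 5:
        token = token[:-3]
        if len(token) > 2 and token[-1] == token[-2]:
            token = token[:-1]
    return token

def analyze(text, keep_stop_words=False):
    # Tokenize: lowercase and split on non-alphanumeric characters
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Stop word removal — skipped when stop words matter ("The Who")
    if not keep_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return [stem(t) for t in tokens]

analyze("New York City")   # ['new', 'york', 'city']
analyze("Running")         # ['run']
```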
Build inverted index:
"kubernetes" → [doc_42, doc_187, doc_2901]
"scaling" → [doc_42, doc_88, doc_5002]
"database" → [doc_88, doc_187, doc_5002]
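Building the mapping above is straightforward: invert the document → terms relationship into term → documents. A minimal sketch (the doc IDs are the illustrative ones from the example, and real indexes also store positions and frequencies per posting):

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    # Sorted posting lists make intersection and merging efficient
    return {term: sorted(ids) for term, ids in index.items()}

docs = {
    42: "kubernetes scaling",
    88: "scaling database",
    187: "kubernetes database",
}
index = build_inverted_index(docs)
# index["kubernetes"] == [42, 187]
```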
3. Query processing#
When a user searches:
- Parse query — Tokenize, normalize, expand synonyms
- Spell correction — "kuberntes" → "kubernetes"
- Query expansion — "k8s" → also search "kubernetes"
- Intent detection — "pizza near me" → local search, not web search
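Spell correction and synonym expansion can be approximated with edit-distance matching against the index vocabulary. A sketch, assuming a hand-built synonym table and a tiny vocabulary (production systems use query logs and learned models instead):

```python
import difflib

SYNONYMS = {"k8s": ["kubernetes"]}                   # hypothetical synonym table
VOCABULARY = ["kubernetes", "scaling", "database"]   # terms present in the index

def process_query(query):
    expanded = []
    for term in query.lower().split():
        # Spell correction: snap unknown terms to the closest indexed term
        if term not in VOCABULARY:
            matches = difflib.get_close_matches(term, VOCABULARY, n=1, cutoff=0.8)
            if matches:
                term = matches[0]
        expanded.append(term)
        # Query expansion: add synonyms as extra OR-terms
        expanded.extend(SYNONYMS.get(term, []))
    return expanded

process_query("kuberntes")  # ['kubernetes']
process_query("k8s")        # ['k8s', 'kubernetes']
```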
4. Retrieval#
Find candidate documents matching the query:
Boolean retrieval: AND/OR operations on inverted index. Fast, but no ranking.
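An AND query is just a set intersection over posting lists. A sketch, intersecting the smallest list first so the candidate set shrinks quickly:

```python
def boolean_and(index, terms):
    """AND query: intersect posting lists, smallest first for speed."""
    postings = sorted((set(index.get(t, [])) for t in terms), key=len)
    if not postings:
        return set()
    result = postings[0]
    for p in postings[1:]:
        result &= p
    return result

index = {
    "kubernetes": [42, 187, 2901],
    "scaling": [42, 88, 5002],
}
boolean_and(index, ["kubernetes", "scaling"])  # {42}
```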
Vector retrieval: Encode query and documents as embeddings. Find nearest neighbors. Better for semantic search ("best laptop for coding" matches reviews about programming laptops).
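At its core, vector retrieval scores documents by cosine similarity between embeddings. The brute-force version below makes the idea concrete with made-up 3-dimensional vectors; real systems use high-dimensional embeddings and approximate nearest-neighbor indexes (HNSW, IVF) instead of scanning every document:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def nearest(query_vec, doc_vecs, k=2):
    """Brute-force k-nearest-neighbor search over document embeddings."""
    scored = sorted(doc_vecs.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

doc_vecs = {
    "doc_1": [0.9, 0.1, 0.0],
    "doc_2": [0.1, 0.8, 0.3],
    "doc_3": [0.0, 0.2, 0.9],
}
nearest([1.0, 0.0, 0.1], doc_vecs, k=1)  # ['doc_1']
```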
5. Ranking#
Score and sort candidates:
BM25 (term-based):
- Term frequency — how often the term appears in the document
- Inverse document frequency — how rare the term is across all documents
- Length normalization — a match in a shorter document (or field) counts for more than one in a long document
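The three BM25 components combine into a single per-term score. A sketch using the standard formula with the usual default parameters (k1=1.2, b=0.75); the example numbers are illustrative:

```python
import math

def bm25_score(tf, df, doc_len, avg_doc_len, n_docs, k1=1.2, b=0.75):
    """BM25 score of one term in one document.

    tf: term frequency in the document
    df: number of documents containing the term (drives IDF)
    doc_len / avg_doc_len: document length normalization
    """
    idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
    norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
    return idf * norm

# A rare term outscores a common one, all else equal:
rare = bm25_score(tf=3, df=5, doc_len=100, avg_doc_len=200, n_docs=10_000)
common = bm25_score(tf=3, df=5_000, doc_len=100, avg_doc_len=200, n_docs=10_000)
# rare > common
```

A document's score for a multi-term query is the sum of its per-term scores.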
Semantic ranking (ML-based):
- BERT/transformer models understand query intent
- "apple fruit nutrition" vs "apple macbook price" — same word, different intent
- Cross-encoder models compare query-document pairs for fine-grained relevance
Learning to Rank (LTR): Combine BM25, semantic, and click-through signals into a single ranking model trained on user behavior.
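In its simplest form, the combined model is a weighted sum of the signals. The weights below are purely illustrative — in practice they are learned from click logs, typically with gradient-boosted trees (e.g. LambdaMART) rather than a hand-tuned linear model:

```python
def ltr_score(features, weights):
    """Linear Learning-to-Rank model: weighted sum of relevance signals."""
    return sum(weights[name] * value for name, value in features.items())

weights = {"bm25": 0.5, "semantic": 0.3, "ctr": 0.2}  # hypothetical learned weights
candidates = {
    "doc_42": {"bm25": 8.1, "semantic": 0.92, "ctr": 0.12},
    "doc_88": {"bm25": 9.4, "semantic": 0.41, "ctr": 0.03},
}
ranked = sorted(candidates,
                key=lambda d: ltr_score(candidates[d], weights),
                reverse=True)
# ranked == ['doc_88', 'doc_42']
```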
6. Result presentation#
- Snippets — Highlight matching terms in context
- Facets — Filter by category, price, date, rating
- Autocomplete — Suggest queries as user types
- Did you mean — Spell correction suggestions
- Knowledge panel — Direct answers for factual queries
Architecture#
Query path:
Query → Query Processor → Retrieval (inverted index) → Ranking (BM25 + ML) → Result Assembly → Response with snippets + facets
Indexing path:
Content → Processor → Tokenizer → Index Builder → Distributed Index
Scaling search#
| Challenge | Solution |
|---|---|
| Large index | Shard across nodes (by document ID or term range) |
| High query volume | Replicate shards, load balance queries |
| Index freshness | Near-real-time indexing (1-second refresh) |
| Global latency | Replicate index to multiple regions |
| Relevance tuning | A/B test ranking changes, measure click-through |
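Document-based sharding from the table above amounts to a stable hash of the document ID. A sketch (shard count and IDs are illustrative; every query then fans out to all shards and the per-shard top results are merged):

```python
import hashlib

def shard_for(doc_id, num_shards=8):
    """Route a document to a shard by hashing its ID (document-based sharding)."""
    digest = hashlib.md5(str(doc_id).encode()).hexdigest()
    return int(digest, 16) % num_shards

shard_for("doc_42")  # stable shard assignment in [0, 8)
```

Hashing keeps shards evenly loaded regardless of ID patterns; the trade-off versus term-range sharding is that every query must touch every shard.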
Choosing a search engine#
| Engine | Best for |
|---|---|
| Elasticsearch | General-purpose, ELK stack, JSON documents |
| Typesense | Simple setup, typo tolerance, instant search |
| Meilisearch | Developer-friendly, fast, easy to deploy |
| Algolia | Hosted, instant search, great DX (expensive) |
| PostgreSQL FTS | Already using Postgres, moderate search needs |
Visualize your search architecture#
See how indexing, query processing, and ranking connect — try Codelit to generate an interactive diagram of your search system.
Key takeaways#
- Inverted indexes are the foundation — O(1) per term lookup
- BM25 + semantic ranking gives the best results
- Query processing matters — spell correction, synonyms, intent detection
- Shard by document for horizontal scaling
- Near-real-time indexing keeps search fresh (1-second delay)
- Start with Typesense or PostgreSQL FTS — add Elasticsearch when you need scale