# Design an Autocomplete System — Trie, Search Ranking, and Real-Time Suggestions
## Why autocomplete is everywhere
Google, Amazon, Spotify, Slack — every search box has autocomplete. It reduces typing, guides users to popular queries, and increases conversion.
Designing autocomplete tests your knowledge of data structures, caching, and real-time systems.
## The core data structure: Trie
A trie (prefix tree) stores strings character by character. Each path from root to a node represents a prefix.
```
Root
└── h
    ├── e
    │   ├── l
    │   │   └── l
    │   │       └── o   (freq: 500)   "hello"
    │   └── r
    │       └── o       (freq: 200)   "hero"
    └── o
        ├── w           (freq: 350)   "how"
        └── t           (freq: 100)   "hot"
```
Lookup: Type "he" → traverse to the e node → return top-k suggestions from all descendants: "hello" (500), "hero" (200).
Time complexity: O(p + k), where p is the prefix length and k is the number of suggestions. This holds when top-k lists are precomputed per node; a naive scan of all descendants costs O(p + n) for a subtree with n completions.
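The structure above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: `suggest()` walks the prefix in O(p), then scans the whole subtree and sorts by frequency (the precomputed top-k optimization described later replaces that scan).

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # char -> TrieNode
        self.freq = 0       # non-zero only where a complete query ends

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, query, freq):
        node = self.root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
        node.freq = freq

    def suggest(self, prefix, k=5):
        # Walk the prefix: O(p).
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        # Collect every completion under this node, then keep the top k.
        results = []
        stack = [(node, prefix)]
        while stack:
            cur, word = stack.pop()
            if cur.freq > 0:
                results.append((word, cur.freq))
            for ch, child in cur.children.items():
                stack.append((child, word + ch))
        results.sort(key=lambda pair: -pair[1])
        return results[:k]
```

With the four queries from the diagram inserted, `suggest("he")` returns `[("hello", 500), ("hero", 200)]`.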
## Ranking suggestions
Not all completions are equal. Rank by:
- Query frequency — how often this query was searched
- Recency — boost recently trending queries
- Personalization — user's own search history
- Context — what page they're on, their location
Weighted score:

```
score = frequency × recency_boost × personal_boost
```
Store the top-k suggestions pre-computed at each trie node. When "he" is typed, immediately return the cached top 5 — no traversal needed.
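One way to precompute those per-node top-k lists is a single pass over the aggregated queries, applying the weighted score and keeping a small heap per prefix. The function name and the boost inputs here are illustrative:

```python
import heapq
from collections import defaultdict

def build_topk_index(queries, k=5):
    """Precompute top-k completions for every prefix of every query.

    `queries` maps query -> (frequency, recency_boost, personal_boost).
    """
    index = defaultdict(list)  # prefix -> min-heap of (score, query)
    for query, (freq, recency, personal) in queries.items():
        score = freq * recency * personal  # weighted score
        for i in range(1, len(query) + 1):
            bucket = index[query[:i]]
            heapq.heappush(bucket, (score, query))
            if len(bucket) > k:
                heapq.heappop(bucket)  # evict the lowest-scoring entry
    # Store each bucket best-first, ready to serve without traversal.
    return {prefix: [q for _, q in sorted(bucket, reverse=True)]
            for prefix, bucket in index.items()}
```

After this build step, serving a prefix is a single dictionary lookup, which is what makes the sub-50ms read path realistic.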
## Architecture
### Write path (offline)
- Query logs — collect every search query with timestamp
- Aggregation — MapReduce job counts query frequencies (hourly/daily)
- Trie builder — constructs a new trie from aggregated frequencies
- Deploy — swap the old trie with the new one (blue-green)
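The deploy step above can be as simple as swapping a reference to an immutable index, so readers never see a half-built trie. A hedged sketch; the `AutocompleteService` class and its method names are made up for illustration:

```python
import threading

class AutocompleteService:
    """Serves from an immutable index; a background job builds a
    fresh index from aggregated query logs and swaps it in atomically."""

    def __init__(self, index):
        self._index = index          # e.g. prefix -> precomputed top-k list
        self._lock = threading.Lock()

    def suggest(self, prefix):
        return self._index.get(prefix, [])

    def deploy(self, new_index):
        # Blue-green swap: in-flight requests holding the old index keep
        # a consistent snapshot; new requests see the new index.
        with self._lock:
            self._index = new_index
```

The old index stays in memory until the last reader releases it, which is the in-process analogue of keeping the "blue" environment alive during a blue-green cutover.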
### Read path (online)
- User types "he" → request to autocomplete service
- Service looks up "he" in the trie → returns cached top-5 suggestions
- Response in under 50ms
## Real-time updates
For trending topics (breaking news), you can't wait for hourly rebuilds:
- Streaming aggregation — Kafka/Flink counts queries in real-time
- Hot cache update — inject trending queries into a separate cache layer
- Merge at read time — combine trie suggestions with trending cache
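The read-time merge in the last step can be a small, ordered deduplication: trending entries take priority, and the combined list is trimmed to k. A sketch under that assumption (the function name is illustrative):

```python
def merge_suggestions(trie_results, trending, k=5):
    """Combine precomputed trie suggestions with a real-time trending
    cache. Trending entries are listed first; duplicates are dropped
    while preserving order."""
    seen = set()
    merged = []
    for query in trending + trie_results:  # trending wins placement
        if query not in seen:
            seen.add(query)
            merged.append(query)
    return merged[:k]
```

A production ranker would more likely merge by score than by fixed priority, but the dedup-and-trim shape is the same.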
## Optimization techniques
### Prefix caching
Cache the top suggestions for the most common prefixes in Redis:
"ho" → ["hotels near me", "home depot", "how to", "honda", "hot topic"]
"hote" → ["hotels near me", "hotel booking", "hotel california"]
Most autocomplete queries hit the cache. The trie is the fallback.
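The cache-with-fallback pattern looks like a read-through cache. In this sketch a plain dict stands in for Redis (a real deployment would use a Redis client and a TTL); the class name is illustrative:

```python
class PrefixCache:
    """Read-through cache in front of the trie. A dict stands in for
    Redis here; swap in a real client with a TTL in production."""

    def __init__(self, lookup_fn):
        self._store = {}
        self._lookup = lookup_fn  # fallback, e.g. trie.suggest
        self.hits = 0
        self.misses = 0

    def get(self, prefix):
        if prefix in self._store:
            self.hits += 1
            return self._store[prefix]
        self.misses += 1
        result = self._lookup(prefix)
        self._store[prefix] = result  # populate for the next request
        return result
```

Because query prefixes follow a heavy-tailed distribution, even a modest cache of common prefixes absorbs most traffic before it reaches the trie.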
### Debouncing
Don't send a request on every keystroke. Wait 100-200ms after the user stops typing. This reduces server load by 60-80%.
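In practice debouncing lives in browser JavaScript, but the mechanism is language-agnostic: restart a timer on every keystroke and only fire once the timer survives the quiet period. A Python sketch using `threading.Timer`:

```python
import threading

def debounce(wait_seconds):
    """Decorator: delay calls to fn until `wait_seconds` of quiet.
    Each new call cancels the pending one and restarts the clock."""
    def decorator(fn):
        timer = None
        lock = threading.Lock()

        def debounced(*args, **kwargs):
            nonlocal timer
            with lock:
                if timer is not None:
                    timer.cancel()  # keystroke arrived: restart the quiet period
                timer = threading.Timer(wait_seconds, fn, args, kwargs)
                timer.start()
        return debounced
    return decorator
```

Typing "h", "he", "hel" in quick succession sends a single request for "hel" instead of three.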
### Browser caching
Cache recent suggestions in the browser. If the user types "ho", deletes, then types "ho" again — serve from local cache.
### Sampling
For frequency counting, you don't need every query. Sample 1 in 10 or 1 in 100 — the popular queries still surface.
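Sampling can be sketched as counting one query in every `rate` and scaling the counts back up. The function name is illustrative; the estimates are noisy for rare queries but accurate for popular ones:

```python
import random

def sampled_counts(query_stream, rate=10, seed=None):
    """Count roughly 1-in-`rate` queries, scaling counts by `rate`
    so they estimate the true frequencies. Rare queries may be
    missed entirely, which is acceptable for autocomplete."""
    rng = random.Random(seed)
    counts = {}
    for query in query_stream:
        if rng.randrange(rate) == 0:       # keep with probability 1/rate
            counts[query] = counts.get(query, 0) + rate
    return counts
```

For a query seen 1,000 times at a 1-in-10 rate, the estimate lands near 1,000 with a standard deviation of roughly 95, while a query seen once usually vanishes, exactly the trade-off described above.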
## Handling special cases
- Offensive content — maintain a blocklist; filter suggestions before returning them
- Spell correction — suggest "did you mean..." for misspelled prefixes, using edit distance (Levenshtein) to find close matches
- Multi-language — separate tries per language; detect language from user locale or recent queries
- Zero-prefix — when the search box is empty, show trending or personalized suggestions
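The spell-correction case can be prototyped with Python's stdlib `difflib`, which ranks candidates by a similarity ratio rather than raw Levenshtein distance; a production system would more likely use a dedicated edit-distance index, so treat this as a stand-in:

```python
import difflib

def did_you_mean(prefix, known_queries, cutoff=0.6):
    """Return the closest known query to a misspelled prefix,
    or None if nothing is similar enough."""
    matches = difflib.get_close_matches(prefix, known_queries,
                                        n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

For example, the misspelling "hllo" resolves to "hello", while a string with no close neighbor returns `None` and triggers no suggestion.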
## Scaling to billions of queries
| Challenge | Solution |
|---|---|
| Trie too large for one server | Shard by first 1-2 characters ("a-m" on shard 1, "n-z" on shard 2) |
| High read traffic | Replicas per shard, CDN-cached popular prefixes |
| Real-time trending | Streaming pipeline alongside batch trie rebuilds |
| Global latency | Edge-deployed trie replicas per region |
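Routing by the first character, as in the sharding row above, is a one-line range check. Note that real deployments weight the ranges by traffic, since "a" prefixes see far more queries than "x"; the function and shard names here are illustrative:

```python
def shard_for_prefix(prefix, shards):
    """Route a prefix to a shard by its first character.
    `shards` is a list of (low_char, high_char, shard_id) ranges."""
    ch = prefix[0].lower()
    for low, high, shard_id in shards:
        if low <= ch <= high:
            return shard_id
    return None  # digits/punctuation would go to a default shard
```

Usage: with `[("a", "m", "shard-1"), ("n", "z", "shard-2")]`, the prefix "hotel" routes to shard-1 and "netflix" to shard-2.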
## Data flow summary
```
User types → debounce (200 ms) → check browser cache
├─ cache hit  → show suggestions
└─ cache miss → API request → check Redis prefix cache
   ├─ cache hit  → return cached suggestions
   └─ cache miss → query trie → return + cache result
```
## Visualize your search architecture
See how autocomplete fits into a full search system — try Codelit to generate an interactive diagram showing how tries, caches, and search services connect.
## Key takeaways
- Trie is the core data structure — O(prefix length) lookup
- Pre-compute top-k at each node — no traversal at query time
- Debounce 200ms — reduces server load dramatically
- Cache aggressively — browser, CDN, Redis prefix cache
- Offline trie build + online trending cache — balance freshness and cost
- Shard by prefix for horizontal scaling