# Design an Autocomplete System — Trie, Search Ranking, and Real-Time Suggestions
## Why autocomplete is everywhere
Google, Amazon, Spotify, Slack — every search box has autocomplete. It reduces typing, guides users to popular queries, and increases conversion.
Designing autocomplete tests your knowledge of data structures, caching, and real-time systems.
## The core data structure: Trie
A trie (prefix tree) stores strings character by character. Each path from root to a node represents a prefix.
```
Root
└── h
    ├── e
    │   ├── l
    │   │   └── l
    │   │       └── o   (freq: 500)   "hello"
    │   └── r
    │       └── o       (freq: 200)   "hero"
    └── o
        ├── w           (freq: 350)   "how"
        └── t           (freq: 100)   "hot"
```
Lookup: Type "he" → traverse to the e node → return top-k suggestions from all descendants: "hello" (500), "hero" (200).
Time complexity: O(p + k), where p is the prefix length and k is the number of suggestions. This holds when top-k lists are precomputed per node; a naive scan of all descendants costs O(p + n) for a subtree with n completions.
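The structure above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: `suggest()` walks the prefix in O(p), then scans the whole subtree and sorts by frequency (the precomputed top-k optimization described later replaces that scan).

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # char -> TrieNode
        self.freq = 0       # non-zero only where a complete query ends

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, query, freq):
        node = self.root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
        node.freq = freq

    def suggest(self, prefix, k=5):
        # Walk the prefix: O(p).
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        # Collect every completion under this node, then keep the top k.
        results = []
        stack = [(node, prefix)]
        while stack:
            cur, word = stack.pop()
            if cur.freq > 0:
                results.append((word, cur.freq))
            for ch, child in cur.children.items():
                stack.append((child, word + ch))
        results.sort(key=lambda pair: -pair[1])
        return results[:k]
```

With the four queries from the diagram inserted, `suggest("he")` returns `[("hello", 500), ("hero", 200)]`.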
## Ranking suggestions
Not all completions are equal. Rank by:
- Query frequency — how often this query was searched
- Recency — boost recently trending queries
- Personalization — user's own search history
- Context — what page they're on, their location
Weighted score:

```
score = frequency × recency_boost × personal_boost
```
Store the top-k suggestions pre-computed at each trie node. When "he" is typed, immediately return the cached top 5 — no traversal needed.
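One way to precompute those per-node top-k lists is a single pass over the aggregated queries, applying the weighted score and keeping a small heap per prefix. The function name and the boost inputs here are illustrative:

```python
import heapq
from collections import defaultdict

def build_topk_index(queries, k=5):
    """Precompute top-k completions for every prefix of every query.

    `queries` maps query -> (frequency, recency_boost, personal_boost).
    """
    index = defaultdict(list)  # prefix -> min-heap of (score, query)
    for query, (freq, recency, personal) in queries.items():
        score = freq * recency * personal  # weighted score
        for i in range(1, len(query) + 1):
            bucket = index[query[:i]]
            heapq.heappush(bucket, (score, query))
            if len(bucket) > k:
                heapq.heappop(bucket)  # evict the lowest-scoring entry
    # Store each bucket best-first, ready to serve without traversal.
    return {prefix: [q for _, q in sorted(bucket, reverse=True)]
            for prefix, bucket in index.items()}
```

After this build step, serving a prefix is a single dictionary lookup, which is what makes the sub-50ms read path realistic.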
## Architecture
### Write path (offline)
- Query logs — collect every search query with timestamp
- Aggregation — MapReduce job counts query frequencies (hourly/daily)
- Trie builder — constructs a new trie from aggregated frequencies
- Deploy — swap the old trie with the new one (blue-green)
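The deploy step above can be as simple as swapping a reference to an immutable index, so readers never see a half-built trie. A hedged sketch; the `AutocompleteService` class and its method names are made up for illustration:

```python
import threading

class AutocompleteService:
    """Serves from an immutable index; a background job builds a
    fresh index from aggregated query logs and swaps it in atomically."""

    def __init__(self, index):
        self._index = index          # e.g. prefix -> precomputed top-k list
        self._lock = threading.Lock()

    def suggest(self, prefix):
        return self._index.get(prefix, [])

    def deploy(self, new_index):
        # Blue-green swap: in-flight requests holding the old index keep
        # a consistent snapshot; new requests see the new index.
        with self._lock:
            self._index = new_index
```

The old index stays in memory until the last reader releases it, which is the in-process analogue of keeping the "blue" environment alive during a blue-green cutover.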
### Read path (online)
- User types "he" → request to autocomplete service
- Service looks up "he" in the trie → returns cached top-5 suggestions
- Response in under 50ms
## Real-time updates
For trending topics (breaking news), you can't wait for hourly rebuilds:
- Streaming aggregation — Kafka/Flink counts queries in real-time
- Hot cache update — inject trending queries into a separate cache layer
- Merge at read time — combine trie suggestions with trending cache
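The read-time merge in the last step can be a small, ordered deduplication: trending entries take priority, and the combined list is trimmed to k. A sketch under that assumption (the function name is illustrative):

```python
def merge_suggestions(trie_results, trending, k=5):
    """Combine precomputed trie suggestions with a real-time trending
    cache. Trending entries are listed first; duplicates are dropped
    while preserving order."""
    seen = set()
    merged = []
    for query in trending + trie_results:  # trending wins placement
        if query not in seen:
            seen.add(query)
            merged.append(query)
    return merged[:k]
```

A production ranker would more likely merge by score than by fixed priority, but the dedup-and-trim shape is the same.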
## Optimization techniques
### Prefix caching
Cache the top suggestions for the most common prefixes in Redis:
"ho" → ["hotels near me", "home depot", "how to", "honda", "hot topic"]
"hote" → ["hotels near me", "hotel booking", "hotel california"]
Most autocomplete queries hit the cache. The trie is the fallback.
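The cache-with-fallback pattern looks like a read-through cache. In this sketch a plain dict stands in for Redis (a real deployment would use a Redis client and a TTL); the class name is illustrative:

```python
class PrefixCache:
    """Read-through cache in front of the trie. A dict stands in for
    Redis here; swap in a real client with a TTL in production."""

    def __init__(self, lookup_fn):
        self._store = {}
        self._lookup = lookup_fn  # fallback, e.g. trie.suggest
        self.hits = 0
        self.misses = 0

    def get(self, prefix):
        if prefix in self._store:
            self.hits += 1
            return self._store[prefix]
        self.misses += 1
        result = self._lookup(prefix)
        self._store[prefix] = result  # populate for the next request
        return result
```

Because query prefixes follow a heavy-tailed distribution, even a modest cache of common prefixes absorbs most traffic before it reaches the trie.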
### Debouncing
Don't send a request on every keystroke. Wait 100-200ms after the user stops typing. This reduces server load by 60-80%.
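In practice debouncing lives in browser JavaScript, but the mechanism is language-agnostic: restart a timer on every keystroke and only fire once the timer survives the quiet period. A Python sketch using `threading.Timer`:

```python
import threading

def debounce(wait_seconds):
    """Decorator: delay calls to fn until `wait_seconds` of quiet.
    Each new call cancels the pending one and restarts the clock."""
    def decorator(fn):
        timer = None
        lock = threading.Lock()

        def debounced(*args, **kwargs):
            nonlocal timer
            with lock:
                if timer is not None:
                    timer.cancel()  # keystroke arrived: restart the quiet period
                timer = threading.Timer(wait_seconds, fn, args, kwargs)
                timer.start()
        return debounced
    return decorator
```

Typing "h", "he", "hel" in quick succession sends a single request for "hel" instead of three.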
### Browser caching
Cache recent suggestions in the browser. If the user types "ho", deletes, then types "ho" again — serve from local cache.
### Sampling
For frequency counting, you don't need every query. Sample 1 in 10 or 1 in 100 — the popular queries still surface.
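Sampling can be sketched as counting one query in every `rate` and scaling the counts back up. The function name is illustrative; the estimates are noisy for rare queries but accurate for popular ones:

```python
import random

def sampled_counts(query_stream, rate=10, seed=None):
    """Count roughly 1-in-`rate` queries, scaling counts by `rate`
    so they estimate the true frequencies. Rare queries may be
    missed entirely, which is acceptable for autocomplete."""
    rng = random.Random(seed)
    counts = {}
    for query in query_stream:
        if rng.randrange(rate) == 0:       # keep with probability 1/rate
            counts[query] = counts.get(query, 0) + rate
    return counts
```

For a query seen 1,000 times at a 1-in-10 rate, the estimate lands near 1,000 with a standard deviation of roughly 95, while a query seen once usually vanishes, exactly the trade-off described above.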
## Handling special cases
- Offensive content — maintain a blocklist; filter suggestions before returning them
- Spell correction — suggest "did you mean..." for misspelled prefixes, using edit distance (Levenshtein) to find close matches
- Multi-language — separate tries per language; detect language from user locale or recent queries
- Zero-prefix — when the search box is empty, show trending or personalized suggestions
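The spell-correction case can be prototyped with Python's stdlib `difflib`, which ranks candidates by a similarity ratio rather than raw Levenshtein distance; a production system would more likely use a dedicated edit-distance index, so treat this as a stand-in:

```python
import difflib

def did_you_mean(prefix, known_queries, cutoff=0.6):
    """Return the closest known query to a misspelled prefix,
    or None if nothing is similar enough."""
    matches = difflib.get_close_matches(prefix, known_queries,
                                        n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

For example, the misspelling "hllo" resolves to "hello", while a string with no close neighbor returns `None` and triggers no suggestion.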
## Scaling to billions of queries
| Challenge | Solution |
|---|---|
| Trie too large for one server | Shard by first 1-2 characters ("a-m" on shard 1, "n-z" on shard 2) |
| High read traffic | Replicas per shard, CDN-cached popular prefixes |
| Real-time trending | Streaming pipeline alongside batch trie rebuilds |
| Global latency | Edge-deployed trie replicas per region |
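Routing by the first character, as in the sharding row above, is a one-line range check. Note that real deployments weight the ranges by traffic, since "a" prefixes see far more queries than "x"; the function and shard names here are illustrative:

```python
def shard_for_prefix(prefix, shards):
    """Route a prefix to a shard by its first character.
    `shards` is a list of (low_char, high_char, shard_id) ranges."""
    ch = prefix[0].lower()
    for low, high, shard_id in shards:
        if low <= ch <= high:
            return shard_id
    return None  # digits/punctuation would go to a default shard
```

Usage: with `[("a", "m", "shard-1"), ("n", "z", "shard-2")]`, the prefix "hotel" routes to shard-1 and "netflix" to shard-2.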
## Data flow summary
```
User types → debounce (200 ms) → check browser cache
├─ cache hit  → show suggestions
└─ cache miss → API request → check Redis prefix cache
   ├─ cache hit  → return cached suggestions
   └─ cache miss → query trie → return + cache result
```
## Visualize your search architecture
See how autocomplete fits into a full search system — try Codelit to generate an interactive diagram showing how tries, caches, and search services connect.
## Key takeaways
- Trie is the core data structure — O(prefix length) lookup
- Pre-compute top-k at each node — no traversal at query time
- Debounce 200ms — reduces server load dramatically
- Cache aggressively — browser, CDN, Redis prefix cache
- Offline trie build + online trending cache — balance freshness and cost
- Shard by prefix for horizontal scaling