Autocomplete System Design: Building a Typeahead Search Engine
When you type a few characters into Google, Amazon, or Spotify and instantly see relevant suggestions, you're interacting with an autocomplete system. Designing one that serves millions of queries per second with sub-50ms latency is a classic system design challenge. This guide breaks down every layer — from the trie data structure to distributed caching and fuzzy matching.
Core Data Structure: The Trie#
A trie (prefix tree) is the foundational data structure for autocomplete. Each node represents a character, and paths from root to leaf spell out stored strings.
This trie stores "help", "search", and "buy":

```
        root
       /  |  \
      h   s   b
      |   |   |
      e   e   u
      |   |   |
      l   a   y
      |   |
      p   r
          |
          c
          |
          h
```
Why Tries Over Hash Maps?#
- Prefix matching is O(L) where L is the prefix length — no need to scan every entry.
- Memory can be optimized with compressed tries (radix trees) that merge single-child chains.
- Naturally supports ranked traversal when nodes store aggregated scores.
Node Structure#
Each trie node typically stores:
- A map of child characters to child nodes
- A boolean indicating whether this node completes a valid term
- A score or frequency counter for ranking
- Optionally, a precomputed list of top-K suggestions from this prefix
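The node fields above can be sketched as a minimal Python trie. The class and method names here (`TrieNode`, `Trie.insert`, `Trie.suggestions`) are illustrative, not from any particular library:

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # char -> TrieNode
        self.is_term = False  # marks a complete stored term
        self.score = 0        # frequency counter for ranking
        self.top_k = []       # optional precomputed suggestions

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, term, score=1):
        node = self.root
        for ch in term:
            node = node.children.setdefault(ch, TrieNode())
        node.is_term = True
        node.score += score

    def suggestions(self, prefix):
        """Collect all terms under `prefix`, highest score first."""
        node = self.root
        for ch in prefix:  # O(L) descent, L = len(prefix)
            if ch not in node.children:
                return []
            node = node.children[ch]
        results = []

        def dfs(n, path):
            if n.is_term:
                results.append((path, n.score))
            for ch, child in n.children.items():
                dfs(child, path + ch)

        dfs(node, prefix)
        return [t for t, _ in sorted(results, key=lambda x: -x[1])]
```

After `insert("search", 5)` and `insert("sea", 2)`, `suggestions("se")` returns `["search", "sea"]` — the O(L) descent plus a subtree walk, which the top-K caching below replaces with an O(1) lookup.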
Ranking Suggestions#
Raw prefix matching returns too many results. Ranking narrows them to the most useful.
Frequency-Based Ranking#
Count how often each query has been searched. Store cumulative frequency at each node and propagate scores upward so parent nodes can quickly return the highest-scoring completions.
Recency Weighting#
Apply a time-decay function so recent queries rank higher:
score = raw_frequency * decay_factor ^ (now - last_seen)
A common decay factor is 0.99 per hour, which halves a score roughly every three days.
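A minimal sketch of that decay formula, with time measured in hours as in the text:

```python
def decayed_score(raw_frequency, hours_since_seen, decay_factor=0.99):
    """Time-decayed score: recent queries keep most of their weight."""
    return raw_frequency * decay_factor ** hours_since_seen
```

With `decay_factor=0.99`, a score drops to half after about 69 hours (`log 0.5 / log 0.99`), matching the "roughly every three days" figure.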
Personalization#
Layer user-specific history on top of global rankings. A simple approach:
final_score = alpha * global_score + (1 - alpha) * user_score
Where alpha is typically 0.7, favoring global trends but boosting personal relevance.
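The blend is a one-liner; this sketch just mirrors the formula above:

```python
def final_score(global_score, user_score, alpha=0.7):
    """Linear blend of global and per-user scores; alpha=0.7 favors global trends."""
    return alpha * global_score + (1 - alpha) * user_score
```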
Top-K Caching at Each Node#
Recomputing rankings on every keystroke is expensive. Instead, precompute and cache the top-K (e.g., top 10) suggestions at every trie node. When a user types a prefix, the system returns the cached list directly — O(1) lookup after reaching the node.
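One way to build those per-prefix caches offline is a single pass over the term set, pushing each term into a bounded heap at every prefix it passes through. This is an illustrative sketch (k=3 here rather than the top 10 mentioned above):

```python
import heapq

def build_topk_cache(terms, k=3):
    """terms: dict of term -> score. Returns prefix -> top-k terms by score."""
    cache = {}
    for term, score in terms.items():
        for i in range(1, len(term) + 1):
            heap = cache.setdefault(term[:i], [])
            heapq.heappush(heap, (score, term))
            if len(heap) > k:
                heapq.heappop(heap)  # drop the lowest-scoring entry
    # Final pass: sort each cached list highest score first.
    return {p: [t for _, t in sorted(h, reverse=True)] for p, h in cache.items()}
```

At query time the server descends to the prefix node and returns the cached list as-is — no subtree traversal, no sorting.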
Cache Invalidation#
Rebuild top-K lists periodically (e.g., every 15 minutes) using an offline MapReduce job that processes query logs and updates the trie.
Real-Time Updates#
For trending queries (breaking news, viral events), waiting 15 minutes is too slow.
Approach: Dual-Layer Architecture#
- Base trie — rebuilt periodically from historical data.
- Real-time trie — an in-memory trie that ingests the latest query stream via Kafka. Results from both layers are merged at query time.
This avoids constant rebuilds of the full trie while keeping suggestions fresh.
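The query-time merge can be as simple as taking the maximum score when a term appears in both layers. The layer contents below are hypothetical:

```python
def merge_layers(base, realtime, k=10):
    """base, realtime: dict of term -> score. Returns the merged top-k terms."""
    merged = dict(base)
    for term, score in realtime.items():
        merged[term] = max(merged.get(term, 0), score)
    return sorted(merged, key=merged.get, reverse=True)[:k]
```

A trending query that only exists in the real-time layer can outrank long-standing entries in the base trie, which is exactly the freshness property the dual-layer design is after.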
Distributed Trie#
A single-server trie won't scale to billions of terms. Two sharding strategies:
Range-Based Sharding#
Partition by first character (or first two characters). Server A handles a–f, Server B handles g–m, etc. Simple but can create hotspots — queries starting with "s" are far more common than "x."
Hash-Based Sharding#
Hash the prefix and distribute across servers. Balances load more evenly but makes range queries harder. A consistent hashing ring handles node additions and removals gracefully.
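A minimal consistent-hashing sketch: each server is placed at many points on a ring (virtual nodes smooth the load), and a prefix routes to the first server clockwise from its hash. Server names are placeholders:

```python
import bisect
import hashlib

class HashRing:
    def __init__(self, servers, vnodes=100):
        # Place each server at `vnodes` points on the ring.
        self.ring = sorted(
            (self._hash(f"{s}#{v}"), s) for s in servers for v in range(vnodes)
        )
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def server_for(self, prefix):
        # First ring position at or after the prefix hash, wrapping around.
        i = bisect.bisect(self.keys, self._hash(prefix)) % len(self.ring)
        return self.ring[i][1]
```

Adding or removing a server only remaps the keys adjacent to its ring positions, rather than reshuffling everything as a plain `hash(prefix) % N` scheme would.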
Replication#
Each shard is replicated 3x for fault tolerance. Reads go to the nearest replica; writes go to the leader.
Multi-Language Support#
Autocomplete must handle diverse scripts and input methods.
- Unicode normalization — apply NFC normalization so that equivalent character sequences match.
- Transliteration — map romanized input (e.g., "beijing") to native script (北京).
- CJK tokenization — Chinese, Japanese, and Korean lack word boundaries. Use n-gram indexing or dictionary-based segmentation.
- Right-to-left scripts — ensure prefix matching works correctly for Arabic and Hebrew.
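NFC normalization is one line with Python's standard library. For example, "é" typed as `e` plus a combining acute accent should match the precomposed character:

```python
import unicodedata

def normalize(query):
    """NFC composes equivalent character sequences into a canonical form."""
    return unicodedata.normalize("NFC", query)
```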
Fuzzy Matching#
Users make typos. A strict prefix match for "amazn" returns nothing.
Edit Distance#
Compute Levenshtein distance between the query and stored terms. Terms within distance 1–2 are included as suggestions. This is expensive at scale, so limit fuzzy matching to short prefixes or use BK-trees.
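The textbook dynamic-programming version is a useful reference point — fast enough for short prefixes, but too slow to run against every stored term, which is why the text suggests BK-trees at scale:

```python
def levenshtein(a, b):
    """Minimum number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,                 # deletion
                curr[j - 1] + 1,             # insertion
                prev[j - 1] + (ca != cb),    # substitution (free on match)
            ))
        prev = curr
    return prev[-1]
```

`levenshtein("amazn", "amazon")` is 1 (one missing letter), so "amazon" survives a distance-1 fuzzy filter.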
Phonetic Matching#
Algorithms like Soundex or Metaphone group words that sound alike, helping with misspellings that preserve pronunciation.
Combining Approaches#
Elasticsearch's completion suggester supports fuzzy matching out of the box:
```json
{
  "suggest": {
    "song-suggest": {
      "prefix": "amazn",
      "completion": {
        "field": "suggest",
        "fuzzy": { "fuzziness": 2 }
      }
    }
  }
}
```
Tools and Technologies#
Elasticsearch Completion Suggester#
Uses an in-memory FST (finite state transducer) for fast prefix lookups. Supports weights, fuzzy matching, and context-based filtering. Ideal for medium-scale deployments.
Algolia#
A hosted search-as-a-service with built-in typo tolerance, geo-aware ranking, and sub-10ms responses. Great when you want autocomplete without managing infrastructure.
Redis + Sorted Sets#
Store terms in sorted sets scored by frequency. Use ZRANGEBYLEX for prefix queries and ZREVRANGEBYSCORE for top-K retrieval. Works well for moderate-scale systems with simple ranking.
Custom Trie Service#
For Google-scale systems, a custom trie service built in C++ or Rust gives maximum control over memory layout, sharding, and ranking logic.
Putting It All Together#
A production autocomplete system at scale looks like this:
- Query logs flow into Kafka.
- A batch pipeline (Spark/MapReduce) rebuilds the base trie every 15 minutes.
- A streaming pipeline (Flink/Kafka Streams) updates the real-time trie.
- Trie servers are sharded by prefix range and replicated 3x.
- A CDN/edge cache stores top-K results for the most popular prefixes.
- The API gateway merges base + real-time results, applies personalization, and returns top 10 suggestions.
Latency Budget#
| Component | Target |
|---|---|
| CDN cache hit | < 5ms |
| Trie lookup | < 10ms |
| Merge + rank | < 5ms |
| Network (edge) | < 30ms |
| Total | < 50ms |
Key Takeaways#
- Tries are the backbone — compressed tries save memory, cached top-K lists save computation.
- Ranking combines frequency, recency, and personalization.
- A dual-layer architecture (batch + streaming) keeps suggestions fresh without constant rebuilds.
- Distributed sharding and replication are essential at scale.
- Fuzzy matching and multi-language support make the system robust for real users.
Autocomplete is deceptively simple on the surface but deeply complex underneath. Mastering its design demonstrates fluency in data structures, distributed systems, caching, and ranking — exactly what interviewers look for.
This is article #183 in the Codelit engineering blog series.