# Search Engine Architecture: How Full-Text Search Really Works
Every time a user types a query and gets results in milliseconds, a sophisticated pipeline of crawling, indexing, and ranking is at work. Understanding search engine architecture is essential for building fast, relevant search experiences at scale.
## How Search Works: Crawl, Index, Rank
All search systems follow three core phases:
- Crawl — Discover and fetch content (documents, pages, records).
- Index — Analyze and store content in a structure optimized for retrieval.
- Rank — Score and order results by relevance to the query.
Web search engines crawl billions of pages. Internal search systems ingest database records, product catalogs, or log entries. The architecture is the same.
## The Inverted Index
The inverted index is the foundational data structure behind full-text search. Instead of mapping documents to words, it maps each term to the list of documents containing it:
```
"kubernetes"   → [doc_3, doc_17, doc_42]
"architecture" → [doc_3, doc_8, doc_17]
"search"       → [doc_1, doc_3, doc_8, doc_42]
```
A query for "kubernetes architecture" intersects the two posting lists to find doc_3 and doc_17. Because query cost scales with the length of the posting lists involved, not with total corpus size, engines can answer in milliseconds even over billions of documents.
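The intersection step can be sketched in a few lines of Python. The corpus below mirrors the posting lists above; the doc IDs and contents are illustrative, not a production structure:

```python
from collections import defaultdict

# Toy corpus mirroring the posting lists above; doc IDs are illustrative.
docs = {
    1: "search basics",
    3: "kubernetes architecture search",
    8: "architecture patterns search",
    17: "kubernetes architecture",
    42: "kubernetes search",
}

# Inverted index: term -> set of doc IDs containing it.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

def query(*terms):
    """Intersect posting lists, smallest first, to keep the work minimal."""
    postings = sorted((index.get(t, set()) for t in terms), key=len)
    if not postings:
        return []
    result = set(postings[0])
    for p in postings[1:]:
        result &= p
    return sorted(result)

print(query("kubernetes", "architecture"))  # → [3, 17]
```

Real engines store posting lists as compressed, sorted arrays on disk and use skip pointers for faster intersection, but the logic is the same.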
## Tokenization, Stemming, and Analyzers
Before text enters the index, it passes through an analysis pipeline:
- Tokenizer — splits text into tokens ("full-text search" → ["full", "text", "search"]).
- Lowercasing — normalizes case.
- Stop-word removal — drops common words like "the", "is", "and".
- Stemming / Lemmatization — reduces words to roots ("running" → "run").
In Elasticsearch architecture, this is configured per field:
```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "blog_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "stop", "snowball"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": { "type": "text", "analyzer": "blog_analyzer" },
      "body": { "type": "text", "analyzer": "blog_analyzer" },
      "slug": { "type": "keyword" }
    }
  }
}
```
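A minimal Python sketch of the same pipeline. The stop-word list and suffix-stripping rules here are invented for illustration; real analyzers use full Snowball/Porter stemmers or dictionary-based lemmatizers:

```python
import re

STOP_WORDS = {"the", "is", "and", "a", "of"}  # tiny illustrative list

def naive_stem(token):
    # Crude suffix stripping for illustration only; real analyzers use
    # Snowball/Porter stemmers or dictionary-based lemmatizers.
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            stem = token[: -len(suffix)]
            if len(stem) > 2 and stem[-1] == stem[-2]:  # "runn" -> "run"
                stem = stem[:-1]
            return stem
    return token

def analyze(text):
    tokens = re.findall(r"[a-z0-9]+", text.lower())      # tokenize + lowercase
    tokens = [t for t in tokens if t not in STOP_WORDS]  # stop-word removal
    return [naive_stem(t) for t in tokens]               # stemming

print(analyze("Running a full-text search"))  # → ['run', 'full', 'text', 'search']
```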
## Relevance Scoring: TF-IDF and BM25
Search ranking determines result order. Two models dominate:
### TF-IDF
- Term Frequency (TF) — how often a term appears in a document.
- Inverse Document Frequency (IDF) — how rare the term is across all documents.
- Score = TF × IDF. Rare terms in a document get higher weight.
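A toy computation over an invented three-document corpus shows the effect. This uses the classic log-based IDF; production engines use smoothed variants:

```python
import math

# Invented three-document corpus, pre-tokenized.
corpus = [
    "rare kubernetes term appears here".split(),
    "common search common search common".split(),
    "search engines rank documents".split(),
]
N = len(corpus)

def tf(term, doc):
    return doc.count(term) / len(doc)  # term frequency, length-normalized

def idf(term):
    df = sum(1 for doc in corpus if term in doc)  # document frequency
    return math.log(N / df) if df else 0.0

def tf_idf(term, doc):
    return tf(term, doc) * idf(term)

# "kubernetes" appears in 1 of 3 docs (high IDF); "search" in 2 of 3 (lower).
print(tf_idf("kubernetes", corpus[0]) > tf_idf("search", corpus[1]))  # → True
```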
### BM25
BM25 is the default in Elasticsearch and Solr. It improves on TF-IDF with saturation (diminishing returns for repeated terms) and document-length normalization:
```
score(D, Q) = Σ IDF(qi) * (f(qi, D) * (k1 + 1)) / (f(qi, D) + k1 * (1 - b + b * |D| / avgdl))
```
Typical defaults: k1 = 1.2, b = 0.75. You rarely need to tune these unless your documents vary wildly in length.
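The formula translates almost directly into Python. This sketch uses a smoothed IDF similar to Lucene's, and the corpus is invented to show the saturation behavior:

```python
import math

def bm25_score(query_terms, doc, corpus, k1=1.2, b=0.75):
    """Score one tokenized document against a query, BM25-style.
    corpus is a list of token lists; IDF is smoothed to stay positive."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N  # average document length
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)  # document frequency
        if df == 0:
            continue
        idf = math.log(1 + (N - df + 0.5) / (df + 0.5))
        f = doc.count(term)  # term frequency in this document
        score += idf * (f * (k1 + 1)) / (f + k1 * (1 - b + b * len(doc) / avgdl))
    return score

corpus = [
    "search engine architecture".split(),
    "search search search search search".split(),  # term repetition
    "distributed systems design".split(),
]

# Saturation: five occurrences of "search" score well under 5x one occurrence.
once = bm25_score(["search"], corpus[0], corpus)
five = bm25_score(["search"], corpus[1], corpus)
print(five > once, five < 5 * once)  # → True True
```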
## Faceted Search
Faceted search lets users filter results by categories — price range, brand, date, status. It requires maintaining aggregation-friendly data alongside the inverted index.
```json
{
  "query": { "match": { "body": "search engine architecture" } },
  "aggs": {
    "by_category": { "terms": { "field": "category.keyword" } },
    "by_year": { "date_histogram": { "field": "date", "calendar_interval": "year" } }
  }
}
```
Facets are computed in a single pass during query execution — no extra round trip.
## Autocomplete and Typeahead
Fast autocomplete requires specialized data structures:
- Prefix queries on keyword fields — simple but limited.
- Edge n-gram tokenizer — indexes prefixes at write time ("arch" → ["a", "ar", "arc", "arch"]).
- Completion suggester (Elasticsearch) — uses an in-memory FST for sub-millisecond suggestions.
```json
{
  "mappings": {
    "properties": {
      "suggest": {
        "type": "completion"
      }
    }
  }
}
```
Query with:
```json
{
  "suggest": {
    "title-suggest": {
      "prefix": "searc",
      "completion": { "field": "suggest", "size": 5 }
    }
  }
}
```
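The edge n-gram expansion from the bullet list above can be sketched directly. This is a toy illustration, not the tokenizer's actual implementation:

```python
def edge_ngrams(token, min_gram=1, max_gram=20):
    """Prefixes an edge n-gram tokenizer would emit at write time."""
    return [token[:i] for i in range(min_gram, min(len(token), max_gram) + 1)]

print(edge_ngrams("arch"))  # → ['a', 'ar', 'arc', 'arch']

# At query time a prefix like "searc" becomes an exact term lookup against
# the indexed n-grams, so no wildcard scan is needed.
print("searc" in set(edge_ngrams("search")))  # → True
```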
## Distributed Search: Shards and Replicas
At scale, a single node cannot hold the entire index. Distributed search splits data across shards and copies them as replicas:
| Concept | Purpose |
|---|---|
| Primary shard | Holds a partition of the index |
| Replica shard | Copy of a primary for fault tolerance and read throughput |
| Coordinator node | Receives the query, fans it out, merges results |
A query against a 5-shard index runs in parallel on all 5 shards. The coordinator merges the top-N results — a scatter-gather pattern.
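The gather step reduces to a k-way merge of per-shard top-N lists. Here is a minimal sketch with invented scores and doc IDs:

```python
import heapq

# Hypothetical per-shard results: (score, doc_id) pairs, sorted by
# descending score, as each shard would return its local top N.
shard_results = [
    [(9.1, "doc_42"), (7.3, "doc_17")],
    [(8.6, "doc_3"), (2.2, "doc_99")],
    [(5.0, "doc_8")],
]

def coordinate(shards, n):
    """Merge per-shard top-N lists into a global top N (the gather step)."""
    merged = heapq.merge(*shards, reverse=True)  # k-way merge, highest first
    return [hit for _, hit in zip(range(n), merged)]

print(coordinate(shard_results, 3))
# → [(9.1, 'doc_42'), (8.6, 'doc_3'), (7.3, 'doc_17')]
```

Because each shard only ships its local top N, the coordinator merges a few small sorted lists rather than re-scoring the whole corpus.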
Shard sizing rule of thumb: 10–50 GB per shard. Too many small shards add per-shard overhead (memory, file handles, cluster state); too few large shards slow queries and recovery.
## Solr vs Elasticsearch
Both are built on Apache Lucene. Key differences in the Solr vs Elasticsearch debate:
| Aspect | Elasticsearch | Solr |
|---|---|---|
| Config | REST API, JSON | XML config files |
| Cluster management | Built-in | Requires ZooKeeper |
| Real-time indexing | Near real-time by default | Requires soft commits |
| Analytics | Strong (aggregations) | Comparable (facets, pivots) |
| Community | Larger ecosystem | Mature, stable |
For new projects, Elasticsearch (or OpenSearch) is the more common choice.
## Modern Search Tools
The landscape has expanded beyond Lucene-based engines:
- Elasticsearch / OpenSearch — the industry standard for log analytics and full-text search. OpenSearch is the Apache-licensed fork.
- Meilisearch — Rust-based, typo-tolerant, instant search. Great for product catalogs and documentation.
- Typesense — C++-based, easy to operate, built-in typo tolerance and geo-search.
- Algolia — hosted search-as-a-service with excellent frontend SDKs. Higher cost at scale.
### Quick Typesense example
```sh
# Create collection (default_sorting_field is optional and omitted here)
curl -X POST 'http://localhost:8108/collections' \
  -H 'X-TYPESENSE-API-KEY: xyz' \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "articles",
    "fields": [
      { "name": "title", "type": "string" },
      { "name": "body", "type": "string" },
      { "name": "tags", "type": "string[]", "facet": true }
    ]
  }'

# Search with typo tolerance
curl 'http://localhost:8108/collections/articles/documents/search?q=elastcsearch&query_by=title,body&facet_by=tags' \
  -H 'X-TYPESENSE-API-KEY: xyz'
```
## Search Architecture Patterns
### Pattern 1: Dual-Write (Simple)
Application writes to both the primary database and the search index. Risk: inconsistency if one write fails.
### Pattern 2: Change Data Capture (Robust)
A CDC pipeline (Debezium, DynamoDB Streams) tails the database log and pushes changes to the search index. Guarantees eventual consistency.
### Pattern 3: Event-Driven
Producers emit domain events. A search indexer consumer processes events and updates the index. Decoupled and scalable.
```
[App] → [Kafka / SQS] → [Indexer Service] → [Elasticsearch]
              ↑
      CDC from Postgres
```
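A minimal sketch of the indexer consumer, with an in-process queue standing in for Kafka/SQS. The event shape and field names are hypothetical:

```python
from queue import Queue

# In-process stand-in for Kafka/SQS; the event shape is hypothetical.
events = Queue()
events.put({"op": "upsert", "id": "doc_1", "body": "search engine architecture"})
events.put({"op": "upsert", "id": "doc_1", "body": "search engine architecture v2"})
events.put({"op": "delete", "id": "doc_2"})

search_index = {"doc_2": "stale entry"}  # stand-in for the search engine

def run_indexer(events, index):
    """Consume events and apply them to the index. Upserts keyed by doc ID
    make reprocessing the same event idempotent."""
    while not events.empty():
        event = events.get()
        if event["op"] == "upsert":
            index[event["id"]] = event["body"]
        elif event["op"] == "delete":
            index.pop(event["id"], None)

run_indexer(events, search_index)
print(search_index)  # → {'doc_1': 'search engine architecture v2'}
```

Keying writes by document ID is what makes at-least-once delivery safe: replaying an event just rewrites the same document.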
## Key Takeaways
- The inverted index is the core of all full-text search.
- BM25 handles relevance scoring well out of the box — tune analyzers before touching scoring parameters.
- Shard carefully: over-sharding is the most common Elasticsearch mistake.
- Use CDC or event-driven patterns to keep search indexes in sync.
- Evaluate Meilisearch and Typesense for simpler use cases — they reduce operational burden significantly.
Search is one of those systems that looks simple on the surface but rewards deep architectural understanding. Get it right, and users never think about it. Get it wrong, and they leave.