Elasticsearch Architecture — How It Handles Billions of Documents
Why Elasticsearch dominates full-text search#
Elasticsearch powers search at Wikipedia, GitHub, Stack Overflow, and thousands of companies. It handles billions of documents with sub-second query times.
Understanding its architecture teaches you how distributed systems handle reads at massive scale.
The core: inverted indexes#
Traditional databases store documents and scan them for matches. Elasticsearch flips this — it stores a mapping from every term to the documents containing it.
"kubernetes" → [doc_42, doc_187, doc_2901]
"architecture" → [doc_42, doc_88, doc_2901, doc_5002]
"explained" → [doc_42, doc_5002]
Query: "kubernetes architecture"
→ Intersection: [doc_42, doc_2901]
This is why a term lookup is roughly O(1) — a hash lookup into the index — and query cost scales with the length of the matching posting lists, not with the total number of documents.
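The idea is easy to see in a few lines. Here is a minimal sketch of an inverted index in Python (the documents and IDs mirror the example above; real Elasticsearch also tokenizes, stems, and stores positions):

```python
# Minimal inverted index sketch: term -> set of document IDs.
from collections import defaultdict

def build_index(docs):
    """Map each term to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

docs = {
    42: "kubernetes architecture explained",
    88: "architecture patterns",
    187: "kubernetes basics",
    2901: "kubernetes architecture deep dive",
}
index = build_index(docs)

# An AND query is a set intersection over posting lists -- no document scan.
hits = index["kubernetes"] & index["architecture"]
print(sorted(hits))  # [42, 2901]
```

Answering the query touches only the two posting lists, which is why corpus size barely matters for a single-term lookup.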
Architecture components#
Cluster#
A group of nodes working together. One cluster can hold billions of documents across hundreds of nodes.
Node types#
- Master node — Manages cluster state: index creation, shard allocation, node membership
- Data node — Stores shards and executes search/index operations
- Coordinating node — Routes queries, merges results from data nodes
- Ingest node — Pre-processes documents (parse, transform, enrich) before indexing
Index#
A logical namespace for documents (like a database table). Each index has a mapping (schema) defining field types.
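A mapping is typically defined at index creation. The sketch below uses the standard create-index request body; the index name and fields are illustrative, not from the original article:

```json
PUT /products
{
  "mappings": {
    "properties": {
      "title":      { "type": "text" },
      "category":   { "type": "keyword" },
      "price":      { "type": "float" },
      "created_at": { "type": "date" }
    }
  }
}
```

`text` fields are analyzed for full-text search, while `keyword` fields are stored as-is for exact filters and aggregations.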
Shards#
Each index is split into shards — horizontal partitions distributed across data nodes.
- Primary shard — The authoritative copy. Handles writes.
- Replica shard — A copy on a different node (never the same node as its primary). Serves reads and provides failover.
Index: "products" — 5 primary shards, 1 replica each (10 shards total)
Node A: [P0, R2, R4]
Node B: [P1, R3, P4]
Node C: [P2, R0, P3, R1]
Write path#
- Client sends the document to a coordinating node
- Coordinating node routes it to the correct primary shard (hash of _id)
- Primary shard writes to an in-memory buffer and the translog (write-ahead log)
- Every 1 second (by default), the buffer is refreshed into a new segment (an immutable Lucene index)
- Primary replicates the operation to its replica shards
- Once the in-sync replicas acknowledge, success is returned to the client
The 1-second refresh interval is why Elasticsearch is "near-real-time" — documents are searchable within ~1 second.
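The routing step can be sketched in Python. Elasticsearch actually uses a murmur3 hash of the routing value (the `_id` by default) modulo the primary shard count; `md5` below is only a dependency-free stand-in for the hash:

```python
import hashlib

NUM_PRIMARY_SHARDS = 5  # fixed at index creation; changing it requires reindexing

def route(doc_id: str, num_shards: int = NUM_PRIMARY_SHARDS) -> int:
    """Pick the primary shard for a document from its routing value.

    Real Elasticsearch uses murmur3, not md5; the property that matters
    is that the same _id deterministically maps to the same shard.
    """
    h = int(hashlib.md5(doc_id.encode()).hexdigest(), 16)
    return h % num_shards

# Same _id always lands on the same shard, so reads know where to look.
print(route("doc_42") == route("doc_42"))  # True
```

This modulo routing is also why the primary shard count is fixed at index creation: changing it would silently re-map every existing document.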
Read path#
- Client sends query to coordinating node
- Coordinating node broadcasts to one copy of each shard (primary or replica)
- Each shard searches its local inverted index, returns top-N results with scores
- Coordinating node merges results, re-ranks, returns final response
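The merge step is a small scatter-gather reduction. A minimal sketch, assuming each shard has already returned its local top hits as (score, doc_id) pairs (the scores and IDs below are made up):

```python
import heapq

def merge_shard_results(shard_results, n=3):
    """Coordinator step: merge per-shard top-N lists into a global top-N.

    The coordinator only re-ranks these small candidate lists; it never
    touches the full contents of any shard.
    """
    all_hits = [hit for hits in shard_results for hit in hits]
    return heapq.nlargest(n, all_hits)  # highest scores first

shard_0 = [(9.1, "doc_42"), (4.2, "doc_7")]
shard_1 = [(8.7, "doc_2901"), (3.3, "doc_88")]
shard_2 = [(5.0, "doc_13")]

print(merge_shard_results([shard_0, shard_1, shard_2]))
# [(9.1, 'doc_42'), (8.7, 'doc_2901'), (5.0, 'doc_13')]
```

Because each shard pre-filters to its own top N, the coordinator's work is proportional to shards × N, not to the total document count.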
Scoring: TF-IDF and BM25#
Elasticsearch ranks results using BM25 (improved TF-IDF):
- Term frequency (TF) — How often does the term appear in this document?
- Inverse document frequency (IDF) — How rare is this term across all documents?
- Field-length normalization — Matches in shorter fields score higher (a title match outranks a body match)
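These three signals combine into a per-term score. A sketch of the BM25 formula with Lucene-style IDF (the parameter defaults k1=1.2 and b=0.75 match Elasticsearch's, but this is a simplified single-term version, not the full implementation):

```python
import math

def bm25(tf, df, n_docs, doc_len, avg_len, k1=1.2, b=0.75):
    """BM25 score contribution of one query term.

    tf: term frequency in this document's field
    df: number of documents containing the term
    n_docs: total documents, doc_len/avg_len: field lengths
    """
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))      # rarity
    norm = 1 - b + b * doc_len / avg_len                      # length penalty
    return idf * tf * (k1 + 1) / (tf + k1 * norm)             # saturating TF

# Rare terms beat common ones; short fields beat long ones:
common = bm25(tf=3, df=900, n_docs=1000, doc_len=100, avg_len=100)
rare   = bm25(tf=3, df=5,   n_docs=1000, doc_len=100, avg_len=100)
short  = bm25(tf=3, df=5,   n_docs=1000, doc_len=20,  avg_len=100)
assert short > rare > common
```

Unlike raw TF-IDF, the tf term saturates: the 50th occurrence of a word adds far less than the 2nd, which keeps keyword-stuffed documents from dominating.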
When to use Elasticsearch#
Great for:
- Full-text search across large document collections
- Log aggregation and analysis (ELK stack)
- Auto-complete and suggestions
- Faceted navigation (filters + counts)
- Geospatial queries (find stores near me)
Not great for:
- Primary data store (no transactions, eventual consistency)
- Frequent updates to the same document (segments are immutable)
- Joins across indexes (denormalize instead)
- Small datasets (PostgreSQL full-text search is simpler)
Scaling patterns#
| Pattern | When |
|---|---|
| Add replicas | Read-heavy, need more throughput |
| Add shards (via reindex or split) | Write-heavy, need more indexing capacity |
| Hot-warm-cold | Time-series data, recent data on fast SSDs |
| Cross-cluster search | Multi-region deployment |
| Index lifecycle | Auto-rollover, shrink old indexes |
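The last three rows are usually combined through index lifecycle management (ILM). A sketch of a hot-warm-delete policy, assuming the standard ILM API; the policy name and thresholds are illustrative:

```json
PUT _ilm/policy/logs-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_age": "7d", "max_primary_shard_size": "50gb" }
        }
      },
      "warm": {
        "min_age": "30d",
        "actions": {
          "shrink": { "number_of_shards": 1 }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": { "delete": {} }
      }
    }
  }
}
```

With a policy like this, recent indexes stay small and fast on hot nodes, older ones are shrunk and moved to cheaper hardware, and expired data is deleted automatically.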
Visualize your search architecture#
See how Elasticsearch fits into your system — try Codelit to generate an interactive diagram showing how search, indexing, and query routing connect to your application.
Key takeaways#
- Inverted indexes make term lookups O(1), independent of corpus size
- Sharding distributes data; replicas provide redundancy and read throughput
- Near-real-time — 1-second refresh delay for new documents
- Not a primary database — use alongside PostgreSQL/MongoDB for source of truth
- BM25 scoring handles relevance ranking out of the box
- ELK stack (Elasticsearch + Logstash + Kibana) is the standard for log analysis