Elasticsearch Architecture — How It Handles Billions of Documents
Why Elasticsearch dominates full-text search#
Elasticsearch powers search at Wikipedia, GitHub, Stack Overflow, and thousands of companies. It handles billions of documents with sub-second query times.
Understanding its architecture teaches you how distributed systems handle reads at massive scale.
The core: inverted indexes#
Traditional databases store documents and scan them for matches. Elasticsearch flips this — it stores a mapping from every term to the documents containing it.
"kubernetes" → [doc_42, doc_187, doc_2901]
"architecture" → [doc_42, doc_88, doc_2901, doc_5002]
"explained" → [doc_42, doc_5002]
Query: "kubernetes architecture"
→ Intersection: [doc_42, doc_2901]
This is why a term lookup is roughly O(1) — a hash lookup into the index — and query cost scales with the length of the matching posting lists, not with the total number of documents.
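The idea is easy to see in a few lines. Here is a minimal sketch of an inverted index in Python (the documents and IDs mirror the example above; real Elasticsearch also tokenizes, stems, and stores positions):

```python
# Minimal inverted index sketch: term -> set of document IDs.
from collections import defaultdict

def build_index(docs):
    """Map each term to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

docs = {
    42: "kubernetes architecture explained",
    88: "architecture patterns",
    187: "kubernetes basics",
    2901: "kubernetes architecture deep dive",
}
index = build_index(docs)

# An AND query is a set intersection over posting lists -- no document scan.
hits = index["kubernetes"] & index["architecture"]
print(sorted(hits))  # [42, 2901]
```

Answering the query touches only the two posting lists, which is why corpus size barely matters for a single-term lookup.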
Architecture components#
Cluster#
A group of nodes working together. One cluster can hold billions of documents across hundreds of nodes.
Node types#
- Master node — Manages cluster state: index creation, shard allocation, node membership
- Data node — Stores shards and executes search/index operations
- Coordinating node — Routes queries, merges results from data nodes
- Ingest node — Pre-processes documents (parse, transform, enrich) before indexing
Index#
A logical namespace for documents (like a database table). Each index has a mapping (schema) defining field types.
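A mapping is typically defined at index creation. The sketch below uses the standard create-index request body; the index name and fields are illustrative, not from the original article:

```json
PUT /products
{
  "mappings": {
    "properties": {
      "title":      { "type": "text" },
      "category":   { "type": "keyword" },
      "price":      { "type": "float" },
      "created_at": { "type": "date" }
    }
  }
}
```

`text` fields are analyzed for full-text search, while `keyword` fields are stored as-is for exact filters and aggregations.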
Shards#
Each index is split into shards — horizontal partitions distributed across data nodes.
- Primary shard — The authoritative copy. Handles writes.
- Replica shard — A copy on a different node (never the same node as its primary). Serves reads and provides failover.
Index: "products" — 5 primary shards, 1 replica each (10 shards total)
Node A: [P0, R2, R4]
Node B: [P1, R3, P4]
Node C: [P2, R0, P3, R1]
Write path#
- Client sends the document to a coordinating node
- Coordinating node routes it to the correct primary shard (hash of _id)
- Primary shard writes to an in-memory buffer and the translog (write-ahead log)
- Every 1 second (by default), the buffer is refreshed into a new segment (an immutable Lucene index)
- Primary replicates the operation to its replica shards
- Once the in-sync replicas acknowledge, success is returned to the client
The 1-second refresh interval is why Elasticsearch is "near-real-time" — documents are searchable within ~1 second.
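The routing step can be sketched in Python. Elasticsearch actually uses a murmur3 hash of the routing value (the `_id` by default) modulo the primary shard count; `md5` below is only a dependency-free stand-in for the hash:

```python
import hashlib

NUM_PRIMARY_SHARDS = 5  # fixed at index creation; changing it requires reindexing

def route(doc_id: str, num_shards: int = NUM_PRIMARY_SHARDS) -> int:
    """Pick the primary shard for a document from its routing value.

    Real Elasticsearch uses murmur3, not md5; the property that matters
    is that the same _id deterministically maps to the same shard.
    """
    h = int(hashlib.md5(doc_id.encode()).hexdigest(), 16)
    return h % num_shards

# Same _id always lands on the same shard, so reads know where to look.
print(route("doc_42") == route("doc_42"))  # True
```

This modulo routing is also why the primary shard count is fixed at index creation: changing it would silently re-map every existing document.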
Read path#
- Client sends query to coordinating node
- Coordinating node broadcasts to one copy of each shard (primary or replica)
- Each shard searches its local inverted index, returns top-N results with scores
- Coordinating node merges results, re-ranks, returns final response
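The merge step is a small scatter-gather reduction. A minimal sketch, assuming each shard has already returned its local top hits as (score, doc_id) pairs (the scores and IDs below are made up):

```python
import heapq

def merge_shard_results(shard_results, n=3):
    """Coordinator step: merge per-shard top-N lists into a global top-N.

    The coordinator only re-ranks these small candidate lists; it never
    touches the full contents of any shard.
    """
    all_hits = [hit for hits in shard_results for hit in hits]
    return heapq.nlargest(n, all_hits)  # highest scores first

shard_0 = [(9.1, "doc_42"), (4.2, "doc_7")]
shard_1 = [(8.7, "doc_2901"), (3.3, "doc_88")]
shard_2 = [(5.0, "doc_13")]

print(merge_shard_results([shard_0, shard_1, shard_2]))
# [(9.1, 'doc_42'), (8.7, 'doc_2901'), (5.0, 'doc_13')]
```

Because each shard pre-filters to its own top N, the coordinator's work is proportional to shards × N, not to the total document count.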
Scoring: TF-IDF and BM25#
Elasticsearch ranks results using BM25 (improved TF-IDF):
- Term frequency (TF) — How often does the term appear in this document?
- Inverse document frequency (IDF) — How rare is this term across all documents?
- Field-length normalization — Matches in shorter fields score higher (a title match outranks a body match)
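These three signals combine into a per-term score. A sketch of the BM25 formula with Lucene-style IDF (the parameter defaults k1=1.2 and b=0.75 match Elasticsearch's, but this is a simplified single-term version, not the full implementation):

```python
import math

def bm25(tf, df, n_docs, doc_len, avg_len, k1=1.2, b=0.75):
    """BM25 score contribution of one query term.

    tf: term frequency in this document's field
    df: number of documents containing the term
    n_docs: total documents, doc_len/avg_len: field lengths
    """
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))      # rarity
    norm = 1 - b + b * doc_len / avg_len                      # length penalty
    return idf * tf * (k1 + 1) / (tf + k1 * norm)             # saturating TF

# Rare terms beat common ones; short fields beat long ones:
common = bm25(tf=3, df=900, n_docs=1000, doc_len=100, avg_len=100)
rare   = bm25(tf=3, df=5,   n_docs=1000, doc_len=100, avg_len=100)
short  = bm25(tf=3, df=5,   n_docs=1000, doc_len=20,  avg_len=100)
assert short > rare > common
```

Unlike raw TF-IDF, the tf term saturates: the 50th occurrence of a word adds far less than the 2nd, which keeps keyword-stuffed documents from dominating.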
When to use Elasticsearch#
Great for:
- Full-text search across large document collections
- Log aggregation and analysis (ELK stack)
- Auto-complete and suggestions
- Faceted navigation (filters + counts)
- Geospatial queries (find stores near me)
Not great for:
- Primary data store (no transactions, eventual consistency)
- Frequent updates to the same document (segments are immutable)
- Joins across indexes (denormalize instead)
- Small datasets (PostgreSQL full-text search is simpler)
Scaling patterns#
| Pattern | When |
|---|---|
| Add replicas | Read-heavy, need more throughput |
| Add shards (via reindex or split) | Write-heavy, need more indexing capacity |
| Hot-warm-cold | Time-series data, recent data on fast SSDs |
| Cross-cluster search | Multi-region deployment |
| Index lifecycle | Auto-rollover, shrink old indexes |
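The last three rows are usually combined through index lifecycle management (ILM). A sketch of a hot-warm-delete policy, assuming the standard ILM API; the policy name and thresholds are illustrative:

```json
PUT _ilm/policy/logs-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_age": "7d", "max_primary_shard_size": "50gb" }
        }
      },
      "warm": {
        "min_age": "30d",
        "actions": {
          "shrink": { "number_of_shards": 1 }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": { "delete": {} }
      }
    }
  }
}
```

With a policy like this, recent indexes stay small and fast on hot nodes, older ones are shrunk and moved to cheaper hardware, and expired data is deleted automatically.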
Visualize your search architecture#
See how Elasticsearch fits into your system — try Codelit to generate an interactive diagram showing how search, indexing, and query routing connect to your application.
Key takeaways#
- Inverted indexes make term lookups O(1), independent of corpus size
- Sharding distributes data; replicas provide redundancy and read throughput
- Near-real-time — 1-second refresh delay for new documents
- Not a primary database — use alongside PostgreSQL/MongoDB for source of truth
- BM25 scoring handles relevance ranking out of the box
- ELK stack (Elasticsearch + Logstash + Kibana) is the standard for log analysis