Distributed Counters: Scaling Counts to Millions of Writes per Second
Counting sounds trivial until you need to do it at scale. A single row in a relational database can handle a few hundred increments per second before lock contention turns your counter into a bottleneck. Social platforms count likes, views, and shares across billions of events per day. This article explores the architecture patterns that make that possible.
Why Naive Counters Break#
A straightforward UPDATE counters SET value = value + 1 WHERE id = ? acquires a row-level lock on every write. Under high concurrency the lock queue grows, latency spikes, and throughput collapses. The fundamental issue is write contention on a single hot key.
Thread A ──▶ LOCK row ──▶ INCREMENT ──▶ UNLOCK
Thread B ──▶ ........wait........── LOCK ──▶ INCREMENT ──▶ UNLOCK
Thread C ──▶ ................wait................── LOCK ──▶ ...
Even optimistic concurrency (compare-and-swap) suffers under high contention because most retries fail, wasting CPU and network round-trips.
Sharded Counters#
The classic solution is to spread writes across N shards and sum them on read.
┌──────────────────────────────────────────┐
│ Logical Counter: post_likes │
├──────────┬──────────┬──────────┬─────────┤
│ Shard 0 │ Shard 1 │ Shard 2 │ Shard 3 │
│ 1,204 │ 1,187 │ 1,195 │ 1,210 │
└──────────┴──────────┴──────────┴─────────┘
Total = SUM(shards) = 4,796
Write Path#
- Hash the writer (or pick randomly) to select a shard index: i = hash(writer_id) % N.
- Increment shard i. Contention drops by a factor of N.
Read Path#
- Read all N shards.
- Return the sum.
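The write and read paths above can be sketched in a few lines of Python. This is an in-memory model for illustration; in a real deployment each shard would be its own database row or key:

```python
import hashlib

class ShardedCounter:
    """N independent shards; each write touches one, each read sums all."""

    def __init__(self, num_shards=4):
        self.shards = [0] * num_shards

    def increment(self, writer_id, delta=1):
        # Write path: hash the writer to pick a shard, i = hash(writer_id) % N
        digest = hashlib.sha256(writer_id.encode()).hexdigest()
        i = int(digest, 16) % len(self.shards)
        self.shards[i] += delta

    def value(self):
        # Read path: fetch every shard and return the sum
        return sum(self.shards)

counter = ShardedCounter(num_shards=4)
for user in range(1000):
    counter.increment(f"user-{user}")
print(counter.value())  # 1000, spread across 4 shards
```

Because each writer hashes to a single shard, concurrent writers mostly touch different rows, which is exactly where the factor-of-N contention reduction comes from.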
Trade-offs#
| Dimension | Impact |
|---|---|
| Write throughput | Scales linearly with shard count |
| Read cost | Increases linearly — must read N shards |
| Consistency | Eventual — a reader may see a stale sum |
| Complexity | Need to manage shard creation and rebalancing |
A common optimization is to cache the sum with a short TTL (1–5 seconds) so that reads remain cheap while the displayed count is only slightly stale.
Approximate Counters with HyperLogLog#
When you need distinct counts (unique visitors, unique likers), exact counting requires storing every seen ID — memory grows linearly. HyperLogLog (HLL) trades accuracy for space.
How HLL Works#
- Hash each element to a uniformly distributed bit string.
- Observe the run of leading zeros, i.e. the position of the leftmost 1-bit.
- Track the maximum run length across many sub-streams (registers).
- Estimate cardinality from the harmonic mean of the register values.
With about 12 KB of memory (16,384 six-bit registers), HLL achieves roughly 0.81% standard error for cardinalities in the billions.
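The four steps above fit in a short toy implementation. This is a sketch assuming a SHA-256 hash and the standard bias constant; production systems should use a battle-tested library or Redis's built-in HLL:

```python
import hashlib
import math

def hll_estimate(items, p=14):
    """Minimal HyperLogLog: m = 2^p registers (~0.81% error at p=14)."""
    m = 1 << p
    registers = [0] * m
    for item in items:
        # Hash each element to a uniform 64-bit value
        h = int(hashlib.sha256(item.encode()).hexdigest(), 16) & ((1 << 64) - 1)
        idx = h >> (64 - p)                # first p bits pick a register
        rest = h & ((1 << (64 - p)) - 1)   # remaining bits give the run
        # Position of the leftmost 1-bit in the remaining 64-p bits
        rank = (64 - p) - rest.bit_length() + 1
        registers[idx] = max(registers[idx], rank)
    # Cardinality from the harmonic mean of register values
    alpha = 0.7213 / (1 + 1.079 / m)
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    # Small-range correction (linear counting) when many registers are empty
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros > 0:
        return m * math.log(m / zeros)
    return raw
```

Duplicates never change a register's maximum, which is why the estimate counts distinct elements only.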
Redis HyperLogLog#
PFADD page:home:visitors "user-42"
PFADD page:home:visitors "user-99"
PFCOUNT page:home:visitors -- returns approximate unique count
PFMERGE page:all:visitors page:home:visitors page:about:visitors
HLL is ideal when you can tolerate a small error margin — analytics dashboards, trending calculations, and spam detection all fit this profile.
Eventual Consistency for Counts#
Most users do not notice if a like count is off by a handful for a few seconds. This insight unlocks powerful optimizations.
Write-Behind Buffering#
- Incoming increments land in a local buffer (in-memory or a fast queue).
- A background process flushes the buffer periodically, batching increments into a single database write.
- The read path adds the buffer delta to the last-known persisted value.
Client ──▶ Buffer (memory) ──▶ Batch flush every 1s ──▶ Database
│
Read = DB value + buffer delta
This reduces database writes by orders of magnitude and smooths out traffic spikes.
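A minimal write-behind buffer looks like the following sketch (in-memory, with a dict standing in for the durable store; a real deployment would call `flush` from a background thread on a timer):

```python
import threading

class WriteBehindCounter:
    """Buffers increments in memory; flushes them to the store in batches."""

    def __init__(self, store, key):
        self.store = store          # dict-like stand-in for the database
        self.key = key
        self.buffer = 0
        self.lock = threading.Lock()

    def increment(self, delta=1):
        with self.lock:
            self.buffer += delta    # cheap in-memory write, no DB round-trip

    def flush(self):
        # Called periodically: one batched database write per interval
        with self.lock:
            delta, self.buffer = self.buffer, 0
        self.store[self.key] = self.store.get(self.key, 0) + delta

    def read(self):
        # Read path: last persisted value plus the unflushed delta
        with self.lock:
            return self.store.get(self.key, 0) + self.buffer
```

If the process crashes between flushes, the buffered delta is lost, which is the durability trade-off this pattern accepts in exchange for far fewer database writes.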
CRDT Counters#
In multi-region deployments, G-Counters (grow-only CRDT counters) allow each node to increment independently. The merge operation takes the element-wise max across all replicas.
Node A: [5, 0, 0]
Node B: [0, 3, 0]
Node C: [0, 0, 7]
Merged: [5, 3, 7] → Total = 15
For counters that also decrement, a PN-Counter pairs a G-Counter for increments with a G-Counter for decrements. The value is the difference.
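Both CRDT counters can be sketched directly from the definitions above (a simplified model with a fixed node count; real implementations use maps keyed by node ID):

```python
class GCounter:
    """Grow-only CRDT counter: one slot per node, merge = element-wise max."""

    def __init__(self, num_nodes, node_id):
        self.counts = [0] * num_nodes
        self.node_id = node_id

    def increment(self, delta=1):
        # Each node only ever writes its own slot
        self.counts[self.node_id] += delta

    def merge(self, other):
        self.counts = [max(a, b) for a, b in zip(self.counts, other.counts)]

    def value(self):
        return sum(self.counts)


class PNCounter:
    """Supports decrements by pairing two G-Counters."""

    def __init__(self, num_nodes, node_id):
        self.p = GCounter(num_nodes, node_id)  # increments
        self.n = GCounter(num_nodes, node_id)  # decrements

    def increment(self):
        self.p.increment()

    def decrement(self):
        self.n.increment()

    def merge(self, other):
        self.p.merge(other.p)
        self.n.merge(other.n)

    def value(self):
        return self.p.value() - self.n.value()
```

Because merge is a max, it is idempotent and commutative: replicas can exchange state in any order, any number of times, and still converge to the same total.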
Redis INCR — The Workhorse#
Redis executes commands on a single thread, so INCR is atomic by construction; concurrent increments are safe without locks.
Common Patterns#
Simple counter:
INCR article:42:views
GET article:42:views
Rate limiter (fixed window):
key = "rate:{user_id}:{current_minute}"
count = INCR key
if count == 1 then EXPIRE key 60   -- set the TTL only on the first hit
if count > limit then reject
Keying by the current minute gives a fixed window (not a true sliding window), and setting the TTL only on the first increment avoids refreshing it on every request.
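The same pattern can be modeled in-process for testing. This sketch uses a dict keyed by `(user_id, window_index)` in place of the Redis key; unlike Redis, it never expires old entries, so a real version would also evict them:

```python
import time

class FixedWindowLimiter:
    """In-memory model of the Redis INCR + EXPIRE rate-limit pattern."""

    def __init__(self, limit, window_seconds=60):
        self.limit = limit
        self.window = window_seconds
        self.counts = {}  # (user_id, window_index) -> count

    def allow(self, user_id, now=None):
        now = time.time() if now is None else now
        # Same role as the "rate:{user_id}:{current_minute}" Redis key
        key = (user_id, int(now // self.window))
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key] <= self.limit
```

Requests in the same window share one counter; the first request of a new window starts a fresh count, just as a new Redis key would.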
Batched persistence:
-- Every 10 seconds, a worker atomically reads and resets the counter
value = GETSET article:42:views 0
-- Persist `value` to the durable store
Scaling Redis Counters#
For counters that exceed a single Redis instance's throughput (rare, since Redis handles around 100K ops/sec), use Redis Cluster with hash tags to control slot assignment, or use client-side sharding across multiple keys.
Firestore Distributed Counters#
Google's Firestore documentation explicitly recommends distributed counters because a single Firestore document supports roughly one write per second.
Implementation#
- Create a subcollection counters/{counterId}/shards/{0..N}.
- Each write picks a random shard and increments its count field.
- Reads aggregate all shards with a collection query.
import { collection, doc, getDocs, increment, updateDoc } from "firebase/firestore";

// Write — pick a random shard
const shardId = Math.floor(Math.random() * NUM_SHARDS);
const shardRef = doc(db, "counters", counterId, "shards", String(shardId));
await updateDoc(shardRef, { count: increment(1) });

// Read — sum all shards
const snapshot = await getDocs(collection(db, "counters", counterId, "shards"));
let total = 0;
snapshot.forEach((d) => { total += d.data().count; });
Choosing Shard Count#
- Start with 10 shards — supports ~10 writes/sec.
- Scale to 100+ shards for viral content.
- Use Cloud Functions to auto-scale shard count based on write rate.
Real-Time Leaderboard Counts#
Leaderboards combine counting with ranking. Redis sorted sets are the canonical solution.
ZINCRBY leaderboard:daily 1 "player:007"
ZREVRANGE leaderboard:daily 0 9 WITHSCORES -- top 10
ZREVRANK leaderboard:daily "player:007" -- player rank
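The three commands above can be modeled in-process, which helps when unit-testing leaderboard logic without a Redis server. This sketch ignores Redis's lexicographic tie-breaking for brevity:

```python
class Leaderboard:
    """In-memory model of the Redis sorted-set commands used above."""

    def __init__(self):
        self.scores = {}  # member -> score

    def zincrby(self, delta, member):
        self.scores[member] = self.scores.get(member, 0) + delta
        return self.scores[member]

    def zrevrange_withscores(self, start, stop):
        # Highest score first, like ZREVRANGE ... WITHSCORES (stop inclusive)
        ranked = sorted(self.scores.items(), key=lambda kv: -kv[1])
        return ranked[start:stop + 1]

    def zrevrank(self, member):
        if member not in self.scores:
            return None
        ranked = sorted(self.scores, key=lambda m: -self.scores[m])
        return ranked.index(member)  # 0-based rank, like Redis
```

Note that ranks are 0-based, matching Redis: the top player has rank 0, not 1.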
Architecture for Scale#
┌────────────┐ ┌──────────────┐ ┌──────────────┐
│ Game Srvr │────▶│ Kafka Topic │────▶│ Leaderboard │
│ (events) │ │ (scores) │ │ Service │
└────────────┘ └──────────────┘ └──────┬───────┘
│
┌──────▼───────┐
│ Redis Sorted │
│ Set (ZADD) │
└──────────────┘
- Game servers emit score events to Kafka.
- A leaderboard consumer processes events and issues ZINCRBY commands.
- The API layer reads from Redis for real-time rankings.
- A periodic job snapshots the leaderboard to a durable store for historical queries.
Multi-Region Leaderboards#
For global leaderboards, run a Redis sorted set per region and merge periodically. Alternatively, use a single global Redis cluster with read replicas for rank queries.
Choosing the Right Pattern#
| Scenario | Recommended Pattern |
|---|---|
| Simple page views | Redis INCR with periodic persistence |
| High-write social counters | Sharded counters (database or Firestore) |
| Unique visitor counts | HyperLogLog |
| Multi-region counters | CRDT G-Counter / PN-Counter |
| Real-time leaderboards | Redis sorted sets |
| Analytics aggregations | Write-behind buffer + batch processing |
Key Takeaways#
- Shard writes, cache reads — the universal principle behind every distributed counter pattern.
- Approximate when you can — HyperLogLog saves orders of magnitude of memory for distinct counts.
- Embrace eventual consistency — users tolerate slightly stale counts; your infrastructure will thank you.
- Redis is your friend — INCR, sorted sets, and HLL cover most counting use cases out of the box.
- Plan for viral spikes — auto-scaling shard counts prevents write bottlenecks when content goes viral.
Distributed counters appear in nearly every system design interview that involves social features, analytics, or gaming. Master these patterns and you will handle them with confidence.
This is article #378 in the Codelit system design series.