Geographically Distributed Systems: Building Software That Spans the Globe
When your users span continents, a single-region deployment creates a terrible experience. A user in Tokyo hitting a server in Virginia faces 150-200ms of network latency on every request — before the server even starts processing.
Geographically distributed systems solve this by placing compute and data closer to users. But distribution introduces hard problems: consistency, conflict resolution, and compliance.
Why Geo-Distribution?#
Three forces push systems toward multi-region:
- Latency — The speed of light sets a hard floor on network latency. Cross-ocean round trips add 100-300ms
- Availability — A single region is a single point of failure. Cloud regions go down
- Compliance — GDPR, data sovereignty laws, and industry regulations require data to stay in specific geographies
Single region:
Tokyo user → Virginia server → 180ms RTT
Frankfurt user → Virginia server → 90ms RTT
Multi-region:
Tokyo user → Tokyo server → 5ms RTT
Frankfurt user → Frankfurt server → 3ms RTT
Data Replication Across Regions#
The core challenge is keeping data consistent across regions. There are two fundamental approaches.
Synchronous Replication#
Every write is confirmed by all replicas before acknowledging the client.
Client → Write to US-East
→ Replicate to EU-West (wait)
→ Replicate to AP-Southeast (wait)
→ All confirmed → ACK to client
Pros: Strong consistency — every read sees the latest write. Cons: Write latency equals the slowest replica. One slow region penalizes everyone.
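That cost is easy to see in code. Here is a minimal sketch of a synchronous write, with replica latencies simulated by timers (the region names and latencies are illustrative):

```javascript
// Synchronous replication sketch: the write is only acknowledged
// once every replica has confirmed it, so total latency tracks the
// slowest replica.
const replicas = [
  { name: "us-east", latencyMs: 5 },
  { name: "eu-west", latencyMs: 80 },
  { name: "ap-southeast", latencyMs: 150 },
];

function replicate(replica, record) {
  // Simulate cross-region network delay
  return new Promise((resolve) =>
    setTimeout(() => resolve(replica.name), replica.latencyMs)
  );
}

async function synchronousWrite(record) {
  const start = Date.now();
  // Wait for ALL replicas before acknowledging the client
  await Promise.all(replicas.map((r) => replicate(r, record)));
  return Date.now() - start; // ≈ latency of the slowest replica
}

synchronousWrite({ key: "user:1", value: "Alice" }).then((ms) =>
  console.log(`acknowledged after ~${ms}ms`)
);
```

Even though the three replications run in parallel, the acknowledgment cannot return before the 150ms replica does.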
Asynchronous Replication#
The primary region acknowledges immediately; replicas catch up in the background.
Client → Write to US-East → ACK (immediate)
→ Replicate to EU-West (background, ~50-200ms)
→ Replicate to AP-Southeast (background, ~100-300ms)
Pros: Fast writes. No region blocks another. Cons: Reads from replicas may return stale data (eventual consistency).
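The asynchronous variant inverts the trade-off: the acknowledgment is immediate, and staleness is observable. A minimal sketch (the in-memory stores and delay are illustrative):

```javascript
// Asynchronous replication sketch: the primary acknowledges
// immediately; replicas catch up in the background.
const store = {
  primary: new Map(),
  replicas: { "eu-west": new Map(), "ap-southeast": new Map() },
};

function asyncWrite(key, value) {
  store.primary.set(key, value); // ACK happens here, immediately
  for (const replica of Object.values(store.replicas)) {
    // Background replication; the client never waits on this
    setTimeout(() => replica.set(key, value), 50);
  }
  return "ack";
}

asyncWrite("user:1", "Alice");
console.log(store.primary.get("user:1"));             // "Alice"
console.log(store.replicas["eu-west"].get("user:1")); // undefined — still stale
```

A read against the replica immediately after the write sees nothing; it converges once the background copy lands.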
Consistency Models#
| Model | Guarantee | Latency | Use Case |
|---|---|---|---|
| Strong | Every read sees latest write | High | Financial transactions |
| Bounded staleness | Reads lag by at most N seconds | Medium | Dashboards, analytics |
| Session | User sees own writes | Low | User profiles, carts |
| Eventual | Replicas converge eventually | Lowest | Social feeds, logs |
Most applications use session consistency — users always see their own writes, even if other regions lag slightly.
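One common way to implement session consistency is read-your-writes tracking: each session remembers the version of its last write, and reads fall back to the primary until the local replica has caught up. A sketch, with all names illustrative:

```javascript
// Session consistency via read-your-writes version tracking.
let primaryVersion = 0;
const primary = new Map();
const replica = { data: new Map(), version: 0 };

function write(session, key, value) {
  primaryVersion += 1;
  primary.set(key, value);
  session.lastWrite = primaryVersion; // remember what this user wrote
}

function read(session, key) {
  // Replica hasn't caught up to this session's writes: use the primary
  if (replica.version < (session.lastWrite || 0)) {
    return primary.get(key);
  }
  return replica.data.get(key);
}

const session = {};
write(session, "cart", "book");
console.log(read(session, "cart")); // served from the primary: "book"
```

Other sessions reading stale replicas see older data, but this user always sees their own write.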
Geo-Routing: Getting Users to the Right Region#
Latency-Based DNS#
DNS resolves to the region with lowest measured latency:
Route 53 latency-based routing:
User in Japan → ap-northeast-1
User in Germany → eu-west-1
User in Brazil → sa-east-1
GeoDNS#
Routes based on the user's geographic location (IP geolocation):
GeoDNS rules:
IP in EU → eu-west-1.app.com
IP in APAC → ap-southeast-1.app.com
IP in Americas → us-east-1.app.com
Anycast#
A single IP address is announced from multiple locations. BGP routing sends packets to the nearest one. Cloudflare and most CDNs use this approach.
Application-Level Routing#
For more control, route at the application layer:
// Middleware determines the user's home region and proxies
// cross-region requests there (helpers are illustrative)
function routeRequest(req) {
  const userRegion = getUserHomeRegion(req.userId);
  if (userRegion !== LOCAL_REGION) {
    // This user's data lives in another region: forward the request
    return proxy(req, regionEndpoints[userRegion]);
  }
  return handleLocally(req);
}
Active-Active vs Active-Passive#
Active-Passive (Primary-Secondary)#
One region handles all writes. Other regions serve reads and stand by for failover.
US-East (Primary): reads + writes
EU-West (Passive): reads only, receives replicated data
AP-South (Passive): reads only, receives replicated data
Failover: promote EU-West to primary (~30-60s)
Best for: Applications where write conflicts are unacceptable (banking, inventory).
Active-Active (Multi-Primary)#
Every region accepts both reads and writes. Conflicts are resolved after the fact.
US-East (Active): reads + writes → replicates to other regions
EU-West (Active): reads + writes → replicates to other regions
AP-South (Active): reads + writes → replicates to other regions
Best for: Low-latency writes globally, high availability requirements.
The cost: You must handle conflicts when two regions modify the same data simultaneously.
Conflict Resolution#
When two regions write to the same record, you need a resolution strategy.
Last-Writer-Wins (LWW)#
The write with the latest timestamp wins. Simple but can lose data.
Region A: SET user.name = "Alice" (t=100)
Region B: SET user.name = "Bob" (t=101)
Result: "Bob" wins. Alice's change is silently dropped.
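The resolution rule itself is a one-liner, which is exactly why it is popular and exactly why it loses data silently. A sketch (real systems also need to account for clock skew between regions):

```javascript
// Last-writer-wins merge: keep whichever write carries the higher
// timestamp. Ties are broken deterministically by region id so that
// every region resolves the conflict the same way.
function lwwMerge(a, b) {
  if (a.timestamp === b.timestamp) {
    return a.region < b.region ? a : b;
  }
  return a.timestamp > b.timestamp ? a : b;
}

const fromA = { value: "Alice", timestamp: 100, region: "us-east" };
const fromB = { value: "Bob", timestamp: 101, region: "eu-west" };
console.log(lwwMerge(fromA, fromB).value); // "Bob" — Alice's write is dropped
```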
Conflict-Free Replicated Data Types (CRDTs)#
Data structures that mathematically guarantee convergence without coordination:
- G-Counter — Grow-only counter (each region has its own counter, sum on read)
- OR-Set — Observed-remove set (add/remove without conflicts)
- LWW-Register — Last-writer-wins for single values
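The G-Counter is small enough to sketch in full. Each region only increments its own slot, merge takes the per-region maximum, and the value is the sum. Because merge is commutative, associative, and idempotent, replicas converge no matter what order updates arrive in:

```javascript
// Minimal G-Counter (grow-only counter) sketch.
class GCounter {
  constructor(region) {
    this.region = region;
    this.counts = {}; // region id -> that region's local count
  }
  increment(by = 1) {
    // A region only ever touches its own slot
    this.counts[this.region] = (this.counts[this.region] || 0) + by;
  }
  merge(other) {
    // Per-region max: applying the same merge twice changes nothing
    for (const [region, count] of Object.entries(other.counts)) {
      this.counts[region] = Math.max(this.counts[region] || 0, count);
    }
  }
  value() {
    return Object.values(this.counts).reduce((a, b) => a + b, 0);
  }
}

const us = new GCounter("us-east");
const eu = new GCounter("eu-west");
us.increment(3);
eu.increment(2);
us.merge(eu);
eu.merge(us);
console.log(us.value(), eu.value()); // 5 5 — both replicas converge
```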
Application-Level Merge#
Custom logic for your domain:
// Shopping cart merge: union of items, max of quantities
// (carts are Maps of item → quantity)
function mergeCart(cartA, cartB) {
  const merged = new Map();
  for (const [item, qty] of [...cartA, ...cartB]) {
    merged.set(item, Math.max(merged.get(item) || 0, qty));
  }
  return merged;
}
Version Vectors#
Track causality to detect true conflicts vs sequential updates. Only flag genuine conflicts for resolution.
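The comparison at the heart of version vectors fits in a few lines. If one vector dominates the other, the updates were sequential and no conflict exists; if each is ahead somewhere, the writes were concurrent and need resolution. A sketch with per-region counters:

```javascript
// Compare two version vectors (objects of region id -> counter).
function compareVectors(a, b) {
  const regions = new Set([...Object.keys(a), ...Object.keys(b)]);
  let aAhead = false, bAhead = false;
  for (const r of regions) {
    const av = a[r] || 0, bv = b[r] || 0;
    if (av > bv) aAhead = true;
    if (bv > av) bAhead = true;
  }
  if (aAhead && bAhead) return "concurrent"; // genuine conflict
  if (aAhead) return "a-after-b";            // sequential, no conflict
  if (bAhead) return "b-after-a";
  return "equal";
}

console.log(compareVectors({ us: 2, eu: 1 }, { us: 1, eu: 1 })); // "a-after-b"
console.log(compareVectors({ us: 2, eu: 0 }, { us: 1, eu: 1 })); // "concurrent"
```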
Compliance and Data Residency#
GDPR Data Residency#
The GDPR restricts transfers of EU residents' personal data outside approved jurisdictions, so many architectures simply keep that data in-region.
Architecture pattern:
EU user data → EU region ONLY (never replicated outside)
US user data → US region (can replicate to EU if needed)
Metadata/analytics → Any region (if anonymized)
Implementation Strategies#
- Region-pinned tables — User data stays in the user's home region
- Data classification — Tag data as "restricted" or "global," replicate accordingly
- Encryption boundaries — EU data encrypted with EU-managed keys
- Audit logging — Track every cross-region data movement
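The data-classification strategy amounts to a guard that every cross-region replication must pass. A sketch — the classification names, policy table, and regions are all illustrative:

```javascript
// Replication guard: a record may only be copied to regions its
// classification allows.
const residencyPolicy = {
  "eu-restricted": ["eu-west-1"],                       // must never leave the EU
  "us-restricted": ["us-east-1", "eu-west-1"],
  global: ["us-east-1", "eu-west-1", "ap-southeast-1"], // anonymized data
};

function canReplicate(record, targetRegion) {
  const allowed = residencyPolicy[record.classification] || [];
  return allowed.includes(targetRegion);
}

const record = { userId: 42, classification: "eu-restricted" };
console.log(canReplicate(record, "eu-west-1")); // true
console.log(canReplicate(record, "us-east-1")); // false — replication blocked
```

In practice this check belongs in the replication pipeline itself, with every allow/deny decision written to the audit log.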
Multi-Region Databases#
CockroachDB#
Distributed SQL with configurable replication zones:
-- Pin EU user data to EU nodes
ALTER TABLE users CONFIGURE ZONE USING
  constraints = '{"+region=eu-west-1": 3}',
  num_replicas = 3;

-- Global tables replicated across regions for low-latency reads
ALTER TABLE products CONFIGURE ZONE USING
  num_replicas = 5,
  constraints = '{"+region=us-east-1": 1, "+region=eu-west-1": 1, "+region=ap-southeast-1": 1}';
Google Cloud Spanner#
Globally consistent with TrueTime (atomic clocks + GPS):
- Externally consistent reads and writes
- Automatic sharding and replication
- 99.999% availability SLA
DynamoDB Global Tables#
Multi-region, multi-active with last-writer-wins:
- Automatic replication across selected regions
- Sub-second replication latency
- Conflict resolution via timestamps
Comparison#
| Feature | CockroachDB | Spanner | DynamoDB Global Tables |
|---|---|---|---|
| Consistency | Serializable | External | Eventual (per-item) |
| SQL support | PostgreSQL-compatible | GoogleSQL | NoSQL (PartiQL) |
| Conflict model | Serializable txns | Serializable txns | Last-writer-wins |
| Self-hosted | Yes | No | No |
| Latency profile | Region-optimized | Global strong | Region-local |
Architecture Checklist#
Before going multi-region, verify these:
- Identify data locality requirements — What data must stay in which region?
- Choose consistency model per data type — Not everything needs strong consistency
- Design conflict resolution — What happens when two regions write the same row?
- Plan failover procedures — Automated detection, DNS TTL, connection draining
- Set up cross-region observability — Replication lag dashboards, per-region error rates
- Test with chaos engineering — Simulate region failures, network partitions
- Compliance audit — Verify data residency constraints are enforced
Key Takeaways#
- Latency is physics — The only way to reduce it is to move compute closer to users
- Consistency is a spectrum — Choose the weakest model your use case can tolerate
- Active-active is powerful but expensive — Conflict resolution adds real complexity
- Compliance drives architecture — Data residency requirements may dictate your region topology
- Start active-passive — Graduate to active-active only when latency or availability demands it
Geo-distributed systems are among the hardest problems in backend engineering. If you're designing systems that serve users across regions, explore our architecture deep-dives at codelit.io.
This is article #169 in the Codelit engineering blog series.