Database Replication Strategies: Sync, Async, and Conflict Resolution
If your application relies on a single database instance, you are one hardware failure away from total downtime. Database replication copies data across multiple servers so your system stays available, durable, and fast — even when things go wrong.
This guide covers replication topologies, consistency trade-offs, conflict resolution techniques, and the tools that implement them in practice.
Why Replicate?#
Replication serves three goals:
- High availability — if the primary node fails, a replica takes over.
- Read scalability — distribute read traffic across replicas to reduce load on the writer.
- Disaster recovery — keep copies in different regions so a data-center outage does not mean data loss.
Without replication you are accepting a single point of failure. With it, you trade simplicity for resilience.
Synchronous vs Asynchronous Replication#
Synchronous#
The primary waits for at least one replica to confirm the write before acknowledging the client. This guarantees zero data loss on failover but adds latency to every write.
Client → Primary → Replica (ACK) → Primary (ACK) → Client
Use synchronous replication when durability matters more than throughput — financial transactions, for example.
Asynchronous#
The primary acknowledges the client immediately and ships changes to replicas in the background. Writes are faster, but a crash before replication completes can lose recent data.
Client → Primary (ACK) → Client
Primary → Replica (eventually)
Most systems default to asynchronous replication because the latency savings are significant at scale.
Semi-synchronous#
A hybrid: the primary waits for at least one replica, but not all of them. MySQL calls this semi-synchronous replication; PostgreSQL achieves the same effect with quorum-based synchronous standby configuration. It balances durability and performance.
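In PostgreSQL, for example, a quorum-style semi-synchronous setup might look like the following (the standby names are illustrative):

```ini
# postgresql.conf: acknowledge a commit once ANY one of the two
# named standbys has confirmed it
synchronous_commit = on
synchronous_standby_names = 'ANY 1 (replica1, replica2)'
```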
Replication Topologies#
Single-Leader (Primary–Replica)#
One node handles all writes; replicas consume a change stream and serve reads.
- Simple to reason about — no write conflicts.
- Failover requires electing a new leader (manual or automatic).
- PostgreSQL streaming replication and MySQL standard replication use this model.
Multi-Leader#
Multiple nodes accept writes and replicate changes to each other. Useful for geographically distributed deployments where routing all writes to one region adds unacceptable latency.
- Enables local writes in each region.
- Introduces write conflicts that must be resolved.
- MySQL Group Replication (in multi-primary mode) supports multi-leader semantics; CockroachDB offers multi-region writes via per-range consensus rather than classic multi-leader replication.
Leaderless (Dynamo-Style)#
Any node can accept reads and writes. The client sends writes to multiple nodes and reads from multiple nodes, using quorum rules (W + R > N) so that every read set overlaps with the latest successful write set.
- No single point of failure for writes.
- Conflict resolution happens at read time or in the background.
- Apache Cassandra and Riak follow this model, which Amazon's Dynamo paper introduced. (The DynamoDB service, despite the name, internally uses leader-based replication per partition.)
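The quorum rule above can be sketched in a few lines. This is a minimal illustration, not a real client; the function name and parameters are assumptions:

```javascript
// Quorum condition for leaderless replication: with N replicas,
// W write acks, and R read responses, W + R > N guarantees that
// every read set shares at least one node with every write set.
function quorumOverlaps(n, w, r) {
  return w + r > n;
}

// Typical Dynamo-style configuration: N=3, W=2, R=2.
const safe = quorumOverlaps(3, 2, 2);      // true: reads see the latest write
const fast = quorumOverlaps(3, 1, 1);      // false: a read may miss the write
```

Lowering W and R trades consistency for latency; the check above is exactly the trade-off knob those systems expose.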
Replication Lag and Its Consequences#
In asynchronous setups, replicas may fall behind the primary. This replication lag causes:
- Read-your-own-write inconsistency — a user writes data, then reads from a stale replica and does not see the change.
- Monotonic read violations — successive reads hit different replicas and appear to go backward in time.
- Causal ordering issues — a reply appears before the original message.
Mitigation strategies include sticky sessions (route a user to the same replica), read-after-write guarantees at the application layer, and causal consistency protocols.
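A read-after-write guarantee at the application layer can be as simple as remembering who wrote recently. A minimal sketch, assuming an in-memory map and a lag window you consider acceptable (both illustrative):

```javascript
// Route users who wrote recently to the primary so they see their own
// writes; everyone else can read from a possibly stale replica.
const READ_AFTER_WRITE_WINDOW_MS = 5000;   // assumed max replication lag
const recentWriters = new Map();           // userId -> timestamp of last write

function recordWrite(userId) {
  recentWriters.set(userId, Date.now());
}

function chooseReadTarget(userId) {
  const lastWrite = recentWriters.get(userId);
  if (lastWrite && Date.now() - lastWrite < READ_AFTER_WRITE_WINDOW_MS) {
    return "primary";
  }
  return "replica";
}
```

In a multi-instance deployment the map would need to live in shared state (a session store, for instance), but the routing decision stays the same.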
Conflict Resolution#
When two nodes accept conflicting writes, the system must pick a winner or merge.
Last-Write-Wins (LWW)#
Attach a timestamp to each write; the latest timestamp wins. Simple but can silently drop concurrent updates. Clock skew makes this unreliable without synchronized clocks (NTP or hybrid logical clocks).
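A sketch of an LWW merge, with a deterministic tie-break so every replica picks the same winner (the version shape here is an assumption):

```javascript
// Last-write-wins: keep the version with the larger timestamp.
// Ties break on node id so all replicas converge to the same value.
function lwwMerge(a, b) {
  if (a.ts !== b.ts) return a.ts > b.ts ? a : b;
  return a.node > b.node ? a : b;
}

const v1 = { value: "blue",  ts: 1700000001000, node: "a" };
const v2 = { value: "green", ts: 1700000002000, node: "b" };
lwwMerge(v1, v2); // -> v2: the earlier write is silently discarded
```

Note what the example shows: v1 is dropped without any error, which is exactly the silent-data-loss risk the text describes.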
Vector Clocks#
Track a version counter per node. When two versions are not ordered by the vector clock, the system detects a conflict and can surface it to the application for manual resolution. Amazon's original Dynamo paper popularized this approach.
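The comparison rule can be sketched as follows, assuming each version carries a plain object mapping node id to counter (the shape and return values are illustrative):

```javascript
// Compare two vector clocks. If each clock is ahead on at least one
// node, the writes were concurrent and the conflict must be surfaced.
function compareVectorClocks(a, b) {
  let aAhead = false, bAhead = false;
  for (const node of new Set([...Object.keys(a), ...Object.keys(b)])) {
    const ca = a[node] || 0, cb = b[node] || 0;
    if (ca > cb) aAhead = true;
    if (cb > ca) bAhead = true;
  }
  if (aAhead && bAhead) return "conflict";  // concurrent: app must resolve
  if (aAhead) return "a-newer";
  if (bAhead) return "b-newer";
  return "equal";
}

compareVectorClocks({ n1: 2, n2: 1 }, { n1: 1, n2: 1 }); // "a-newer"
compareVectorClocks({ n1: 2 }, { n2: 1 });               // "conflict"
```

Unlike LWW, this never silently drops a concurrent update; the cost is that something (usually the application) has to handle the "conflict" case.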
CRDTs (Conflict-Free Replicated Data Types)#
Data structures mathematically guaranteed to converge without coordination. Examples: G-Counters, OR-Sets, LWW-Registers. CRDTs eliminate conflict resolution logic at the cost of restricting the data model. Redis Enterprise's Active-Active databases and Automerge use this approach.
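A G-Counter is the simplest CRDT and makes the convergence property concrete. A minimal sketch (class and method names are illustrative):

```javascript
// Grow-only counter: each node increments only its own slot, and merge
// takes the per-node maximum, so merges commute and always converge.
class GCounter {
  constructor(nodeId) { this.nodeId = nodeId; this.counts = {}; }
  increment() {
    this.counts[this.nodeId] = (this.counts[this.nodeId] || 0) + 1;
  }
  value() {
    return Object.values(this.counts).reduce((sum, c) => sum + c, 0);
  }
  merge(other) {
    for (const [node, c] of Object.entries(other.counts)) {
      this.counts[node] = Math.max(this.counts[node] || 0, c);
    }
  }
}

const a = new GCounter("a"), b = new GCounter("b");
a.increment(); a.increment(); b.increment();
a.merge(b); // a.value() === 3, regardless of merge order or repetition
```

Because merge is a per-slot max, applying it twice or in either direction yields the same state; that idempotence and commutativity is what "conflict-free" means in practice.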
Application-Level Merge#
Let the application define merge logic — for example, union two shopping carts or concatenate two text edits. This is the most flexible but requires careful domain-specific code.
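For the shopping-cart example, a domain-specific merge might union the carts by item and keep the higher quantity. This policy is one plausible choice among several, not the only correct one:

```javascript
// Merge two conflicting carts: union by item id, keep the max quantity.
// Keeping the max (rather than summing) assumes duplicates come from
// replication, not from the user adding the item twice.
function mergeCarts(cartA, cartB) {
  const merged = new Map(cartA);
  for (const [item, qty] of cartB) {
    merged.set(item, Math.max(merged.get(item) || 0, qty));
  }
  return merged;
}

const regionA = new Map([["book", 1], ["pen", 2]]);
const regionB = new Map([["pen", 3], ["mug", 1]]);
mergeCarts(regionA, regionB); // book: 1, pen: 3, mug: 1
```

Whether to take the max, the sum, or ask the user is exactly the kind of domain decision this strategy pushes onto the application.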
Tools and Implementations#
PostgreSQL Streaming Replication#
PostgreSQL ships WAL (Write-Ahead Log) records to standbys over a TCP connection. Supports synchronous and asynchronous modes. Tools like Patroni automate failover and leader election.
```ini
# postgresql.conf on primary
wal_level = replica
max_wal_senders = 5
synchronous_standby_names = 'replica1'
```
MySQL Group Replication#
A plugin that implements virtually synchronous replication with built-in conflict detection; it runs in single-primary mode by default, with an optional multi-primary mode. Certified writes are applied in the same order on all members.
MongoDB Replica Sets#
MongoDB uses a single-leader replica set with automatic failover. The driver is replica-set-aware and redirects writes to the current primary. Read preference settings let you route reads to secondaries.
```javascript
const client = new MongoClient(uri, {
  readPreference: "secondaryPreferred",
});
```
Read Replicas for Scaling#
Adding read replicas is the most common first step when your database bottleneck is read throughput. Route analytics queries, dashboard loads, and search indexing to replicas while the primary handles writes.
Key considerations:
- Connection routing — use a load balancer, DNS, or application-level routing.
- Acceptable staleness — decide how much lag your reads can tolerate.
- Replica count — more replicas increase read throughput but also replication fan-out cost.
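Connection routing and staleness tolerance often end up in the same piece of code. A sketch of lag-aware replica selection, where the replica list and threshold are illustrative assumptions (real setups would read lag from the database's replication status views):

```javascript
// Pick a read target: any replica within the acceptable lag bound,
// falling back to the primary if every replica is too far behind.
const MAX_ACCEPTABLE_LAG_MS = 2000;
const replicas = [
  { name: "replica1", lagMs: 150 },
  { name: "replica2", lagMs: 4500 },  // too stale for this workload
];

function pickReadTarget() {
  const fresh = replicas.filter((r) => r.lagMs <= MAX_ACCEPTABLE_LAG_MS);
  if (fresh.length === 0) return { name: "primary", lagMs: 0 };
  return fresh[Math.floor(Math.random() * fresh.length)];
}
```

Falling back to the primary keeps reads correct when all replicas lag, at the cost of the load spike you were trying to avoid; some teams prefer serving stale reads instead, which is again a staleness-tolerance decision.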
Choosing a Strategy#
| Requirement | Recommended Topology |
|---|---|
| Simple HA for a single-region app | Single-leader with async replication |
| Zero data loss on failover | Single-leader with sync replication |
| Multi-region low-latency writes | Multi-leader or leaderless |
| Extreme read scalability | Single-leader + many read replicas |
Start with the simplest topology that meets your availability and latency requirements. Add complexity only when measurements demand it.
Conclusion#
Database replication is foundational to building reliable, scalable systems. Understand the trade-offs between synchronous and asynchronous modes, pick the right topology for your access patterns, and plan your conflict resolution strategy before you need it — not after your first data loss incident.
At codelit.io we build tools that help engineering teams ship with confidence. Try Codelit today.
This is article #159 in our engineering blog series.