Collaborative Editing System Design: Real-Time Co-Authoring at Scale
Real-time collaborative editing is one of the hardest distributed systems problems disguised as a simple product feature. When two people type in the same document simultaneously, the system must resolve conflicts, maintain consistency, and feel instant — all at once.
Why It's Hard#
Single-user editing is trivial: one writer, one document, no conflicts. Add a second user and everything changes:
- Concurrency — two users edit the same paragraph at the same time
- Latency — network round-trips mean each user sees a stale version
- Ordering — operations arrive in different orders at different clients
- Intent preservation — the merged result should reflect what both users meant
The core challenge: eventual consistency without losing anyone's work.
High-Level Architecture#
┌──────────┐     WebSocket      ┌──────────────┐     ┌────────────┐
│ Client A │◄──────────────────►│ Sync Server  │◄───►│  Database  │
└──────────┘                    │  (Stateful)  │     │ (Versions) │
┌──────────┐     WebSocket      │              │     └────────────┘
│ Client B │◄──────────────────►│              │
└──────────┘                    └──────────────┘
Key components:
- Rich text editor — renders the document and captures user operations
- Sync server — receives operations, resolves ordering, broadcasts to peers
- Persistence layer — stores document snapshots and operation history
- Presence service — tracks cursors, selections, and online users
OT vs CRDT: The Two Approaches#
Operational Transformation (OT)#
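OT originated in groupware research in the late 1980s (Ellis and Gibbs, 1989) and was later popularized by Google Wave and Google Docs. Each edit is an operation (insert, delete, retain). When concurrent operations arrive, the server transforms one against the other so both can be applied in sequence.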
User A: insert("X", pos=3)
User B: delete(pos=1)
Server transforms A's op against B's:
B deleted before pos 3, so A's insert shifts to pos 2
Result: insert("X", pos=2)
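The transformation above can be sketched in a few lines of Python. This is a minimal illustration, not the transform used by any particular editor: the `Insert` and `Delete` classes and both transform functions are hypothetical names, and `Delete` is assumed to remove exactly one character.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str


@dataclass(frozen=True)
class Delete:
    pos: int  # assumption: removes exactly one character at this index


def transform_insert_against_delete(ins: Insert, dele: Delete) -> Insert:
    """Rewrite `ins` so it can be applied after `dele` has already run."""
    if dele.pos < ins.pos:
        # A character before the insertion point is gone, so shift left by one.
        return Insert(ins.pos - 1, ins.text)
    return ins


def transform_delete_against_insert(dele: Delete, ins: Insert) -> Delete:
    """Rewrite `dele` so it can be applied after `ins` has already run."""
    if ins.pos <= dele.pos:
        # Text was inserted at or before the target, so shift right.
        return Delete(dele.pos + len(ins.text))
    return dele


def apply_op(doc: str, op) -> str:
    """Apply a single Insert or Delete to a plain string document."""
    if isinstance(op, Insert):
        return doc[:op.pos] + op.text + doc[op.pos:]
    return doc[:op.pos] + doc[op.pos + 1:]
```

With `Insert(3, "X")` from User A and `Delete(1)` from User B, applying B's delete followed by the transformed insert gives the same document as applying A's insert followed by the transformed delete. That symmetry is the property a real transform function has to preserve for every pair of operation types.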
Strengths:
- Battle-tested (Google Docs, Etherpad)
- Smaller payloads per operation
- Centralized server simplifies conflict resolution
Weaknesses:
- Server is a single point of ordering — hard to decentralize
- Transform functions grow complex with rich-text formatting
- N-way transformation is notoriously error-prone
Conflict-Free Replicated Data Types (CRDT)#
Text CRDTs assign each character a unique, ordered ID so that operations commute and no transformation step is needed. Libraries like Yjs and Automerge implement this.
User A's "X" gets ID (A, seq=7) between IDs (_, seq=5) and (_, seq=6)
User B's "Y" gets ID (B, seq=4) between the same IDs
→ Deterministic ordering by ID resolves the conflict automatically
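A toy version of ID-based ordering might look like the following. It is a sketch under stated assumptions, not how Yjs or Automerge work internally: `CharId`, `Replica`, and the use of a fractional `position` with the site name as tie-breaker are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from fractions import Fraction


@dataclass(frozen=True, order=True)
class CharId:
    position: Fraction  # where the character sits between its neighbours
    site: str           # tie-breaker when two users pick the same position
    seq: int            # per-site counter that keeps IDs unique


@dataclass
class Replica:
    chars: dict = field(default_factory=dict)     # CharId -> character
    tombstones: set = field(default_factory=set)  # IDs of deleted characters

    def integrate_insert(self, char_id: CharId, char: str) -> None:
        self.chars[char_id] = char

    def integrate_delete(self, char_id: CharId) -> None:
        # Deleted IDs are remembered so a late-arriving insert stays hidden.
        self.tombstones.add(char_id)

    def text(self) -> str:
        visible = (cid for cid in sorted(self.chars) if cid not in self.tombstones)
        return "".join(self.chars[cid] for cid in visible)
```

Because the visible text is derived by sorting IDs, two replicas that receive the same inserts and deletes in different orders end up with the same string. The tombstone set also shows where the memory overhead listed under the weaknesses comes from.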
Strengths:
- No central server required — works peer-to-peer
- Mathematically guaranteed convergence
- Naturally supports offline editing
Weaknesses:
- Higher memory overhead (tombstones for deleted characters)
- Document size grows over time without garbage collection
- Debugging merged states is harder
Which to Choose?#
| Factor | OT | CRDT |
|---|---|---|
| Central server available | Yes | Optional |
| Offline-first required | Difficult | Natural fit |
| Rich-text complexity | Manageable | Growing ecosystem |
| Production use at scale | Yes (Google Docs) | Yes (Figma uses a CRDT-inspired design) |
For most new projects, Yjs (CRDT) offers the best developer experience with solid performance.
Conflict Resolution in Practice#
True conflicts are rarer than you might think. Most edits happen in different parts of the document. When they do collide:
- Same position insert — deterministic tie-breaking by user ID
- Concurrent delete and edit — delete wins (the text no longer exists)
- Formatting conflicts — last-writer-wins per attribute (bold, italic, etc.)
- Structural conflicts — e.g., two users moving the same block. Requires application-level policies
The key insight: conflict resolution must be deterministic and identical on every client.
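As an illustration of a deterministic rule, here is a minimal last-writer-wins register for formatting attributes. The `FormatRegister` class is hypothetical, and it assumes every change carries a logical timestamp plus the author's user ID, with the user ID breaking timestamp ties.

```python
from dataclasses import dataclass, field


@dataclass
class FormatRegister:
    # attribute name -> (value, (logical_time, user_id))
    attrs: dict = field(default_factory=dict)

    def apply(self, name: str, value, logical_time: int, user_id: str) -> None:
        """Keep the write with the highest (logical_time, user_id) stamp."""
        stamp = (logical_time, user_id)
        current = self.attrs.get(name)
        if current is None or stamp > current[1]:
            self.attrs[name] = (value, stamp)

    def values(self) -> dict:
        """Return the winning value for each attribute."""
        return {name: value for name, (value, _) in self.attrs.items()}
```

Since the winner depends only on the stamps and never on arrival order, every client that sees the same set of writes computes the same formatting.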
Cursor Presence and Awareness#
Users expect to see collaborators' cursors, selections, and names in real-time.
Implementation:
- Each client broadcasts its cursor position whenever the cursor or selection changes
- Presence data is ephemeral — stored in memory, not persisted
- Updates are throttled to at most one per client every 50-100 ms to avoid flooding the channel
- Cursor positions reference document-relative IDs, not character offsets, so they survive concurrent edits
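The throttling step could be sketched as below. `CursorThrottle` is a hypothetical helper, `send` stands in for whatever transport call broadcasts presence, and the clock is injectable so the behaviour can be checked without real timers.

```python
import time


class CursorThrottle:
    def __init__(self, send, min_interval=0.05, clock=time.monotonic):
        self._send = send                  # assumed callback that broadcasts a cursor
        self._min_interval = min_interval  # seconds between broadcasts
        self._clock = clock
        self._last_sent = None
        self._pending = None

    def update(self, cursor) -> None:
        """Record the newest cursor and send it if the interval has elapsed."""
        self._pending = cursor
        self.flush()

    def flush(self) -> None:
        """Send the latest pending cursor if allowed; also meant to be called from a timer."""
        if self._pending is None:
            return
        now = self._clock()
        if self._last_sent is not None and now - self._last_sent < self._min_interval:
            return  # too soon: keep only the most recent cursor
        self._send(self._pending)
        self._last_sent = now
        self._pending = None
```

Intermediate positions are dropped rather than queued, which suits presence data: peers only care about where the cursor is now, not the path it took.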
Awareness features beyond cursors:
- User avatars in the toolbar
- "User X is viewing Section 3" indicators
- Typing indicators per paragraph
WebSocket Sync Protocol#
HTTP polling adds a request round-trip, plus the polling interval, to every update, which is too slow for real-time editing. WebSockets provide the persistent, bidirectional channel needed.
A typical sync protocol:
1. Client connects → sends auth token + document ID
2. Server sends current document state (snapshot)
3. Client applies snapshot, enters "synced" state
4. On local edit:
   a. Apply operation locally (optimistic)
   b. Send operation to server
   c. Server assigns sequence number
   d. Server broadcasts to all other clients
5. On receiving remote operation:
   a. Transform against pending local operations (OT), or merge via CRDT
   b. Apply to local document
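The server side of steps 2 and 4 can be sketched with an in-memory stand-in. Everything here is an assumption made for illustration: `SyncServer` is a hypothetical class, per-client inbox lists take the place of WebSocket connections, and authentication is left out.

```python
from dataclasses import dataclass, field


@dataclass
class SyncServer:
    snapshot: str = ""
    next_seq: int = 1
    log: list = field(default_factory=list)      # (seq, client_id, op)
    clients: dict = field(default_factory=dict)  # client_id -> inbox of (seq, op)

    def connect(self, client_id: str):
        """Register a client and return the snapshot plus the last assigned sequence number."""
        self.clients[client_id] = []
        return self.snapshot, self.next_seq - 1

    def submit(self, client_id: str, op) -> int:
        """Assign the next sequence number, log the operation, and fan it out to peers."""
        seq = self.next_seq
        self.next_seq += 1
        self.log.append((seq, client_id, op))
        for other_id, inbox in self.clients.items():
            if other_id != client_id:
                inbox.append((seq, op))
        return seq
```

The single counter is what gives every client the same total order. In this sketch the returned sequence number doubles as the acknowledgement a sender would need before treating its optimistic edit as confirmed.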
Latency targets: operations should appear on remote clients in under 100 ms within the same region and under 300 ms cross-region.
Version History#
Users need to browse, compare, and restore previous versions.
Design considerations:
- Store operation log — replay operations to reconstruct any point in time
- Periodic snapshots — avoid replaying thousands of operations from the beginning
  - Snapshot every N operations or every M minutes
- Named versions — let users manually save checkpoints ("Draft v2")
- Diff view — highlight what changed between two versions using the operation log
Storage strategy:
Operations table: doc_id | seq | user_id | op_data | timestamp
Snapshots table: doc_id | seq | snapshot_blob | timestamp
Compact old operations by merging them into snapshots after 30 days.
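Reconstructing a version from the two tables might look like this. The function name `document_at` and the `apply_op` callback are hypothetical, and both inputs are assumed to be sorted by `seq`, as they would be when read back in sequence order.

```python
from bisect import bisect_right


def document_at(seq, snapshots, operations, apply_op):
    """Rebuild the document as of sequence number `seq`.

    snapshots:  list of (seq, state) sorted by seq
    operations: list of (seq, op) sorted by seq
    apply_op:   function (state, op) -> new state
    """
    # Find the newest snapshot taken at or before the requested sequence number.
    snapshot_seqs = [s for s, _ in snapshots]
    idx = bisect_right(snapshot_seqs, seq) - 1
    if idx >= 0:
        base_seq, state = snapshots[idx]
    else:
        base_seq, state = 0, ""  # no usable snapshot: start from an empty document

    # Replay only the operations between the snapshot and the target.
    for op_seq, op in operations:
        if base_seq < op_seq <= seq:
            state = apply_op(state, op)
    return state
```

The snapshot interval is the tuning knob: a smaller N means fewer operations to replay per lookup at the cost of more snapshot storage.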
Permission Model#
Collaborative documents need fine-grained access control:
- Owner — full control, can delete document
- Editor — can edit content
- Commenter — can add comments and suggestions
- Viewer — read-only access
Implementation patterns:
- Store permissions per document in a separate ACL table
- Check permissions on WebSocket connect and on every operation
- Share links with embedded tokens for frictionless access
- Organization-level defaults ("everyone at Acme Corp can edit")
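A per-operation check against the ACL could be sketched as follows. The `Role` ordering, the `REQUIRED_ROLE` action names, and the dictionary standing in for the ACL table are assumptions made for illustration; a real system would also fold in share-link tokens and organization defaults.

```python
from enum import Enum


class Role(Enum):
    VIEWER = 1
    COMMENTER = 2
    EDITOR = 3
    OWNER = 4


# Hypothetical action names mapped to the minimum role that may perform them.
REQUIRED_ROLE = {
    "read": Role.VIEWER,
    "comment": Role.COMMENTER,
    "edit": Role.EDITOR,
    "delete_document": Role.OWNER,
}


def is_allowed(acl: dict, doc_id: str, user_id: str, action: str) -> bool:
    """Return True if the user's role on the document covers the action.

    `acl` maps (doc_id, user_id) to a Role and stands in for the ACL table.
    """
    role = acl.get((doc_id, user_id))
    if role is None:
        return False  # no entry means no access
    return role.value >= REQUIRED_ROLE[action].value
```

Running this check on every incoming operation, not only at connect time, means a permission change takes effect without waiting for the user to reconnect.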
For suggestion mode (like Google Docs "Suggesting"):
- Track suggested edits as pending operations tied to a user
- Owner or editor accepts/rejects each suggestion
- Rejected suggestions are discarded; accepted ones become real operations
Offline Editing#
Offline support is where CRDTs shine. The approach:
- Cache the document locally (IndexedDB or SQLite)
- Queue operations while offline
- On reconnect, sync queued operations with the server
- CRDT properties guarantee convergence without special handling
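The queue-and-flush part of this approach can be sketched in a few lines. `OfflineQueue` is a hypothetical helper, `send` stands in for the call that ships an operation to the server, and an in-memory list takes the place of the IndexedDB or SQLite store.

```python
class OfflineQueue:
    def __init__(self, send):
        self._send = send    # assumed callback that delivers one operation
        self._online = False
        self._pending = []   # stand-in for a durable local store

    def submit(self, op) -> None:
        """Send immediately when online, otherwise hold the operation."""
        if self._online:
            self._send(op)
        else:
            self._pending.append(op)

    def set_online(self, online: bool) -> None:
        """Update connectivity; on reconnect, flush queued operations in order."""
        self._online = online
        if online:
            pending, self._pending = self._pending, []
            for op in pending:
                self._send(op)
```

With a CRDT, the flushed operations can be merged by the server and peers as they are. The queue itself is the same for OT, but each flushed operation would first need transforming against everything the server accepted in the meantime.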
For OT-based systems, offline is harder:
- Queued operations must be transformed against all server operations that happened while offline
- Long offline periods create large transformation chains
- Risk of surprising merges increases with time apart
Scaling Considerations#
- Horizontal scaling — shard by document ID. Each document lives on one sync server at a time
- Server affinity — use consistent hashing to route WebSocket connections for the same document to the same server
- Large documents — split into blocks/pages that sync independently
- Hot documents — a document with 100 concurrent editors needs dedicated capacity. Monitor and auto-scale
- Storage — operation logs grow fast. Compress, compact, and archive aggressively
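Routing by document ID with consistent hashing could be sketched like this. `HashRing`, the virtual-node count, and the choice of SHA-256 are assumptions made for illustration, not a recommendation for any specific load balancer.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a stable 64-bit integer position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, servers, vnodes: int = 64):
        # Each server gets several virtual nodes to spread documents more evenly.
        self._ring = sorted(
            (_hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    def server_for(self, doc_id: str) -> str:
        """Return the server owning the first ring position after the document's hash."""
        idx = bisect.bisect_right(self._keys, _hash(doc_id)) % len(self._ring)
        return self._ring[idx][1]
```

Every connection for the same document ID resolves to the same server, and removing a server only reassigns the documents that server owned, so most live sessions stay where they are during a scale-down.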
Technology Choices#
| Component | Options |
|---|---|
| CRDT library | Yjs, Automerge, Diamond Types |
| Editor framework | ProseMirror, TipTap, Slate, Lexical |
| Transport | WebSocket, WebRTC (peer-to-peer) |
| Persistence | PostgreSQL, Redis (presence), S3 (snapshots) |
| Auth | JWT tokens validated on WebSocket handshake |
Key Takeaways#
- OT requires a central server; CRDTs do not — choose based on your architecture
- Yjs + TipTap is the most practical stack for new collaborative editors
- Presence is separate from document sync — treat it as ephemeral, high-frequency data
- Version history = operation log + periodic snapshots
- Offline editing is natural with CRDTs, painful with OT
- Shard by document ID for horizontal scaling
Ready to design systems like this in interviews and on the job? codelit.io gives you the tools to practice system design interactively — from collaborative editors to distributed databases.
This is article #211 in the Codelit engineering blog series.