Design a Collaborative Document Editor (Google Docs / Notion)
An end-to-end interview-ready walkthrough — from back-of-envelope math through deep dives on OT vs CRDTs, server-authoritative ordering, WebSocket synchronization, presence tracking, offline editing, and version history. Structured to mirror the arc of a 45-minute system design interview.
Requirements
A collaborative document editor is one of the hardest real-time systems to build correctly. The core challenge isn't rendering text. It's ensuring that when 20 people type simultaneously in the same paragraph, every user sees the same final document and no edit is lost or corrupted. This is a distributed consistency problem disguised as a text editor.
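To see why this is hard, consider the naive approach: broadcast raw index-based edits and apply them in arrival order. The sketch below uses a hypothetical `("ins", index, char)` / `("del", index)` operation format, not any real editor's wire format. Two replicas receive the same two concurrent edits in different orders and end up with different documents.

```python
# Hypothetical minimal op model: ("ins", index, char) or ("del", index).
def apply(doc: str, op: tuple) -> str:
    """Apply one index-based operation to a plain-text document."""
    if op[0] == "ins":
        _, i, ch = op
        return doc[:i] + ch + doc[i:]
    if op[0] == "del":
        _, i = op
        return doc[:i] + doc[i + 1:]
    raise ValueError(f"unknown op {op!r}")


base = "abc"
op_a = ("ins", 1, "X")  # user A: insert "X" before "b"
op_b = ("del", 2)       # user B, concurrently: delete "c" (index 2 in B's copy)

replica_1 = apply(apply(base, op_a), op_b)  # A's edit arrives first
replica_2 = apply(apply(base, op_b), op_a)  # B's edit arrives first

print(replica_1)  # aXc  (B's delete hit "b", not "c")
print(replica_2)  # aXb
```

The indices in `op_b` were computed against a document that did not yet contain A's insert, so replaying them blindly corrupts one replica. OT and CRDTs are two different ways of fixing exactly this.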
Functional Requirements
Core business logic & features
- 01. Real-Time Collaborative Editing: Multiple users edit the same document simultaneously. Changes appear on all clients within 100-200 ms.
- 02. Conflict-Free Concurrent Edits: When two users type at the same position, both edits are preserved and end up in the same order on every client. No data loss, no corruption.
- 03. Live Cursor & Presence: See other users' cursors, selections, and names in real time. Know who's viewing the document.
- 04. Document Versioning: Full revision history. View any past version, restore a previous state, and see who changed what.
- 05. Comments & Suggestions: Inline comments anchored to text ranges, plus a suggestion mode that proposes changes without applying them.
- 06. Offline Editing: Continue editing without an internet connection. When the connection is restored, sync local changes and merge them with concurrent remote edits without losing either side's work.
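The conflict-free concurrent edits requirement is where the core algorithm shows up. As an illustration, here is a minimal operational-transformation (OT) rule for two concurrent inserts, assuming plain text, insert-only operations, and a hypothetical per-client `site` ID used as a deterministic tie-breaker. It is a sketch, not a complete OT system: delete/insert and delete/delete transforms are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int   # index in the document version this op was generated against
    text: str
    site: str  # hypothetical client ID, used only to break ties


def apply(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op: Insert, against: Insert) -> Insert:
    """Rewrite `op` so it can be applied after the concurrent op `against`."""
    # Shift right if `against` landed earlier, or at the same index with a
    # smaller site ID (so every replica orders same-position inserts alike).
    shifts = against.pos < op.pos or (
        against.pos == op.pos and against.site < op.site
    )
    if shifts:
        return Insert(op.pos + len(against.text), op.text, op.site)
    return op


doc = "ab"
alice = Insert(1, "X", "alice")
bob = Insert(1, "Y", "bob")  # same position, generated concurrently

# Replica 1 sees Alice's edit first; replica 2 sees Bob's first.
replica_1 = apply(apply(doc, alice), transform(bob, alice))
replica_2 = apply(apply(doc, bob), transform(alice, bob))
print(replica_1, replica_2)  # aXYb aXYb
```

Both insertions survive and both replicas converge, which is the property requirement 02 asks for. The site-ID tie-break is arbitrary but must be identical everywhere.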
Non-Functional Requirements
System constraints
Latency
Local edits render immediately (0 ms: applied before any network round trip). Remote edits become visible within 100-200 ms over a stable connection.
Consistency
Once every client has received the same set of edits, all clients converge to the same document state. No edit is ever silently lost.
Scale
Support 50+ concurrent editors per document. Platform handles 10M+ active documents.
Durability
Zero data loss. Every keystroke is persisted. Document survives server crashes.
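A rough estimate makes the scale numbers concrete. Assume (this rate is not part of the requirements) that each active editor produces about 5 operations per second while typing, and that the server relays every operation to all other editors of the same document. A fully loaded 50-editor document then sees:

```latex
\begin{aligned}
\text{inbound ops/s} &= 50 \times 5 = 250 \\
\text{outbound messages/s} &= 250 \times (50 - 1) = 12{,}250
\end{aligned}
```

Under these assumptions, outbound fan-out is 49 times the inbound write rate, so most per-document traffic comes from the relay path rather than from writes.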
🎯 Clarifying questions worth asking
Each answer materially changes the conflict-resolution strategy or the system's performance envelope:
- Rich text or plain text? (Rich text = tree structure; plain text = linear sequence. Tree conflicts are harder.)
- How many concurrent editors per document? (5 vs 50 vs 500 changes the fan-out and conflict frequency.)
- Is offline editing required? (Offline = client must buffer and rebase operations. Changes the entire sync model.)
- Block-based (Notion) or free-form (Google Docs)? (Block-based reduces conflict surface — edits within a block don't conflict with other blocks.)
- What's the maximum document size? (A 100-page doc with 50 editors has different perf characteristics than a short note.)
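The block-based point can be sketched as follows, assuming a hypothetical model in which a document is a map from block ID to block text and each operation inserts text inside exactly one block. Operations on different blocks commute without any transformation; only edits that land in the same block need OT or CRDT machinery. This is a simplification: a real block editor would also need operations that move, split, or merge blocks, and those do interact.

```python
from typing import Dict, Tuple

Doc = Dict[str, str]        # hypothetical: block ID -> block text
Op = Tuple[str, int, str]   # hypothetical: (block_id, index, text_to_insert)


def apply(doc: Doc, op: Op) -> Doc:
    """Insert text inside a single block, returning a new document."""
    block_id, index, text = op
    body = doc[block_id]
    new_doc = dict(doc)  # copy so replicas never share mutable state
    new_doc[block_id] = body[:index] + text + body[index:]
    return new_doc


def needs_transform(a: Op, b: Op) -> bool:
    """Only ops targeting the same block can interfere with each other."""
    return a[0] == b[0]


base = {"b1": "Title", "b2": "Body"}
op_a = ("b1", 5, "!")     # user A edits block b1
op_b = ("b2", 0, "The ")  # user B concurrently edits block b2

print(needs_transform(op_a, op_b))  # False
order_ab = apply(apply(base, op_a), op_b)
order_ba = apply(apply(base, op_b), op_a)
print(order_ab == order_ba)  # True
print(order_ab)  # {'b1': 'Title!', 'b2': 'The Body'}
```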
In scope vs out of scope
| In Scope | Out of Scope | Why |
|---|---|---|
| Real-time text collaboration | Spreadsheet / whiteboard collaboration | Different data models — spreadsheets are cell-based, whiteboards are spatial |
| Conflict resolution (OT/CRDT) | Conflict-free by design (locking) | Locking prevents collaboration — the whole point is concurrent editing |
| Presence & cursors | Video/audio conferencing | Separate real-time system with different latency requirements |
| Version history & restore | Git-style branching & merging | Document editors use linear history, not DAG-based version control |
| Comments & suggestions | Full workflow/approval system | Business logic layer — not a distributed systems challenge |
| Offline editing & sync | Peer-to-peer sync (no server) | P2P adds NAT traversal, discovery — separate problem entirely |
💡 Interviewer signal
The strongest opening: "This is fundamentally a distributed consistency problem. Multiple clients maintain local replicas of the document and must converge to the same state despite concurrent, potentially conflicting edits arriving in different orders. The core decision is whether to use OT (server-authoritative ordering) or CRDTs (mathematically guaranteed convergence without coordination)." This frames the entire interview.
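For contrast with the OT side of that decision, here is a minimal sketch of one well-known sequence CRDT, the Replicated Growable Array (RGA). Each character gets a unique ID (a Lamport counter plus a site ID), inserts reference the ID of the character they follow, and deletes leave tombstones. Assumptions: plain text, ops delivered in causal order (an insert's anchor and a delete's target have already arrived), and a hypothetical tuple op format. It omits tombstone garbage collection and all networking.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

ElemId = Tuple[int, str]  # (Lamport counter, site ID), compared as a tuple


@dataclass
class Elem:
    id: ElemId
    char: str
    deleted: bool = False  # tombstone: deleted chars stay as anchors for later ops


@dataclass
class RGA:
    site: str
    elems: List[Elem] = field(default_factory=list)
    clock: int = 0

    def _index_of(self, eid: Optional[ElemId]) -> int:
        """List index of `eid`; None means 'document start' and maps to -1."""
        if eid is None:
            return -1
        return next(i for i, e in enumerate(self.elems) if e.id == eid)

    def local_insert(self, after: Optional[ElemId], char: str) -> tuple:
        self.clock += 1
        op = ("ins", (self.clock, self.site), after, char)
        self.apply(op)
        return op  # broadcast this to other replicas

    def local_delete(self, eid: ElemId) -> tuple:
        op = ("del", eid)
        self.apply(op)
        return op

    def apply(self, op: tuple) -> None:
        """Integrate a local or remote op. Assumes causal delivery."""
        if op[0] == "del":
            self.elems[self._index_of(op[1])].deleted = True
            return
        _, eid, after, char = op
        self.clock = max(self.clock, eid[0])  # Lamport clock merge
        i = self._index_of(after) + 1
        # Skip elements with larger IDs: concurrent inserts at the same anchor
        # (and everything inserted after them) sort before this one.
        while i < len(self.elems) and self.elems[i].id > eid:
            i += 1
        self.elems.insert(i, Elem(eid, char))

    def text(self) -> str:
        return "".join(e.char for e in self.elems if not e.deleted)


alice, bob = RGA("alice"), RGA("bob")
op_h = alice.local_insert(None, "H")
op_i = alice.local_insert(op_h[1], "i")
for op in (op_h, op_i):
    bob.apply(op)

# Concurrent inserts at the same position, delivered in opposite orders.
op_bang = alice.local_insert(op_i[1], "!")
op_q = bob.local_insert(op_i[1], "?")
alice.apply(op_q)
bob.apply(op_bang)
print(alice.text(), bob.text())  # Hi?! Hi?!
```

The replicas agree on an order for the two concurrent inserts by comparing IDs, without asking a server. The cost is per-character metadata and tombstones that a production system would need to compact.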