Design a Messaging App (WhatsApp / Telegram)
An end-to-end interview-ready walkthrough — from back-of-envelope math through deep dives on WebSocket management, message ordering, group fan-out, E2E encryption, and multi-device sync. Structured to mirror the arc of a 45-minute system design interview.
Requirements
A messaging app is deceptively complex. On the surface it's "send text from A to B", but at WhatsApp/Telegram scale you're solving real-time delivery to 2 billion users, message ordering without a global clock, end-to-end encryption that even your own servers can't break, and group fan-out to hundreds of thousands of members. The requirements you anchor here determine whether you build a weekend project or a planetary-scale communication system.
Functional Requirements
Core business logic & features
- 01. 1:1 Messaging: Users can send text messages to any other user. Messages are persisted and available on reconnect.
- 02. Group Messaging: Users can create groups (up to 100K members for Telegram-scale). Messages fan out to all participants.
- 03. Delivery Status: Three-state delivery tracking: sent (server received), delivered (recipient device received), read (recipient opened).
- 04. Media Sharing: Support images, videos, audio messages, and documents up to 2GB. Thumbnails generated server-side.
- 05. Online Presence: Show online/offline status and 'last seen' timestamp. Typing indicators for active conversations.
- 06. Multi-Device Sync: Users can be logged in on phone + desktop + tablet simultaneously. All devices stay in sync.
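The three-state delivery tracking in requirement 03 can be modelled as a status that only ever moves forward. The following is a minimal sketch, not a definitive implementation: the names `DeliveryStatus` and `advance` are hypothetical, and it assumes receipts may arrive late or out of order.

```python
from enum import IntEnum


class DeliveryStatus(IntEnum):
    """The three states from requirement 03, ordered by progress."""
    SENT = 1       # server received the message
    DELIVERED = 2  # recipient device received it
    READ = 3       # recipient opened it


def advance(current: DeliveryStatus, incoming: DeliveryStatus) -> DeliveryStatus:
    """Apply a receipt without ever moving the status backwards.

    Assumption: receipts can be delayed or reordered (for example, a
    'delivered' ack arriving after a 'read' ack), so we keep the
    furthest state seen so far.
    """
    return max(current, incoming)


status = DeliveryStatus.SENT
status = advance(status, DeliveryStatus.READ)       # read receipt arrives first
status = advance(status, DeliveryStatus.DELIVERED)  # late 'delivered' is ignored
print(status.name)  # READ
```

Treating the status as monotonic means a retried or reordered receipt is harmless, which keeps the receipt path idempotent.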
Non-Functional
System constraints
- Latency: Message delivery in <500ms end-to-end for online recipients. Typing indicators in <200ms.
- Scale: 2B registered users, 500M DAU, 100B messages/day. Peak: 50M concurrent WebSocket connections.
- Availability: 99.99% uptime. Messaging is critical infrastructure — downtime means people can't communicate.
- Security: End-to-end encryption for all 1:1 messages. Even server operators cannot read message content.
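It can help to turn the scale and availability targets above into per-second and per-year figures. The sketch below uses only the numbers stated in these requirements; the variable names are illustrative, and the throughput is a daily average rather than a peak.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

messages_per_day = 100_000_000_000  # 100B, from the Scale requirement
daily_active_users = 500_000_000    # 500M DAU
peak_connections = 50_000_000       # 50M concurrent WebSocket connections
availability = 0.9999               # 99.99% uptime target

# Average (not peak) message throughput across the whole system.
avg_messages_per_second = messages_per_day / SECONDS_PER_DAY

# How many messages a typical active user sends per day.
messages_per_user_per_day = messages_per_day / daily_active_users

# Share of daily actives connected at the stated peak.
peak_online_fraction = peak_connections / daily_active_users

# Downtime budget implied by the availability target.
downtime_minutes_per_year = (1 - availability) * 365 * 24 * 60

print(f"average throughput: {avg_messages_per_second:,.0f} msg/s")
print(f"messages per active user per day: {messages_per_user_per_day:.0f}")
print(f"peak online share of DAU: {peak_online_fraction:.0%}")
print(f"downtime budget: {downtime_minutes_per_year:.1f} min/year")
```

These work out to roughly 1.16 million messages per second on average, 200 messages per active user per day, 10% of daily actives connected at peak, and about 52.6 minutes of allowed downtime per year.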
🎯 Clarifying questions that change the design
Each of these steers you toward a fundamentally different architecture:
- What's the max group size? 256 members (WhatsApp's original cap, since raised to 1,024) vs 200K (Telegram) changes the fan-out strategy entirely. Small groups can fan out on write; large groups must fan out on read.
- Is message history stored server-side or client-side? WhatsApp stores minimally on server (E2E encrypted, client is source of truth). Telegram stores everything server-side (cloud-first). This changes your storage model.
- Multi-device or single-device? Single-device (original WhatsApp) is simpler — one inbox queue. Multi-device requires per-device delivery tracking and sync protocols.
- Do we need message search? Searching E2E encrypted messages requires client-side indexing. Server-side search only works for unencrypted messages.
- Voice/video calls? Real-time media is a separate system (WebRTC, TURN/STUN servers). Scope it out unless asked.
- Message retention policy? Keep forever vs auto-delete after N days changes storage sizing dramatically.
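The group-size question above comes down to where the copying work happens. Here is a minimal sketch of that choice, assuming an in-memory `Store` and a cutoff constant `FANOUT_ON_WRITE_MAX_MEMBERS`; both are hypothetical, since the text only distinguishes "small" from "large" groups.

```python
from dataclasses import dataclass, field

# Hypothetical cutoff, chosen for illustration only.
FANOUT_ON_WRITE_MAX_MEMBERS = 256


@dataclass
class Store:
    """In-memory stand-in for real inbox and group-log storage."""
    inboxes: dict[str, list[str]] = field(default_factory=dict)     # per-user queues
    group_logs: dict[str, list[str]] = field(default_factory=dict)  # per-group logs


def send_to_group(store: Store, group_id: str, members: list[str], message: str) -> str:
    """Store a group message and report which strategy was used."""
    if len(members) <= FANOUT_ON_WRITE_MAX_MEMBERS:
        # Fan-out on write: copy the message into every member's inbox now.
        for member in members:
            store.inboxes.setdefault(member, []).append(message)
        return "write"
    # Fan-out on read: append once; members pull from the group log later.
    store.group_logs.setdefault(group_id, []).append(message)
    return "read"


store = Store()
print(send_to_group(store, "family", ["ann", "bob"], "hi"))  # write
big_group = [f"user{i}" for i in range(100_000)]
print(send_to_group(store, "channel", big_group, "hello"))   # read
```

Fan-out on write costs one write per member but makes reads trivial; fan-out on read costs a single write but pushes the work to each reader, which is why the cutoff depends on group size.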
In scope vs out of scope
| In Scope | Out of Scope | Why |
|---|---|---|
| 1:1 and group text messaging | Voice/video calls (WebRTC) | Real-time media is a separate system with different latency models |
| Media sharing (images, video, docs) | Stories / status updates | Ephemeral content is a feed problem, not a messaging one |
| Delivery receipts (sent/delivered/read) | Payment integration (WhatsApp Pay) | Fintech is its own 45-minute interview |
| End-to-end encryption (1:1) | Full E2E for large groups (>256) | Group E2E at scale requires complex key rotation — mention but don't deep-dive |
| Online presence + typing indicators | AI chatbots / message translation | ML features, not distributed systems |
| Multi-device sync (phone + desktop) | Cross-platform message backup/restore | Backup is a storage/export concern, not real-time delivery |
| Push notifications for offline users | SMS fallback delivery | Carrier integration is a vendor concern, not architecture |
💡 Interviewer signal
The strongest opening: "I'll focus on the real-time message delivery pipeline, because that's where the distributed systems complexity lives. The core challenge is maintaining message ordering, effectively exactly-once delivery (at-least-once delivery plus deduplication), and sub-500ms latency for 2 billion users over persistent connections. Media upload is an async pipeline I'll cover separately." This shows you know where the hard problems are.
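In practice, the ordering and exactly-once goals named in that opening are usually approximated on the receiving side: the sender retries until acknowledged, and the receiver drops duplicates and holds back out-of-order messages. The sketch below assumes the server assigns a per-conversation sequence number starting at 1; the `Inbox` class and that numbering scheme are illustrative assumptions, not something this walkthrough has specified.

```python
class Inbox:
    """Receiver-side view of a single conversation."""

    def __init__(self) -> None:
        self.next_seq = 1                  # next sequence number to deliver
        self.pending: dict[int, str] = {}  # out-of-order messages held back
        self.delivered: list[str] = []     # messages shown to the user, in order

    def receive(self, seq: int, body: str) -> None:
        if seq < self.next_seq or seq in self.pending:
            return  # duplicate caused by a retry: drop it
        self.pending[seq] = body
        # Release any run of consecutive messages that is now complete.
        while self.next_seq in self.pending:
            self.delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1


inbox = Inbox()
# Messages arrive reordered and duplicated, as they might after retries.
for seq, body in [(2, "b"), (1, "a"), (2, "b"), (3, "c"), (1, "a")]:
    inbox.receive(seq, body)
print(inbox.delivered)  # ['a', 'b', 'c']
```

Each message is surfaced once and in sequence order even though the network delivered it more than once, which is the property an interviewer typically means by "exactly-once" here.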