Design a URL Shortener (Bitly)
An end-to-end interview-ready walkthrough — from back-of-envelope math through deep dives on ID generation, caching, analytics, scaling, and abuse. Structured to mirror the arc of a 45-minute system design interview.
Requirements
Before touching a whiteboard, anchor the problem. A URL shortener is deceptively simple — the scope decides whether you build a single service on one box or a multi-region system with a dedicated ID generator. Separate what the system does from how well it must do it.
Functional Requirements
Core business logic & features
- 01. URL Shortening: Given a long URL, generate a shorter, unique alias for it.
- 02. Redirection: When a user opens a short link, redirect them to the original long URL.
- 03. Custom Aliases: Users can optionally pick a custom alias for their URL.
- 04. Link Expiration: Links expire after a default timespan. Users can specify a custom expiration.
- 05. Click Analytics: Track click count, referrer, geo, and device for each short URL.
- 06. Link Management: Authenticated users can list, update, and delete their short URLs.
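To make the functional surface concrete, here is a minimal in-memory sketch of shortening, redirection, custom aliases, expiration, click counting, and deletion. It is an illustration under stated assumptions, not a production design: a plain dict stands in for the datastore, and the names `InMemoryShortener`, `Link`, `CODE_LENGTH` (7 characters), and `DEFAULT_TTL_SECONDS` (30 days) are hypothetical choices that do not come from the requirements above.

```python
import secrets
import string
import time
from dataclasses import dataclass
from typing import Dict, Optional

ALPHABET = string.ascii_letters + string.digits  # 62 URL-safe characters
CODE_LENGTH = 7                                  # assumed length, for illustration
DEFAULT_TTL_SECONDS = 30 * 24 * 3600             # assumed default expiration


@dataclass
class Link:
    long_url: str
    expires_at: float
    clicks: int = 0


class InMemoryShortener:
    """Toy model of the functional requirements; a dict replaces the database."""

    def __init__(self) -> None:
        self._links: Dict[str, Link] = {}

    def _random_code(self) -> str:
        # secrets (not random) so codes are hard to predict
        return "".join(secrets.choice(ALPHABET) for _ in range(CODE_LENGTH))

    def shorten(self, long_url: str, alias: Optional[str] = None,
                ttl_seconds: float = DEFAULT_TTL_SECONDS,
                now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        if alias is not None:
            if alias in self._links:
                raise ValueError("alias already taken")
            code = alias
        else:
            code = self._random_code()
            while code in self._links:  # retry on the rare collision
                code = self._random_code()
        self._links[code] = Link(long_url, now + ttl_seconds)
        return code

    def resolve(self, code: str, now: Optional[float] = None) -> Optional[str]:
        """Return the long URL for a redirect, or None if unknown or expired."""
        now = time.time() if now is None else now
        link = self._links.get(code)
        if link is None or now >= link.expires_at:
            return None
        link.clicks += 1  # synchronous counter; real analytics would be async
        return link.long_url

    def clicks(self, code: str) -> int:
        return self._links[code].clicks

    def delete(self, code: str) -> bool:
        return self._links.pop(code, None) is not None
```

Passing `now` explicitly keeps expiration testable without sleeping. In this sketch an expired alias still occupies its slot until it is deleted, which is one of the policy questions a real design has to settle.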
Non-Functional
System constraints
- Availability: 99.99% uptime. Downtime breaks every live short link at once.
- Latency: Redirect in <10 ms at p99. The redirect is the product experience.
- Scale: 100M writes/day and 1B reads/day, a 10:1 read-to-write ratio.
- Unpredictability: Short codes must be non-guessable to resist enumeration attacks.
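The scale and availability targets above translate into a few numbers worth having ready. The arithmetic below is a back-of-envelope sketch: the daily volumes and the 99.99% figure come from the constraints, while the 7-character base62 code space is an assumption introduced only to illustrate how densely codes would fill the keyspace.

```python
SECONDS_PER_DAY = 24 * 60 * 60

writes_per_day = 100_000_000
reads_per_day = 1_000_000_000

# Average request rates; peak traffic will be some multiple of these.
avg_write_qps = writes_per_day / SECONDS_PER_DAY   # about 1,157 writes/s
avg_read_qps = reads_per_day / SECONDS_PER_DAY     # about 11,574 reads/s
read_write_ratio = reads_per_day / writes_per_day  # 10.0

# 99.99% availability leaves this much downtime per year.
downtime_minutes_per_year = (1 - 0.9999) * 365 * 24 * 60  # about 52.6 minutes

# Assumption: 7-character base62 codes (not stated in the requirements).
keyspace = 62 ** 7                                 # 3,521,614,606,208 codes
codes_after_one_year = writes_per_day * 365        # 36.5 billion codes
fill_fraction = codes_after_one_year / keyspace    # roughly 1% of the keyspace
```

Under that assumed code length, about one in a hundred random guesses would hit a live link after a year of writes, which is the kind of figure that informs how long codes need to be to resist enumeration.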
🎯 Clarifying questions worth asking
These aren't filler. Each one changes the design:
- Is the same long URL shortened to the same code? (dedupe vs every-request-unique — affects write path)
- Do codes expire or live forever? (TTL cleanup job, storage growth)
- How real-time do analytics need to be? (sync counter vs async pipeline)
- Global or single-region? (multi-region introduces ID-gen coordination)
- Are links public? (scanning for phishing/malware becomes required)
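The first question, dedupe versus a fresh code per request, is the easiest to show in code. The sketch below is illustrative only: `WritePath`, `new_code`, and the two dicts are hypothetical stand-ins for a service and its storage, and the 7-character code length is an assumption. The point it makes is that deduping adds a reverse index and an extra lookup to every write.

```python
import secrets
import string
from typing import Dict

ALPHABET = string.ascii_letters + string.digits
CODE_LENGTH = 7  # assumed length, for illustration


def new_code() -> str:
    return "".join(secrets.choice(ALPHABET) for _ in range(CODE_LENGTH))


class WritePath:
    """Toy write path that can run in dedupe or every-request-unique mode."""

    def __init__(self, dedupe: bool) -> None:
        self.dedupe = dedupe
        self.code_to_url: Dict[str, str] = {}
        self.url_to_code: Dict[str, str] = {}  # reverse index, used only when deduping

    def create(self, long_url: str) -> str:
        if self.dedupe:
            existing = self.url_to_code.get(long_url)  # extra read on every write
            if existing is not None:
                return existing
        code = new_code()
        while code in self.code_to_url:  # retry on the rare collision
            code = new_code()
        self.code_to_url[code] = long_url
        if self.dedupe:
            self.url_to_code[long_url] = code  # extra write to keep the index in sync
        return code
```

With `dedupe=True`, two requests for the same long URL return the same code; with `dedupe=False`, each request gets its own code, which also keeps per-user analytics and expiration independent.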
In scope vs out of scope
| In Scope | Out of Scope | Why |
|---|---|---|
| Create + redirect | Link preview / title scraping | Nice-to-have, not core to the redirect SLA |
| Click analytics (async) | Real-time dashboards with sub-second freshness | Separate system — streaming OLAP is its own problem |
| TTL-based expiration | Content moderation appeals flow | Product concern, not a distributed-systems one |
| Custom aliases | Branded domains per customer (rebrandly.com) | Enterprise feature, adds DNS + cert management |
| Basic abuse protection | ML-based phishing detection | Usually offloaded to Google Safe Browsing |
💡 Interviewer signal
Candidates who jump straight to "use Redis" lose the first five minutes. Stating constraints explicitly — "redirect is the hot path, analytics can be async, codes are write-once" — sets the frame for every decision that follows.