Delivery
Serve large files at scale — CDN delivery with presigned URLs, transcoding pipelines for media processing, and deduplication with content hashing for storage optimization.
The Big Picture — Why Serving Large Files Is Hard
Serving a 5KB JSON API response is trivial. Serving a 500MB video to 10 million users worldwide is an entirely different problem. The bandwidth alone would crush your origin servers, the latency for distant users would be unacceptable, and storing duplicate copies of the same file wastes petabytes of storage.
The Factory vs Local Warehouses
Imagine a factory in Virginia that makes products. If every customer worldwide had to drive to Virginia to pick up their order, the factory parking lot would be gridlocked and customers in Tokyo would wait weeks. The solution: stock popular products in local warehouses (CDN edges) near major cities. Customers pick up from the nearest warehouse — fast, no factory congestion. The factory only ships to warehouses, not to individual customers. Presigned URLs are like pickup tickets — they prove you're authorized to collect your order without the factory verifying each person.
Bandwidth
A 100MB video × 1M downloads = 100 TB of bandwidth. Your origin server can't handle that. CDN edges absorb 99% of this traffic.
Latency
A user in Tokyo downloading from Virginia: 150ms round trip × hundreds of packets = seconds of delay. A CDN edge in Tokyo: 10ms. 15x faster.
Storage Waste
100 users upload the same meme. Without deduplication, you store 100 copies. With content hashing, you store 1 copy and 100 references.
🔥 Key Insight
The three pillars of large file delivery: CDN (serve from the edge, not the origin), presigned URLs (bypass the backend for file transfer), and deduplication (store each unique file exactly once). Every file-heavy system — YouTube, Dropbox, Instagram — uses all three.
Delivery Architecture
- Upload: Client → Object Storage
- Process: Queue → Workers
- Store: Multiple variants
- Deliver: CDN → Client
UPLOAD PATH (write):
Client → presigned upload URL → Object Storage (S3). The backend never touches the file bytes. S3 triggers an event → processing queue.

PROCESSING PATH (async):
Queue → worker picks up the job → downloads the original from S3 → transcodes into 1080p, 720p, 480p, and a thumbnail → uploads the variants back to S3 → updates the metadata DB: "video_123 ready, 3 variants".

DELIVERY PATH (read):
Client requests video_123 → backend generates a presigned CDN URL (720p variant) → client fetches directly from the CDN edge. CDN cache hit? Serve from the edge (~10ms). Cache miss? Fetch from the S3 origin, cache, serve.

Key insight: the backend NEVER serves file bytes. Upload: client → S3 directly (presigned upload URL). Download: client → CDN directly (presigned download URL). The backend only handles auth, metadata, and URL generation.
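The "S3 triggers an event → processing queue" hop is configured once on the bucket. A minimal sketch with boto3 is shown below; the bucket name, queue ARN, and key prefix are placeholders, and the SQS queue policy must separately allow S3 to publish to it.

```python
import boto3

s3 = boto3.client("s3")

# Every finished upload under originals/ produces a message on the transcode queue.
s3.put_bucket_notification_configuration(
    Bucket="videos-bucket",  # placeholder
    NotificationConfiguration={
        "QueueConfigurations": [
            {
                "QueueArn": "arn:aws:sqs:us-east-1:123456789012:transcode-jobs",  # placeholder
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {"FilterRules": [{"Name": "prefix", "Value": "originals/"}]}
                },
            }
        ]
    },
)
```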
✅ Backend Handles
- Authentication and authorization
- Generating presigned URLs (upload + download)
- Metadata management (file info, variants, status)
- Triggering processing pipelines
- Deduplication checks (hash lookup)
❌ Backend Does NOT Handle
- File upload bytes (client → S3 directly)
- File download bytes (client → CDN directly)
- Transcoding (async workers handle this)
- Serving static assets (CDN handles this)
CDN + Presigned URLs
A presigned URL is a temporary, authenticated URL that grants access to a specific object in storage without requiring the client to have storage credentials. The backend generates the URL (signed with its credentials), and the client uses it to upload or download directly from storage or CDN.
DOWNLOAD FLOW:
1. Client: GET /api/videos/123
2. Backend: verify auth (is this user allowed to access video 123?), generate a presigned URL, and return it to the client:
   https://cdn.example.com/videos/123/720p.mp4
   ?X-Amz-Algorithm=AWS4-HMAC-SHA256
   &X-Amz-Credential=AKIA.../20250115/us-east-1/s3/aws4_request
   &X-Amz-Date=20250115T120000Z
   &X-Amz-Expires=3600 ← valid for 1 hour
   &X-Amz-Signature=abc123... ← cryptographic signature
3. Client fetches the video directly from the CDN using the presigned URL. Edge has it cached? Serve immediately (~10ms). Cache miss? The CDN fetches from S3, caches, and serves.

Backend involvement: ~5ms (auth + URL generation). File transfer through the backend: 0 bytes. ✅

UPLOAD FLOW:
1. Client: POST /api/uploads (request an upload URL)
2. Backend: generate a presigned S3 upload URL and return it with upload instructions.
3. Client: PUT the file directly to S3 using the presigned URL. The 500MB video goes straight to S3, not through the backend.
4. S3 triggers an event → the processing pipeline starts.
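A minimal sketch of the backend's URL-generation step, assuming boto3 and plain S3 presigned URLs (bucket and key names are placeholders). Signed CDN URLs, for example CloudFront signed URLs, follow the same idea with a different signing call.

```python
import boto3

s3 = boto3.client("s3")

def presigned_download_url(bucket: str, key: str, expires_s: int = 3600) -> str:
    """Time-limited GET URL the client uses directly; the backend sends no file bytes."""
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_s,
    )

def presigned_upload_url(bucket: str, key: str, content_type: str, expires_s: int = 900) -> str:
    """Time-limited PUT URL so the upload goes straight to storage."""
    return s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": bucket, "Key": key, "ContentType": content_type},
        ExpiresIn=expires_s,
    )

# Example: the backend's entire work on the read path is auth plus this call.
url = presigned_download_url("videos-bucket", "videos/123/720p.mp4")
```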
CDN Caching for Files
| Content Type | Cache Strategy | TTL | Invalidation |
|---|---|---|---|
| Images (profile pics) | Cache aggressively | 30 days | New URL on change (content hash in filename) |
| Videos (uploaded content) | Cache aggressively | 1 year | Immutable — URL includes version/hash |
| Thumbnails | Cache aggressively | 30 days | Regenerate with new URL on change |
| User-specific files | Don't cache on shared CDN | N/A | Use presigned URLs with short expiry |
| Public assets (CSS/JS) | Cache with versioning | 1 year | Filename includes content hash (app.a1b2c3.js) |
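For the "public assets" and "uploaded content" rows, the invalidation strategy is simply to never invalidate: put a content hash in the key and mark the object immutable. A sketch under those assumptions (boto3, illustrative names):

```python
import hashlib
import boto3

s3 = boto3.client("s3")

def upload_immutable_asset(bucket: str, data: bytes) -> str:
    """Store an asset under a content-hashed key so the CDN can cache it for a year."""
    digest = hashlib.sha256(data).hexdigest()[:12]  # short content hash for the filename
    key = f"assets/app.{digest}.js"                 # new content => new key => new URL
    s3.put_object(
        Bucket=bucket,
        Key=key,
        Body=data,
        ContentType="application/javascript",
        CacheControl="public, max-age=31536000, immutable",  # 1 year; safe because the URL changes
    )
    return key
```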
Benefits
- ✅Backend serves 0 file bytes — only metadata and URLs
- ✅CDN absorbs 99%+ of download traffic
- ✅Presigned URLs provide time-limited, secure access
- ✅Upload goes directly to S3 — no backend bottleneck
- ✅Global delivery via CDN edges (low latency worldwide)
Considerations
- ❌Presigned URLs can be shared (anyone with the URL can access)
- ❌Cache invalidation is complex (use content-hash URLs instead)
- ❌URL expiration must be tuned (too short = broken links, too long = security risk)
- ❌CDN costs scale with bandwidth (can be significant for video)
- ❌Private content needs signed cookies or short-lived URLs
🎯 Interview Insight
Presigned URLs are the standard answer for "how do you handle file uploads/downloads at scale?" Say: "The client uploads directly to S3 via a presigned URL — the backend never touches the file bytes. For downloads, the backend generates a presigned CDN URL. This means the backend handles only auth and metadata, while S3 and CDN handle all file transfer."
Transcoding Pipelines
Transcoding converts uploaded files into multiple formats and resolutions optimized for different devices and network conditions. A 4K video uploaded from a phone needs to be available as 1080p, 720p, 480p, and a thumbnail, each encoded for efficient streaming.
Upload event triggers the pipeline:

S3 Event: "video_123.mp4 uploaded"
    │
    ▼
Message Queue (SQS / Kafka)
    │
    ▼
Transcoding Workers (auto-scaling)
    ├→ Worker 1: video_123 → 1080p.mp4 (H.264, 5 Mbps)
    ├→ Worker 2: video_123 → 720p.mp4 (H.264, 2.5 Mbps)
    ├→ Worker 3: video_123 → 480p.mp4 (H.264, 1 Mbps)
    ├→ Worker 4: video_123 → thumbnail.jpg (frame at 2s)
    └→ Worker 5: video_123 → HLS manifest (adaptive streaming)
    │
    ▼
Upload variants to S3:
    s3://videos/123/1080p.mp4
    s3://videos/123/720p.mp4
    s3://videos/123/480p.mp4
    s3://videos/123/thumbnail.jpg
    s3://videos/123/manifest.m3u8
    │
    ▼
Update metadata DB: video_123: status = "ready", variants = [1080p, 720p, 480p]

Key principles: always async (never block the upload response); the queue decouples upload from processing; workers auto-scale based on queue depth; each variant is an independent job (parallelizable).
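A worker in this pipeline is conceptually a loop: receive a job, fetch the original, run FFmpeg per rendition, upload the outputs, delete the message. The sketch below assumes boto3, an SQS job queue, and FFmpeg installed on the worker image; the queue URL, message shape, and bitrates are illustrative.

```python
import json
import subprocess
import boto3

sqs = boto3.client("sqs")
s3 = boto3.client("s3")

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/transcode-jobs"  # placeholder
RENDITIONS = [
    ("1080p", "1920:1080", "5000k"),
    ("720p", "1280:720", "2500k"),
    ("480p", "854:480", "1000k"),
]

def handle_job(bucket: str, key: str, video_id: str) -> None:
    """Download the original, produce each rendition with FFmpeg, upload the variants."""
    src = "/tmp/original.mp4"
    s3.download_file(bucket, key, src)
    for name, scale, bitrate in RENDITIONS:
        out = f"/tmp/{name}.mp4"
        subprocess.run(
            ["ffmpeg", "-y", "-i", src, "-vf", f"scale={scale}",
             "-c:v", "libx264", "-b:v", bitrate, "-c:a", "aac", out],
            check=True,
        )
        s3.upload_file(out, bucket, f"videos/{video_id}/{name}.mp4")
    # ...mark video_id as "ready" in the metadata DB (omitted)

# Long-poll the queue; an idle worker costs one cheap request every 20 seconds.
while True:
    resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=1, WaitTimeSeconds=20)
    for msg in resp.get("Messages", []):
        job = json.loads(msg["Body"])  # e.g. {"bucket": ..., "key": ..., "video_id": ...}
        handle_job(job["bucket"], job["key"], job["video_id"])
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```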
Adaptive Streaming
Instead of downloading a single file, adaptive streaming (HLS / DASH) splits the video into small segments (2-10 seconds each) at multiple quality levels. The player dynamically switches quality based on the user's bandwidth — seamless quality adjustment without buffering.
Queue (Decoupling)
SQS, Kafka, or RabbitMQ sits between the upload event and workers. If workers are busy, jobs queue up instead of being dropped. Workers process at their own pace.
Workers (Processing)
Stateless containers (ECS, Kubernetes) running FFmpeg or similar. Auto-scale based on queue depth: 100 pending jobs → spin up 10 workers. 0 jobs → scale to 0.
Storage (Output)
Each variant is stored as a separate object in S3. The manifest file (HLS .m3u8) lists all available qualities. CDN caches each segment independently.
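The "auto-scale based on queue depth" heuristic mentioned under Workers can be computed from the queue itself. A sketch of that calculation, assuming boto3 and SQS; the jobs-per-worker ratio and the cap are illustrative tuning knobs:

```python
import boto3

sqs = boto3.client("sqs")

def desired_worker_count(queue_url: str, jobs_per_worker: int = 10, max_workers: int = 50) -> int:
    """Map queue backlog to a worker count: 0 jobs -> 0 workers, 100 jobs -> 10 workers."""
    attrs = sqs.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=["ApproximateNumberOfMessages"],
    )
    backlog = int(attrs["Attributes"]["ApproximateNumberOfMessages"])
    # Ceiling division, capped so an upload spike can't exhaust the cluster.
    return min(max_workers, -(-backlog // jobs_per_worker))
```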
Benefits
- ✅Device compatibility (4K TV, phone, slow connection)
- ✅Bandwidth optimization (serve 480p on 3G, 1080p on WiFi)
- ✅Async processing — upload response is instant
- ✅Parallelizable — each variant is an independent job
- ✅Cost-efficient — process once, serve millions of times
Trade-offs
- ❌Processing cost (transcoding is CPU-intensive)
- ❌Latency — video isn't available until transcoding completes
- ❌Storage multiplication (3-5 variants per video)
- ❌Complexity — pipeline monitoring, failure handling, retries
- ❌Not real-time — minutes to hours for long videos
🎯 Interview Insight
Transcoding is always async. Say: "After upload, an event triggers a message queue. Workers pick up jobs and transcode into multiple resolutions. The user sees a 'processing' state until all variants are ready. Workers auto-scale based on queue depth. This decouples upload latency from processing time."
Deduplication with Hashing
Deduplication ensures that identical files are stored only once, regardless of how many users upload them. The key insight: if two files have the same content, they produce the same hash. Store the file once, reference it by hash.
User A uploads: cat_meme.jpg (2.3 MB)
1. Compute hash: SHA-256(file_bytes) = "a1b2c3d4e5f6..."
2. Check storage: does object "a1b2c3d4e5f6..." exist?
   → NO: upload the file to S3 as "a1b2c3d4e5f6..."
   → Store metadata: { user: A, filename: "cat_meme.jpg", hash: "a1b2c3..." }

User B uploads: funny_cat.jpg (same image, different filename)
1. Compute hash: SHA-256(file_bytes) = "a1b2c3d4e5f6..." (same!)
2. Check storage: does object "a1b2c3d4e5f6..." exist?
   → YES: skip the upload (file already stored)
   → Store metadata: { user: B, filename: "funny_cat.jpg", hash: "a1b2c3..." }

Result:
Storage: 1 copy of the file (2.3 MB, not 4.6 MB)
Metadata: 2 entries pointing to the same hash
Savings: 50% storage reduction for this file

At scale (Dropbox, Google Drive): millions of users upload the same popular files. Deduplication saves 30-60% of total storage. That's petabytes of savings = millions of dollars.
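A minimal sketch of the server side of this flow, with in-memory dicts standing in for S3 and the metadata DB (a real deployment would use those instead):

```python
import hashlib

object_store: dict[str, bytes] = {}  # hash -> file bytes (stand-in for S3)
references: list[dict] = []          # per-user metadata rows

def store_file(user: str, filename: str, data: bytes) -> str:
    """Store the bytes once per unique hash; every upload still gets a metadata row."""
    digest = hashlib.sha256(data).hexdigest()
    if digest not in object_store:   # first time we've seen this content
        object_store[digest] = data
    references.append({"user": user, "filename": filename, "hash": digest})
    return digest

store_file("A", "cat_meme.jpg", b"...image bytes...")   # stored
store_file("B", "funny_cat.jpg", b"...image bytes...")  # dedup hit: no new bytes stored
```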
Implementation Approaches
| Approach | How It Works | Dedup Level | Best For |
|---|---|---|---|
| Whole-file hashing | Hash entire file, compare hash | File-level | Simple, effective for exact duplicates |
| Chunk-level hashing | Split file into chunks, hash each chunk | Chunk-level | Partial duplicates (edited files share most chunks) |
| Client-side hashing | Client computes hash before upload, server checks | File-level | Saves bandwidth (don't upload if already exists) |
Without client-side dedup: the client uploads a 500MB video → the server hashes it → a duplicate is found → 500MB of wasted bandwidth, and the upload took 2 minutes for nothing.

With client-side dedup:
1. Client computes: SHA-256(file) = "a1b2c3..."
2. Client asks the server: POST /api/uploads/check { hash: "a1b2c3..." }
3. Server checks: hash exists? → YES
4. Server responds: { "exists": true, "file_id": "file_789" }
5. Client skips the upload entirely and just links to the existing file → 0 bytes uploaded, instant "upload complete"

Dropbox uses this: "instant upload" for files that already exist in any user's storage. The file never leaves the client's machine.
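A sketch of the client's side of this flow, assuming the /api/uploads/check and /api/uploads endpoints described above (their URLs and response fields are assumptions made for the example, not a standard):

```python
import hashlib
import requests

API_BASE = "https://api.example.com"  # placeholder

def upload_with_dedup(path: str) -> str:
    """Hash locally, ask the server first, and only move bytes on a dedup miss."""
    sha = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # hash in 1 MB chunks
            sha.update(chunk)
    digest = sha.hexdigest()

    check = requests.post(f"{API_BASE}/api/uploads/check", json={"hash": digest}).json()
    if check.get("exists"):
        return check["file_id"]       # instant "upload": 0 bytes sent

    init = requests.post(f"{API_BASE}/api/uploads", json={"hash": digest}).json()
    with open(path, "rb") as f:
        requests.put(init["upload_url"], data=f)  # straight to storage via presigned URL
    return init["file_id"]
```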
Benefits
- ✅30-60% storage savings at scale
- ✅Reduced bandwidth (client-side dedup skips upload)
- ✅Content-addressable: hash IS the address (immutable, cacheable)
- ✅Natural integrity verification (hash = checksum)
- ✅Simplifies CDN caching (same content = same URL forever)
Trade-offs
- ❌Hash computation cost (SHA-256 on large files takes time)
- ❌Hash collision risk (astronomically rare with SHA-256, but non-zero)
- ❌Deletion complexity (can't delete a file if other users reference it)
- ❌Reference counting needed (track how many users point to each hash)
- ❌Privacy concern: knowing a hash exists reveals the file exists somewhere
🎯 Interview Insight
Deduplication is the answer to "how do you optimize storage for a file sharing system?" Say: "I'd use content-addressable storage — files are stored by their SHA-256 hash. Before uploading, the client sends the hash to check if the file already exists. If yes, we skip the upload entirely and just create a metadata reference. This saves 30-60% of storage at scale."
End-to-End Scenario
Let's design the file delivery system for a video sharing platform using all three patterns.
UPLOAD:
1. Client computes the SHA-256 of the video file
2. Client: POST /api/uploads { hash: "a1b2c3...", size: 524MB, type: "video/mp4" }
3. Server checks dedup: does the hash exist?
   → YES: skip the upload, return the existing file_id (instant!)
   → NO: generate a presigned S3 upload URL
4. Client uploads directly to S3 via the presigned URL (524MB goes to S3, not through the backend)
5. S3 triggers an event → a message is sent to the transcoding queue

PROCESSING (async):
6. A transcoding worker picks up the job
7. Downloads the original from S3
8. Transcodes: 1080p, 720p, 480p, thumbnail, HLS manifest
9. Uploads all variants to S3:
   s3://videos/{hash}/1080p.mp4
   s3://videos/{hash}/720p.mp4
   s3://videos/{hash}/480p.mp4
   s3://videos/{hash}/thumb.jpg
   s3://videos/{hash}/manifest.m3u8
10. Updates the DB: video status = "ready"

DELIVERY:
11. Client: GET /api/videos/456
12. Backend: auth check → generate a presigned CDN URL for the 720p variant
13. Client fetches from the CDN:
    → CDN edge in the user's region has it cached? → 10ms ✅
    → Cache miss? → the CDN fetches from S3, caches, serves → 200ms
14. The video player uses the HLS manifest for adaptive streaming, switching between 480p/720p/1080p based on bandwidth

DEDUPLICATION IN ACTION:
User B uploads the same video (different title):
→ SHA-256 matches → skip the upload entirely
→ Create a new metadata entry pointing to the same hash
→ Storage: 1 copy serves both users
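Steps 2-4 (dedup check plus presigned upload URL) collapse into one small backend handler. A sketch under the same assumptions as the earlier snippets: boto3, a placeholder bucket, and a dict standing in for the metadata DB.

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "videos-bucket"            # placeholder
files_by_hash: dict[str, str] = {}  # sha256 -> file_id (stand-in for the metadata DB)

def initiate_upload(file_hash: str, content_type: str) -> dict:
    """Dedup hit: return the existing file. Miss: hand back a presigned upload URL."""
    existing = files_by_hash.get(file_hash)
    if existing:
        return {"exists": True, "file_id": existing}
    upload_url = s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": BUCKET, "Key": f"originals/{file_hash}", "ContentType": content_type},
        ExpiresIn=900,
    )
    return {"exists": False, "file_id": file_hash, "upload_url": upload_url}
```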
💡 This Is How YouTube / Instagram Works
Upload directly to object storage (presigned URL), async transcoding pipeline (queue + workers), content-addressable deduplication (hash-based), CDN delivery (presigned URLs to edge). The backend never touches file bytes — it's purely a metadata and orchestration layer.
Trade-offs & Decision Making
| Decision | Option A | Option B | Choose A When | Choose B When |
|---|---|---|---|---|
| Delivery method | CDN + presigned URLs | Serve through backend | Always (at any meaningful scale) | Never (backend becomes bottleneck) |
| Processing timing | Pre-transcode all variants | On-demand transcoding | Popular content, predictable formats | Long-tail content, many format combinations |
| Deduplication | Content-hash dedup | Store every upload separately | Many users upload similar content (social, file sharing) | All content is unique (user-generated documents) |
| Hash computation | Client-side hashing | Server-side hashing | Save bandwidth (skip duplicate uploads) | Don't trust client (security-sensitive) |
💰 Cost Considerations
- CDN bandwidth: $0.02-0.08/GB (cheaper than origin)
- S3 storage: $0.023/GB/month
- Transcoding: $0.015-0.030 per minute of video
- Dedup savings: 30-60% of storage costs
- Pre-transcoding: higher upfront cost, lower serving cost
⚡ Performance Considerations
- CDN cache hit: ~10ms (edge), miss: ~200ms (origin)
- Presigned URL generation: ~5ms (backend)
- Transcoding: minutes to hours (async, not user-facing)
- Client-side hash: seconds (runs in browser/app)
- Dedup check: ~1ms (hash lookup in DB/Redis)
Interview Questions
Q: Why use presigned URLs instead of serving files through the backend?
A: If the backend serves file bytes, every download consumes backend CPU, memory, and bandwidth. A 100MB video × 10K concurrent downloads = 1 TB of bandwidth through your backend servers. With presigned URLs, the backend generates a signed URL (~5ms, ~1KB response) and the client downloads directly from S3/CDN. The backend handles 0 bytes of file transfer. This is the difference between needing 2 backend servers and needing 200. Every file-heavy system (YouTube, Dropbox, Instagram) uses this pattern.
Q: How does a CDN improve file delivery performance?
A: A CDN caches files at edge servers worldwide. A user in Tokyo gets the file from a Tokyo edge (~10ms) instead of a Virginia origin (~150ms). Benefits: (1) Lower latency — files served from the nearest edge. (2) Reduced origin load — CDN absorbs 99%+ of download traffic. (3) Higher throughput — CDN has massive bandwidth capacity. (4) Reliability — if one edge is down, traffic routes to the next nearest. For large files, the CDN also handles range requests (resume interrupted downloads) and adaptive bitrate streaming.
Q: How do you design a video processing pipeline?
A: Always async. (1) Upload triggers an event (S3 notification). (2) Event goes to a message queue (SQS/Kafka). (3) Transcoding workers pick up jobs and produce multiple variants (1080p, 720p, 480p, thumbnail, HLS manifest). (4) Variants are stored in S3. (5) Metadata DB is updated with status='ready'. Workers auto-scale based on queue depth. Each variant is an independent job — parallelizable. The user sees 'processing' until all variants are ready. Never transcode synchronously — a 1-hour video takes minutes to transcode.
You're designing a file sharing system like Dropbox for 100M users
How do you handle storage efficiency?
Answer: Content-addressable deduplication. (1) Client computes SHA-256 of the file before upload. (2) Client sends hash to server: 'Does this file exist?' (3) If yes → skip upload, create metadata reference to existing file. Instant 'upload'. (4) If no → client uploads via presigned URL, file stored by hash. At 100M users, many files are duplicated (same documents, memes, videos). Dedup saves 30-60% of storage — at petabyte scale, that's millions of dollars. Chunk-level dedup (splitting files into 4MB chunks and deduplicating chunks) catches partial duplicates too — an edited document shares 95% of chunks with the original.
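The chunk-level variant mentioned in the answer looks roughly like this sketch: fixed 4MB chunks, each stored by its own hash, with the file's metadata reduced to an ordered list of chunk hashes (the chunk size and the in-memory store are illustrative).

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024
chunk_store: dict[str, bytes] = {}  # chunk hash -> chunk bytes (stand-in for S3)

def store_chunked(data: bytes) -> list[str]:
    """Only unseen chunks cost storage; an edited file re-stores only the changed chunks."""
    manifest = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in chunk_store:
            chunk_store[digest] = chunk
        manifest.append(digest)
    return manifest                 # the file's metadata is just this ordered list
```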
Common Pitfalls
Serving files through the backend
The backend reads the file from storage and streams it to the client. Every download consumes backend CPU, memory, network bandwidth, and a thread/connection. At 1,000 concurrent downloads of 100MB files, the backend needs 100GB of bandwidth and hundreds of threads. It becomes the bottleneck and crashes.
✅Never serve file bytes through the backend. Generate presigned URLs and let the client download directly from S3 or CDN. The backend handles only auth and URL generation (~5ms, ~1KB). This is non-negotiable for any file-heavy system.
Not using a CDN
All downloads come from the origin server in one region. Users in Asia get 150ms+ latency for every request. The origin server's bandwidth is saturated. Adding more origin servers doesn't help — the latency is physical (speed of light).
✅Put a CDN in front of your object storage. CDN edges cache popular files worldwide. A user in Tokyo gets the file from a Tokyo edge in 10ms instead of 150ms from Virginia. CDN bandwidth is also cheaper than origin bandwidth. For video, CDN is not optional — it's required.
Synchronous processing of large files
The upload API endpoint transcodes the video before returning a response. A 1-hour video takes 10 minutes to transcode. The HTTP request times out at 30 seconds. The user sees an error. Even if it didn't timeout, the user waits 10 minutes staring at a spinner.
✅Always process large files asynchronously. Upload → return 202 Accepted immediately → trigger processing via queue → workers transcode in the background → update status when done. The client polls for status or receives a webhook/push notification when processing completes.
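On the API side, "return 202 Accepted immediately" amounts to enqueueing a job and responding before any processing starts. A sketch assuming boto3 and an SQS queue; the queue URL and message shape are placeholders:

```python
import json
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/transcode-jobs"  # placeholder

def on_upload_complete(video_id: str, bucket: str, key: str) -> tuple[int, dict]:
    """Acknowledge instantly; transcoding happens in background workers."""
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps({"video_id": video_id, "bucket": bucket, "key": key}),
    )
    # 202 Accepted: the client polls GET /videos/{id} or receives a webhook when ready.
    return 202, {"status": "processing", "video_id": video_id}
```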
Ignoring cache invalidation
A user updates their profile picture. The old image is cached on 200 CDN edges worldwide. The new image is in S3 but CDN keeps serving the old one for hours (until TTL expires). The user sees their old photo and thinks the upload failed.
✅Use content-hash URLs: instead of /images/user_42/avatar.jpg, use /images/user_42/avatar_a1b2c3.jpg where a1b2c3 is the content hash. When the image changes, the URL changes — CDN fetches the new file automatically. No cache invalidation needed. This is why every modern build tool puts content hashes in filenames.