Object Storage · S3 · Chunked Upload · Resumable Upload · Content-Addressable · Deduplication · Blob Storage

Storage Patterns

Master large file storage — object storage (S3-style), chunked & resumable uploads, and content-addressable storage. Store and retrieve blobs efficiently at scale.

01

The Big Picture — Why Blobs Are Different

A database row is a few hundred bytes. A photo is 2 MB. A video is 500 MB. Storing large binary files (blobs) in a database is like stuffing furniture into a filing cabinet — it technically fits, but it destroys performance, bloats backups, and wastes expensive storage. Large files need a different storage system designed for their access patterns.

🏭

The Warehouse vs Spreadsheet Analogy

A spreadsheet (database) is perfect for structured data: names, prices, dates — small, queryable, relational. A warehouse (object storage) is for large packages: boxes of inventory, pallets of goods. You wouldn't track inventory counts in a warehouse, and you wouldn't store pallets in a spreadsheet. Modern systems use both: the database stores metadata (file name, size, owner, URL) and the object store holds the actual file. The database is the catalog; the warehouse is the storage.

🔥 Key Insight

Never store large files in a relational database. Store the file in object storage (S3), store the metadata and URL in the database. This is the universal pattern — Instagram, YouTube, Dropbox, and every file-heavy system works this way.

02

Architecture Overview

File Storage Architecture

UPLOAD FLOW:
  Client → Upload Service → Object Storage (S3)
                          → Metadata DB (PostgreSQL)

  1. Client uploads file to Upload Service
  2. Upload Service stores file in S3 → gets back a key (file-id)
  3. Upload Service writes metadata to DB:
     { id: "file-123", key: "uploads/file-123.jpg",
       size: 2048576, type: "image/jpeg", owner: "user-42",
       url: "https://cdn.example.com/uploads/file-123.jpg" }

DOWNLOAD FLOW:
  Client → CDN → Object Storage (S3)

  1. Client requests file URL from API
  2. API returns CDN URL (or pre-signed S3 URL)
  3. Client downloads directly from CDN/S3
  → Your servers never touch the file bytes on download

KEY PRINCIPLE:
  Metadata (small, queryable) → Database
  File bytes (large, opaque)  → Object Storage
  Delivery (fast, global)     → CDN
📦

Object Storage

Stores the actual file bytes. Infinitely scalable, highly durable (99.999999999% — 11 nines). S3, GCS, Azure Blob.

🗄️

Metadata DB

Stores file info: name, size, type, owner, URL, timestamps. Small rows, queryable, relational. PostgreSQL, MySQL.

🌍

CDN

Caches and serves files from edge locations close to users. Reduces latency and offloads bandwidth from origin. CloudFront, Cloudflare.

03

Object Storage (S3-style)

Object storage treats each file as an immutable object with a unique key, binary data, and metadata. There are no folders, no file system hierarchy — just a flat key-value store optimized for large binary data.

Object Storage — Data Model
Each object:
  {
    key:      "uploads/user-42/photo-abc123.jpg"   (unique identifier)
    data:     <2.1 MB of binary image data>        (the actual file)
    metadata: {
      content-type: "image/jpeg",
      size: 2148576,
      uploaded-at: "2024-06-15T10:30:00Z",
      custom-tags: { "user-id": "42", "album": "vacation" }
    }
  }

Key features:
  • Flat namespace: "uploads/user-42/photo.jpg" is a KEY, not a path
    (the "/" is just a character in the key, not a directory separator)
  • Immutable: objects are written once, read many times
    (to "update" a file, you write a new object with the same key)
  • Versioning: optionally keep all versions of an object
  • Lifecycle policies: auto-delete after 90 days, move to cold storage
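
Here is what storing one object might look like in code — a minimal sketch using boto3 (the AWS SDK for Python). The bucket name and local file are placeholders for illustration:

import boto3

s3 = boto3.client("s3")

# "my-uploads-bucket" and the local file are placeholders for illustration
with open("photo-abc123.jpg", "rb") as f:
    s3.put_object(
        Bucket="my-uploads-bucket",
        Key="uploads/user-42/photo-abc123.jpg",   # flat key; "/" is just a character
        Body=f,
        ContentType="image/jpeg",                 # stored as object metadata
        Metadata={"user-id": "42", "album": "vacation"},  # custom tags
    )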

Pre-Signed URLs — Direct Client Upload

Pre-Signed URL Flow

Without pre-signed URLs (bad):
  Client → Upload to your server → Server uploads to S3
  Problem: your server handles all file bytes → bandwidth bottleneck

With pre-signed URLs (good):
  1. Client asks your API: "I want to upload a photo"
  2. API generates a pre-signed S3 URL (valid for 15 minutes):
     https://bucket.s3.amazonaws.com/uploads/photo-abc.jpg
       ?X-Amz-Signature=abc123...&X-Amz-Expires=900
  3. Client uploads DIRECTLY to S3 using the pre-signed URL
  4. Client notifies your API: "Upload complete, key = photo-abc.jpg"
  5. API writes metadata to DB

Benefits:
  • Your servers never touch the file bytes
  • S3 handles the upload bandwidth (infinite capacity)
  • Pre-signed URL expires → no permanent public access
  • Works for downloads too (private files with temporary access)
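
On the API side, generating a pre-signed upload URL is a one-liner with boto3. A sketch, assuming a hypothetical bucket name:

import boto3

s3 = boto3.client("s3")

def create_upload_url(key: str) -> str:
    # Signed URL that permits a single PUT; expires after 900 s (15 minutes)
    return s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": "my-uploads-bucket", "Key": key},  # placeholder bucket
        ExpiresIn=900,
    )

url = create_upload_url("uploads/photo-abc.jpg")
# The client then uploads directly: HTTP PUT <url> with the file bytes as body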

Strengths

  • Infinite scalability (S3 handles any volume)
  • 11 nines durability (99.999999999% — virtually no data loss)
  • Pay per GB stored + GB transferred (no provisioning)
  • Built-in versioning, lifecycle policies, encryption
  • Pre-signed URLs offload upload/download from your servers

Characteristics

  • Higher latency than local disk (~50-200ms first byte)
  • Not suitable for frequent updates (immutable objects)
  • No append operation (must rewrite the entire object)
  • Overwrite consistency varies by provider (S3 has been strongly consistent since 2020)
  • Egress costs can be significant at scale

🎯 Interview Insight

Whenever a system stores files — images, videos, documents, backups — say "object storage (S3)." Always mention pre-signed URLs for uploads: "The client uploads directly to S3 via a pre-signed URL. Our servers only handle the metadata. This eliminates the upload bandwidth bottleneck."

04

Chunked & Resumable Uploads

Uploading a 2 GB video as a single HTTP request is fragile. If the connection drops at 1.8 GB, you start over. Chunked uploads split the file into small pieces (5-100 MB each), upload them independently, and reassemble on the server. If a chunk fails, only that chunk is retried.

Chunked Upload — How It Works

File: vacation-video.mp4 (2 GB)
Chunk size: 100 MB → 20 chunks

Step 1. Initiate upload:
  POST /uploads/initiate
  → Server returns upload_id: "upload-xyz-789"

Step 2. Upload chunks (can be parallel):
  PUT /uploads/upload-xyz-789/chunk/1 (100 MB) → ✅
  PUT /uploads/upload-xyz-789/chunk/2 (100 MB) → ✅
  PUT /uploads/upload-xyz-789/chunk/3 (100 MB) → ❌ (network error)
  PUT /uploads/upload-xyz-789/chunk/3 (100 MB) → ✅ (retry only chunk 3)
  PUT /uploads/upload-xyz-789/chunk/4 (100 MB) → ✅
  ... (chunks 5-20 uploaded in parallel)

Step 3. Complete upload:
  POST /uploads/upload-xyz-789/complete
  → Server reassembles chunks into final file
  → Stores in S3 as one object
  → Returns file URL

Resume after app crash:
  GET /uploads/upload-xyz-789/status
  → "Chunks 1-12 received, 13-20 missing"
  → Client resumes from chunk 13 (not from scratch)
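
A client-side sketch of this protocol in Python, assuming the hypothetical /uploads endpoints from the diagram above; production code would add parallel uploads and resume-from-status:

import requests

API = "https://api.example.com"     # hypothetical upload service
CHUNK_SIZE = 100 * 1024 * 1024      # 100 MB

def upload_file(path: str) -> None:
    # Step 1: initiate, get an upload_id
    upload_id = requests.post(f"{API}/uploads/initiate").json()["upload_id"]

    # Step 2: upload chunks one by one, retrying only the chunk that failed
    with open(path, "rb") as f:
        number = 1
        while chunk := f.read(CHUNK_SIZE):
            for attempt in range(3):
                resp = requests.put(
                    f"{API}/uploads/{upload_id}/chunk/{number}", data=chunk)
                if resp.ok:
                    break
            else:
                raise RuntimeError(f"chunk {number} failed after 3 attempts")
            number += 1

    # Step 3: tell the server to reassemble
    requests.post(f"{API}/uploads/{upload_id}/complete")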

Benefits

  • Resumable: network failure → retry only the failed chunk
  • Parallel uploads: multiple chunks upload simultaneously
  • Progress tracking: show upload percentage accurately
  • Large file support: no single-request size limits
  • S3 Multipart Upload: native support for this pattern

Complexity

  • Server must track chunk state (which chunks received)
  • Reassembly logic (combine chunks in correct order)
  • Cleanup needed for abandoned uploads (TTL on incomplete uploads)
  • Client must implement chunking and retry logic
  • More API endpoints (initiate, upload chunk, complete, status)

💡 S3 Multipart Upload

S3 has native multipart upload support. You don't need to build chunk reassembly yourself. Initiate a multipart upload, upload parts directly to S3 (with pre-signed URLs per part), then complete the upload. S3 assembles the parts into one object. This is how YouTube, Google Drive, and Dropbox handle large uploads.
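
A server-side sketch of that flow with boto3 — create the multipart upload, hand out a pre-signed URL per part, and complete once the client reports each part's ETag. The bucket name is a placeholder and error handling is omitted:

import boto3

s3 = boto3.client("s3")
BUCKET = "my-uploads-bucket"   # placeholder

def initiate(key: str, part_count: int) -> tuple[str, list[str]]:
    upload_id = s3.create_multipart_upload(Bucket=BUCKET, Key=key)["UploadId"]
    urls = [
        s3.generate_presigned_url(
            "upload_part",
            Params={"Bucket": BUCKET, "Key": key,
                    "UploadId": upload_id, "PartNumber": n},
            ExpiresIn=3600,
        )
        for n in range(1, part_count + 1)
    ]
    return upload_id, urls   # client PUTs each part directly to its URL

def complete(key: str, upload_id: str, etags: list[str]) -> None:
    # The client reports the ETag S3 returned for every uploaded part;
    # S3 then assembles the parts into a single object
    s3.complete_multipart_upload(
        Bucket=BUCKET, Key=key, UploadId=upload_id,
        MultipartUpload={"Parts": [
            {"PartNumber": n, "ETag": etag}
            for n, etag in enumerate(etags, start=1)
        ]},
    )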

05

Content-Addressable Storage

In content-addressable storage (CAS), the file's key is derived from its content — typically a cryptographic hash (SHA-256). The same file always produces the same hash, so identical files are stored only once. This enables automatic deduplication and integrity verification.

Content-Addressable Storage — How It Works

Upload flow:
  1. Client computes hash: SHA-256(file) → "a1b2c3d4e5f6..."
  2. Client asks server: "Do you have hash a1b2c3d4e5f6?"
  3a. Server: "YES" → skip upload (file already exists!) → instant
  3b. Server: "NO" → upload the file, store with key = hash

Storage:
  Key: "a1b2c3d4e5f6..."
  Value: <file bytes>

Deduplication:
  User A uploads vacation.jpg → hash = "abc123" → stored
  User B uploads the same photo → hash = "abc123" → already exists!
  → No duplicate storage. Both users reference the same object.
  → 1 copy stored, 2 metadata entries pointing to it.

Integrity verification:
  Download file → compute SHA-256 → compare with stored hash
  Match → file is intact ✅
  Mismatch → file is corrupted ❌ (re-download or alert)
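
A sketch of the CAS upload path in Python — in-memory dicts stand in for the object store and metadata DB so the dedup check stays visible:

import hashlib

BLOBS: dict[str, bytes] = {}        # stand-in object store, keyed by content hash
REFS: list[tuple[str, str]] = []    # stand-in metadata table: (user_id, hash)

def sha256_of(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1024 * 1024), b""):   # stream, 1 MB at a time
            h.update(block)
    return h.hexdigest()

def upload(path: str, user_id: str) -> str:
    digest = sha256_of(path)
    if digest not in BLOBS:                   # dedup check: already stored?
        with open(path, "rb") as f:
            BLOBS[digest] = f.read()          # store once, key = content hash
    REFS.append((user_id, digest))            # every user gets a metadata reference
    return digest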
📂

Git

Every commit, tree, and blob is stored by its SHA-1 hash. Same content = same hash = stored once. This is why Git repos are space-efficient despite storing full history.

💾

Backup Systems

Deduplication across backups. If 90% of files haven't changed since the last backup, only 10% of data is actually stored. Massive storage savings.

🌐

CDN & Caching

Cache keys based on content hash. When content changes, the hash changes, the URL changes — automatic cache busting. No stale content.

Strengths

  • Automatic deduplication (same content = same key)
  • Integrity verification (hash mismatch = corruption)
  • Immutable by design (content defines the key)
  • Cache-friendly (content hash = perfect cache key)
  • Space-efficient for systems with duplicate content

Trade-offs

  • Hash computation cost (SHA-256 on large files takes time)
  • Not suitable for mutable data (changing content = new hash = new key)
  • Deletion is complex (must check no other references exist)
  • Hash collisions are theoretically possible (practically impossible with SHA-256)
  • Reference counting needed to know when to delete an object

🎯 Interview Insight

Mention content-addressable storage when the system has significant duplicate content — file sharing (many users upload the same meme), backup systems (incremental backups), or build artifacts (same dependency stored once). Say: "I'd hash the file content and use the hash as the storage key. Before uploading, check if the hash exists — if yes, skip the upload and just add a reference. This deduplicates storage automatically."

06

End-to-End Scenario

Let's design the file storage layer for a Google Drive-like system — combining all three patterns.

📁 Cloud File Storage — 100M Users, 10B Files

Average file: 5 MB. Total storage: ~50 PB. 1M uploads/day.

Requirements: resumable uploads, deduplication, fast downloads.

1

Client computes content hash (deduplication check)

Before uploading, the client computes SHA-256 of the file. Sends the hash to the API: 'Do you have this file?' If yes → instant 'upload complete' (just add a reference in metadata DB). If no → proceed to upload. This deduplicates ~30% of uploads (shared files, duplicate photos).

2

Chunked upload via pre-signed URLs

API initiates an S3 multipart upload. Returns pre-signed URLs for each chunk (100 MB each). Client uploads chunks directly to S3 in parallel. If a chunk fails, client retries only that chunk. On completion, API tells S3 to assemble the parts.

3

Metadata stored in PostgreSQL

After upload: INSERT INTO files (id, name, content_hash, s3_key, size, owner_id, created_at). The content_hash enables deduplication. The s3_key points to the object in S3. Queries like 'list my files' hit the DB, not S3.

4

Downloads via CDN

When a user requests a file, the API generates a pre-signed CDN URL (or S3 URL). The client downloads directly from CDN/S3. Your servers never touch the file bytes on download. Popular files are cached at CDN edges worldwide.
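
Generating that time-limited download URL is symmetrical to the upload case. A boto3 sketch for the S3 variant (CloudFront signed URLs follow the same idea but use a CDN signing key); the bucket name is a placeholder:

import boto3

s3 = boto3.client("s3")

def create_download_url(s3_key: str) -> str:
    # Time-limited GET; after 3600 s the link stops working
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": "my-uploads-bucket", "Key": s3_key},  # placeholder bucket
        ExpiresIn=3600,
    )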

Architecture — All Patterns Combined

UPLOAD:
  Client → SHA-256(file) → API: "hash exists?"
    YES → Add reference in DB → Done (instant, no upload)
    NO  → API: initiate multipart upload
        → S3 returns pre-signed URLs per chunk
        → Client uploads chunks directly to S3 (parallel)
        → Client: "upload complete"
        → API: complete multipart upload in S3
        → API: INSERT metadata in PostgreSQL

DOWNLOAD:
  Client → API: "give me file-123"
  API → Generate pre-signed CDN URL (expires in 1 hour)
  Client → CDN edge → S3 origin (on cache miss)
  → Your servers handle 0 bytes of file data

DEDUPLICATION:
  10B files, 50 PB raw → with dedup: ~35 PB actual storage
  → 30% storage savings = millions of dollars saved

STORAGE TIERS:
  Hot (accessed in last 30 days):  S3 Standard
  Warm (30-90 days):               S3 Infrequent Access
  Cold (>90 days):                 S3 Glacier
  → Lifecycle policies auto-transition files between tiers
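
Those tier transitions are a one-time bucket configuration, not per-file logic. A boto3 sketch with illustrative day thresholds and a placeholder bucket name:

import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-uploads-bucket",   # placeholder
    LifecycleConfiguration={"Rules": [{
        "ID": "tier-by-age",
        "Status": "Enabled",
        "Filter": {"Prefix": "uploads/"},
        "Transitions": [
            {"Days": 30, "StorageClass": "STANDARD_IA"},   # warm tier
            {"Days": 90, "StorageClass": "GLACIER"},       # cold tier
        ],
    }]},
)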
07

Trade-offs & Decision Making

Dimension       | Database (PostgreSQL)                     | Object Storage (S3)
----------------|-------------------------------------------|-------------------------------------------
Best for        | Structured data (rows, columns, queries)  | Large binary files (images, videos, docs)
Max object size | ~1 GB (practical limit)                   | 5 TB per object
Scalability     | Vertical (bigger machine) + sharding      | Infinite (managed service)
Cost per GB     | ~$0.10-0.50/GB/month (SSD)                | ~$0.023/GB/month (S3 Standard)
Query support   | Full SQL (JOINs, WHERE, GROUP BY)         | GET by key only (no queries)
Durability      | Depends on replication setup              | 99.999999999% (11 nines)
Backup impact   | Large blobs bloat backups                 | Independent backup/versioning

Upload Strategy Comparison

Strategy                       | Reliability            | Speed                  | Complexity | Best For
-------------------------------|------------------------|------------------------|------------|-------------------------------
Direct upload (single request) | Low (fails = restart)  | Fast for small files   | Very low   | Files < 10 MB
Chunked upload                 | High (retry per chunk) | Fast (parallel chunks) | Medium     | Files > 10 MB
Pre-signed URL + chunked       | High                   | Fastest (direct to S3) | Medium     | Any size (production standard)

Storage Key Strategy

Strategy                      | Deduplication | Mutability                         | Complexity | Best For
------------------------------|---------------|------------------------------------|------------|----------------------------------
Random ID (UUID)              | None          | Key is stable (content can change) | Low        | Mutable files, user uploads
Content hash (SHA-256)        | Automatic     | Immutable (new content = new key)  | Medium     | Dedup-heavy systems, backups, Git
Path-based (user/folder/file) | None          | Key is human-readable              | Low        | File systems, simple apps

🎯 Decision Framework

Files < 1 MB → store in DB if convenient (avatars, thumbnails). Files > 1 MB → always object storage. Uploads > 10 MB → chunked/multipart. Duplicate-heavy content → content-addressable. Global delivery → CDN in front of object storage. This covers 99% of file storage use cases.

08

Interview Questions

Q: Why not store files in a database?

A: Databases are optimized for small, structured, queryable data — not large binary blobs. Storing a 10 MB image in PostgreSQL: bloats the table (slower queries on all rows), bloats backups (backup a 500 GB DB vs a 50 GB DB + S3), wastes expensive storage ($0.30/GB SSD vs $0.023/GB S3), and every read transfers 10 MB through the DB connection pool. Object storage is 10x cheaper, infinitely scalable, and designed for this exact use case. Store metadata in the DB, files in S3.

Q: How do resumable uploads work?

A: The file is split into chunks (e.g., 100 MB each). Each chunk is uploaded independently with its chunk number. The server tracks which chunks have been received. If the upload is interrupted (network failure, app crash), the client asks the server 'which chunks do you have?' and resumes from the first missing chunk. S3 Multipart Upload provides this natively: initiate upload → upload parts (with pre-signed URLs) → complete upload. Only failed parts are retried, not the entire file.

Q: What is content-addressable storage?

A: A storage system where the key is derived from the content itself — typically SHA-256(file_bytes). Same content always produces the same key. Benefits: (1) Automatic deduplication — if the hash exists, the file is already stored. (2) Integrity verification — re-hash on download and compare. (3) Immutability — the key is tied to the content, so it can't be silently modified. Used by Git (every object is hash-addressed), backup systems (incremental dedup), and CDNs (content hash as cache key for automatic cache busting).

1

You're designing Instagram's photo upload system

How would you handle 50M photo uploads per day?

Answer: Pre-signed URLs for direct-to-S3 upload (your servers never touch photo bytes). Chunked upload for large photos/videos. On upload complete: generate thumbnails (Lambda/worker), store metadata in PostgreSQL (user_id, s3_key, dimensions, created_at), push to CDN for delivery. Content-hash for deduplication (same meme uploaded by 1000 users = stored once). Storage tiers: S3 Standard for recent photos, S3 IA for photos older than 90 days. CDN serves all photo requests — origin handles only cache misses.

2

Users complain that large file uploads fail frequently

How do you make uploads reliable?

Answer: Switch from single-request upload to chunked/resumable upload. Split files into 50-100 MB chunks. Upload each chunk independently (with retry on failure). Track upload progress server-side. If the client disconnects, they can resume from the last successful chunk. Use S3 Multipart Upload with pre-signed URLs per part — the client uploads directly to S3, your servers only handle the initiate/complete API calls. Add upload progress UI so users see percentage and can pause/resume.

09

Pitfalls

🗄️

Storing large blobs in the database

Inserting images, videos, or PDFs as BYTEA/BLOB columns in PostgreSQL. A table with 1M rows × 5 MB average = 5 TB database. Backups take hours. Queries on other columns are slow because the table is massive. Connection pool is saturated transferring file bytes.

Store files in S3 (or equivalent object storage). Store only the S3 key/URL in the database. The DB stays small and fast. S3 handles the storage, durability, and bandwidth. This is non-negotiable for any system with user-uploaded files.

📡

Not handling upload failures

Using a single HTTP POST for a 500 MB video upload. The connection drops at 400 MB — the user must restart from zero. On mobile networks, large uploads fail frequently. Users give up after 2-3 failed attempts.

Use chunked/resumable uploads for any file over 10 MB. Track chunk progress server-side. On failure, resume from the last successful chunk. Show upload progress to the user. Use S3 Multipart Upload for the backend implementation — it handles chunk storage and reassembly.

📦

Ignoring deduplication opportunities

A messaging app where 10,000 users share the same viral video. Each upload stores a separate copy: 10,000 × 50 MB = 500 GB for one video. With deduplication: 50 MB stored once, 10,000 metadata references.

Use content-addressable storage for systems with significant duplicate content. Hash the file before upload. If the hash exists, skip the upload and add a reference. For a messaging app or file-sharing platform, this can reduce storage by 20-40%.

📋

Poor metadata management

Storing files in S3 without tracking metadata in a database. 'Which files belong to user 42?' requires listing the entire S3 bucket and filtering by prefix — slow and expensive. 'How much storage is user 42 using?' requires summing all their objects — O(n) operation.

Always maintain a metadata table in your database: file_id, s3_key, owner_id, size, type, created_at. All queries about files (list, search, quota) hit the fast, indexed database. S3 is only accessed for actual file bytes (upload/download).