Write Offloading
Handle high write throughput with WAL, event sourcing, async writes via message queues, and buffered batch writes. Decouple write ingestion from persistence.
The Big Picture: Why Writes Bottleneck
A database write is expensive. It must validate constraints, update indexes, write to the WAL, flush to disk, and replicate to followers, all before acknowledging the client. A single PostgreSQL instance tops out at ~10,000-20,000 writes per second. When your system needs 500,000 writes per second, direct synchronous writes to the database simply don't work.
The Restaurant Order Queue
A busy restaurant doesn't have the chef cook each dish the instant it's ordered. That would overwhelm the kitchen during rush hour. Instead: the waiter writes the order on a ticket (WAL), puts it on the order rail (queue), and tells the customer 'your order is confirmed.' The kitchen processes tickets at its own pace, batching similar orders together (batch writes). If the kitchen falls behind, orders queue up but nobody is turned away. The customer doesn't wait for the food to be cooked before getting confirmation; they wait only for the ticket to be written. That's write offloading.
Key Insight
Write offloading decouples write ingestion (accepting the data) from write persistence (storing it durably). The client gets a fast acknowledgment. The actual database write happens asynchronously, in batches, or through a pipeline. This turns a synchronous bottleneck into an asynchronous, scalable flow.
Write Offloading Overview
```
SYNCHRONOUS (direct DB write):
  Client → App Server → Database → Acknowledge
  Latency: 20-100ms (wait for the DB)
  Throughput: limited by DB write speed
  During a spike: DB overwhelmed → timeouts → failures

OFFLOADED (buffered, async):
  Client → App Server → Buffer/Queue → Acknowledge (fast!)
                            ↓
                         Worker → Database (at its own pace)
  Latency: 1-5ms (write to the queue only)
  Throughput: limited by queue capacity (much higher)
  During a spike: queue absorbs the burst → DB processes steadily
```
Smooth Traffic Spikes
A flash sale generates 100K writes/sec for 5 minutes. The DB handles 10K/sec. Without a queue: 90K writes/sec fail. With a queue: every write is accepted, and the backlog is processed over ~50 minutes.
Improve Throughput
Batching 1,000 individual INSERTs into one bulk INSERT is 10-50x faster. The queue collects writes; the worker batches them. Same data, fraction of the DB load.
Write-Ahead Log (WAL)
The Write-Ahead Log is the foundation of database durability. Before any change is applied to the actual data files, it's first written to a sequential log on disk. If the system crashes mid-operation, the WAL can be replayed to recover the exact state.
```
Write Request: UPDATE users SET name = 'Alicia' WHERE id = 42

Step 1: Append to WAL (sequential disk write)
  WAL entry: {txn: 1001, table: users, id: 42, name: "Alicia"}
  Sequential write to disk: ~0.1ms (fast!)
  Write is now DURABLE (survives a crash)

Step 2: Apply to data files (random disk write)
  Update the actual row in the users table on disk
  Update indexes
  Random I/O: ~5-20ms (slow, but can be deferred)

Step 3: Acknowledge to client
  "Write confirmed" (after the WAL write, not after the data file update)

Crash Recovery:
  System crashes after Step 1 but before Step 2
  On restart: replay WAL entries → apply missing changes
  No data loss, database is consistent
```
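To make the mechanism concrete, here is a minimal application-level sketch of the WAL idea, assuming a JSON-lines log file and an in-memory stand-in for the data files. It is not how PostgreSQL or any real engine implements its WAL; the `SimpleWAL` class and its file format are illustrative.

```python
import json
import os

class SimpleWAL:
    """Minimal write-ahead log sketch: append + fsync before applying a change."""

    def __init__(self, path: str):
        self.path = path
        self.state: dict[int, dict] = {}   # stand-in for the "data files"
        self._recover()                     # replay the log on startup

    def write(self, row_id: int, change: dict) -> None:
        # Step 1: append the change to the log and force it to disk (durable).
        with open(self.path, "a") as log:
            log.write(json.dumps({"id": row_id, "change": change}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        # Step 2: apply to the "data files"; safe to defer or batch because
        # the WAL entry already survives a crash.
        self._apply(row_id, change)

    def _apply(self, row_id: int, change: dict) -> None:
        self.state.setdefault(row_id, {}).update(change)

    def _recover(self) -> None:
        # Crash recovery: replay every logged entry to rebuild consistent state.
        if not os.path.exists(self.path):
            return
        with open(self.path) as log:
            for line in log:
                entry = json.loads(line)
                self._apply(entry["id"], entry["change"])

wal = SimpleWAL("users.wal")
wal.write(42, {"name": "Alicia"})   # acknowledged once the log append is durable
```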
Why WAL is fast
- ✓ Sequential writes (append-only): 10-100x faster than random I/O
- ✓ Client acknowledged after the WAL write, not after full persistence
- ✓ Actual data file updates can be batched and deferred
- ✓ Used by every major database: PostgreSQL, MySQL, SQLite, MongoDB
Trade-offs
- ✗ Extra disk write (WAL + data files = write amplification)
- ✗ WAL files accumulate and must be managed (rotation, archival)
- ✗ Recovery time depends on WAL size (more entries = longer replay)
- ✗ WAL is the bottleneck for write-heavy workloads on a single disk
Interview Insight
WAL is not a scaling pattern you implement; it's built into every database. But understanding it explains why databases can acknowledge writes quickly (sequential WAL write) while actual persistence is slower (random I/O). It also explains replication: replicas receive the WAL stream and replay it to stay in sync.
Event Sourcing
Traditional databases store current state: "Alice's balance is $500." Event sourcing stores every event that led to that state: "Alice deposited $1000, withdrew $300, deposited $100, withdrew $300." The current state is derived by replaying all events.
```
TRADITIONAL (store current state):
  users table:
    { id: 42, name: "Alicia", balance: 500 }
  ✓ You know the current balance
  ✗ You DON'T know how it got there

EVENT SOURCING (store events):
  events table:
    { id: 1, user: 42, type: "AccountCreated", data: {name: "Alice"} }
    { id: 2, user: 42, type: "Deposited",      data: {amount: 1000} }
    { id: 3, user: 42, type: "Withdrawn",      data: {amount: 300} }
    { id: 4, user: 42, type: "NameChanged",    data: {name: "Alicia"} }
    { id: 5, user: 42, type: "Deposited",      data: {amount: 100} }
    { id: 6, user: 42, type: "Withdrawn",      data: {amount: 300} }

  Current state = replay all events:
    Created → balance: 0
    +1000   → balance: 1000
    -300    → balance: 700
    +100    → balance: 800
    -300    → balance: 500
    Name: "Alicia"
  ✓ You know the current balance AND the full history
```
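A minimal sketch of deriving current state by replaying the event stream above; the event shapes follow the example, and `apply_event` is an illustrative name, not a standard API.

```python
# Rebuild current state by folding over the event stream (event sourcing).
events = [
    {"type": "AccountCreated", "data": {"name": "Alice"}},
    {"type": "Deposited",      "data": {"amount": 1000}},
    {"type": "Withdrawn",      "data": {"amount": 300}},
    {"type": "NameChanged",    "data": {"name": "Alicia"}},
    {"type": "Deposited",      "data": {"amount": 100}},
    {"type": "Withdrawn",      "data": {"amount": 300}},
]

def apply_event(state: dict, event: dict) -> dict:
    kind, data = event["type"], event["data"]
    if kind == "AccountCreated":
        return {"name": data["name"], "balance": 0}
    if kind == "Deposited":
        return {**state, "balance": state["balance"] + data["amount"]}
    if kind == "Withdrawn":
        return {**state, "balance": state["balance"] - data["amount"]}
    if kind == "NameChanged":
        return {**state, "name": data["name"]}
    return state  # unknown event types are ignored

state = {}
for event in events:
    state = apply_event(state, event)

print(state)  # {'name': 'Alicia', 'balance': 500}
```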
Strengths
- ✓ Complete audit trail: every change is recorded
- ✓ Time travel: reconstruct state at any point in time
- ✓ Debugging: replay events to reproduce bugs
- ✓ Event-driven architecture: events feed other systems
- ✓ Append-only writes: fast (no updates, no deletes)
Costs
- ✗ Rebuilding state is expensive (replay all events)
- ✗ Event schema evolution is complex (old events, new format)
- ✗ Storage grows forever (events are never deleted)
- ✗ Querying current state requires snapshots or projections
- ✗ Significant complexity vs. traditional CRUD
```
Problem: a user has 10,000 events. Rebuilding state = replay 10,000 events.

Solution: periodic snapshots
  Every 1,000 events, save a snapshot of current state:
    Snapshot at event 9000: { balance: 4200, name: "Alicia" }

  To get current state:
    Load snapshot (event 9000) → replay events 9001-10000 only
    → 1,000 replays instead of 10,000
```
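A sketch of the snapshot optimization under the same assumptions, reusing `apply_event` from the previous sketch; the list-based stores, snapshot layout, and interval are illustrative.

```python
SNAPSHOT_EVERY = 1000

def rebuild_state(events: list[dict], snapshots: list[dict], apply_event) -> dict:
    """Rebuild state from the latest snapshot plus only the events after it."""
    if snapshots:
        latest = snapshots[-1]                      # e.g. snapshot taken at event 9,000
        state = dict(latest["state"])
        start = latest["last_event_index"]          # replay only events 9,001..10,000
    else:
        state, start = {}, 0
    for event in events[start:]:
        state = apply_event(state, event)
    return state

def maybe_snapshot(snapshots: list[dict], state: dict, event_index: int) -> None:
    # Persist a snapshot every N events so future rebuilds stay cheap.
    if event_index % SNAPSHOT_EVERY == 0:
        snapshots.append({"last_event_index": event_index, "state": dict(state)})
```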
Interview Insight
Event sourcing is powerful but complex. Use it when: (1) you need a complete audit trail (financial systems, compliance), (2) you need to reconstruct past states (debugging, analytics), (3) you're building an event-driven architecture. Don't use it for simple CRUD; the complexity isn't justified.
Async Writes via Message Queues
The most common write scaling pattern: instead of writing directly to the database, the application publishes a message to a queue. A worker consumes messages and writes to the database at a sustainable pace. The client gets an immediate acknowledgment without waiting for the DB.
```
SYNCHRONOUS (without queue):
  Client → App → INSERT INTO orders (...) → wait 50ms → "Order confirmed"
  Problem: DB handles 10K writes/sec. Flash sale: 100K writes/sec → DB dies.

ASYNC (with queue):
  Client → App → Publish to Kafka/SQS → "Order received" (2ms)
                        ↓
                     Worker → INSERT INTO orders (...) at the DB's pace

  Client latency: 2ms (queue write only)
  DB load: steady 10K/sec (worker processes at a sustainable rate)
  Flash sale: queue absorbs the 100K/sec burst, worker drains it over time

Queue depth during the spike:
  T=0:    0 messages
  T=1m:   (100K - 10K) × 60 = 5.4M messages queued
  T=5m:   spike ends with ~27M messages queued; the queue starts draining
  T≈50m:  queue fully drained, all orders processed
```
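A minimal in-process sketch of the pattern, using Python's standard-library `queue` and a worker thread as stand-ins for Kafka/SQS and a separate consumer service; the handler, order shape, and DB call are illustrative assumptions.

```python
import queue
import threading

write_queue: queue.Queue = queue.Queue()   # stands in for Kafka/SQS

def handle_order(order: dict) -> str:
    # Fast path: enqueue and acknowledge immediately, no waiting on the DB.
    write_queue.put(order)
    return "Order received"

def insert_order_into_db(order: dict) -> None:
    ...  # e.g. INSERT INTO orders (...) VALUES (...)

def worker() -> None:
    # Slow path: drain the queue at the database's sustainable pace.
    while True:
        order = write_queue.get()
        try:
            insert_order_into_db(order)
        except Exception:
            write_queue.put(order)   # naive retry; real systems add backoff and a DLQ
        finally:
            write_queue.task_done()

threading.Thread(target=worker, daemon=True).start()
handle_order({"user_id": 42, "item": "widget", "qty": 3})
```

In production the retry path would use backoff and a dead letter queue rather than re-queuing forever, as discussed in the pitfalls below.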
Order Processing
User places order → message to queue → worker validates, charges payment, creates the order in the DB. User sees 'Order received' instantly. Confirmation email sent after processing.
Notifications
User action triggers notification → message to queue → worker sends push notification, email, SMS. The action completes instantly; notifications are delivered async.
Analytics Events
Page view, click, purchase → message to Kafka → consumer writes to the analytics DB in batches. The user experience is never slowed by analytics tracking.
Strengths
- ✓ Decouples producer from consumer (independent scaling)
- ✓ Absorbs traffic spikes (the queue buffers the burst)
- ✓ Retry built in (failed messages are re-queued automatically)
- ✓ Multiple consumers can process in parallel
- ✓ The most common write scaling pattern in production
Trade-offs
- ✗ Eventual consistency (data not in the DB immediately)
- ✗ Message ordering challenges (parallel consumers)
- ✗ Dead letter queue needed for poison messages
- ✗ Extra infrastructure (Kafka/SQS/RabbitMQ)
- ✗ Debugging is harder (async flow; distributed tracing needed)
Interview Insight
When asked "how do you handle write spikes?", message queues are the default answer. Say: "I'd put a Kafka topic between the API and the database. The API publishes messages at any rate. Workers consume at the DB's sustainable rate. During spikes, the queue absorbs the burst. We trade immediate consistency for throughput and resilience."
Buffered Batch Writes
Instead of writing each record individually, buffer multiple writes in memory and flush them as a single batch. One bulk INSERT of 1,000 rows is 10-50x faster than 1,000 individual INSERTs, because you pay the network round trip and transaction overhead only once.
```
INDIVIDUAL WRITES (1,000 separate INSERTs):
  INSERT INTO events (user_id, type, data) VALUES (1, 'click', '...');
  INSERT INTO events (user_id, type, data) VALUES (2, 'view', '...');
  ... (998 more)
  → 1,000 network round trips
  → 1,000 transaction commits
  → ~5 seconds total

BATCH WRITE (1 bulk INSERT):
  INSERT INTO events (user_id, type, data) VALUES
    (1, 'click', '...'),
    (2, 'view', '...'),
    ... (998 more);
  → 1 network round trip
  → 1 transaction commit
  → ~100ms total (50x faster)

Buffering flow:
  Incoming writes → in-memory buffer (array)
  When the buffer reaches 1,000 items OR 100ms has passed:
    → flush: bulk INSERT all buffered items
    → clear the buffer, start collecting again
```
Time-Based Flushing
- Flush every N milliseconds (e.g., every 100ms)
- Guarantees maximum latency before data is persisted
- Batch size varies (small during low traffic)
- Good for: systems with latency requirements
Size-Based Flushing
- Flush when buffer reaches N items (e.g., 1,000)
- Optimal batch size for DB performance
- Latency varies (slow during low traffic)
- Good for: high-throughput systems where latency is flexible
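A sketch combining both triggers: flush when the buffer reaches 1,000 rows or when 100ms has elapsed, whichever comes first. The sqlite3 table, thresholds, and `BufferedWriter` name are illustrative; the same shape works with any driver that supports bulk inserts.

```python
import sqlite3
import time

class BufferedWriter:
    """Buffer rows in memory and flush them as one bulk INSERT (size OR time trigger)."""

    def __init__(self, conn, max_rows: int = 1000, max_wait_s: float = 0.1):
        self.conn = conn
        self.max_rows = max_rows
        self.max_wait_s = max_wait_s
        self.buffer: list[tuple] = []
        self.last_flush = time.monotonic()

    def write(self, row: tuple) -> None:
        self.buffer.append(row)
        if (len(self.buffer) >= self.max_rows
                or time.monotonic() - self.last_flush >= self.max_wait_s):
            self.flush()

    def flush(self) -> None:
        if not self.buffer:
            return
        # One round trip, one transaction, many rows.
        self.conn.executemany(
            "INSERT INTO events (user_id, type, data) VALUES (?, ?, ?)", self.buffer)
        self.conn.commit()
        self.buffer.clear()
        self.last_flush = time.monotonic()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, type TEXT, data TEXT)")
writer = BufferedWriter(conn)
for i in range(2500):
    writer.write((i, "click", "{}"))
writer.flush()  # flush the remainder on shutdown so buffered rows aren't lost
```

Note that the time trigger here only fires when a new write arrives; a production version would also run a background timer and flush on shutdown so rows never sit in the buffer indefinitely.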
Strengths
- ✓ 10-50x throughput improvement over individual writes
- ✓ Reduces DB connection usage (fewer transactions)
- ✓ Simple to implement (buffer + flush timer)
- ✓ Works with any database (SQL, NoSQL, object storage)
Risks
- ✗ Data loss if the process crashes before the buffer is flushed
- ✗ Increased latency (data sits in the buffer before persisting)
- ✗ Buffer memory usage during high traffic
- ✗ Must handle flush failures (retry the entire batch)
Combining Queue + Batch
The most powerful pattern: Kafka queue + batch consumer. The queue absorbs spikes. The consumer reads messages in batches (e.g., 1,000 at a time) and does a single bulk INSERT. You get spike absorption AND batch efficiency. This is how most analytics and logging pipelines work.
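A sketch of the combined pattern, again using the standard-library `queue.Queue` as a stand-in for Kafka: the consumer pulls up to 1,000 messages per iteration and persists them with one bulk INSERT. Names and sizes are illustrative.

```python
import queue

def consume_batches(q: queue.Queue, conn, batch_size: int = 1000,
                    poll_timeout_s: float = 1.0) -> None:
    """Drain the queue in batches: spike absorption from the queue,
    throughput from the bulk INSERT."""
    while True:
        batch = []
        try:
            batch.append(q.get(timeout=poll_timeout_s))   # block for the first message
            while len(batch) < batch_size:
                batch.append(q.get_nowait())               # then take whatever is waiting
        except queue.Empty:
            pass
        if batch:
            # One bulk INSERT per batch; rows are (user_id, type, data) tuples.
            conn.executemany(
                "INSERT INTO events (user_id, type, data) VALUES (?, ?, ?)", batch)
            conn.commit()
```

With Kafka, the same loop would poll a batch of records and commit offsets only after the bulk INSERT succeeds, giving at-least-once delivery; pair it with idempotent writes (see the pitfalls below).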
End-to-End Scenario
Let's design the write pipeline for a logging platform ingesting 500,000 events per second from 10,000 servers.
Logging Platform: 500K Events/sec
Each event: ~500 bytes (timestamp, service, level, message).
Storage: ClickHouse (column-store, optimized for analytics).
ClickHouse bulk insert capacity: ~50K rows/sec per node.
Ingestion: Kafka (message queue)
10,000 servers publish log events to Kafka topics. Kafka handles 500K writes/sec easily (append-only, sequential I/O). Events are partitioned by service name for ordering. Client SDKs get acknowledgment in ~2ms. No event is lost: Kafka persists to disk.
Buffering: Consumer batches
10 consumer workers read from Kafka. Each consumer buffers events in memory. Flush trigger: 10,000 events OR 1 second (whichever comes first). Each flush = one bulk INSERT of 10,000 rows into ClickHouse.
Persistence: ClickHouse (batch writes)
10 consumers × 10,000 rows/flush × 1 flush/sec = 100K rows/sec into ClickHouse at the baseline settings. Since ingestion runs at 500K/sec, consumers flush larger batches (and more often) to keep pace, and Kafka buffers any excess during spikes. ClickHouse handles bulk inserts efficiently (column-store, compressed).
Event sourcing: Full history
Kafka retains all events for 7 days. If ClickHouse needs to be rebuilt (schema change, data corruption), replay events from Kafka. The event log IS the source of truth. ClickHouse is a derived view.
```
10,000 Servers
      │  500K events/sec
      ▼
┌──────────────┐
│    Kafka     │  absorbs the full ingestion rate
│  (3 brokers) │  partitioned by service name
│              │  7-day retention (event sourcing)
└──────┬───────┘
       │  Consumer group (10 workers)
       │  each buffers 10K events → bulk INSERT
       ▼
┌──────────────┐
│  ClickHouse  │  receives batched bulk inserts
│  (3 nodes)   │  column-store, compressed
│              │  handles 100K+ rows/sec per node
└──────────────┘

Spike handling:
  Normal (500K/sec): Kafka → consumers keep up
  Spike (2M/sec):    Kafka queues the 1.5M/sec excess
  After the spike:   consumers drain the backlog
  No data loss, no DB overload
```
Trade-offs & Decision Making
| Technique | Throughput Gain | Latency Impact | Durability | Complexity | Best For |
|---|---|---|---|---|---|
| WAL | Moderate (sequential I/O) | Low (fast append) | High (crash recovery) | Built-in | Every database (foundation) |
| Event Sourcing | High (append-only) | Low (fast append) | High (full history) | Very high | Audit trails, financial systems |
| Message Queue | Very high (decoupled) | Higher (async processing) | High (queue persists) | Medium | Spike absorption, decoupling |
| Batch Writes | 10-50x | Higher (buffer delay) | Risk (buffer loss) | Low | Logging, analytics, bulk ingestion |
Sync vs Async Writes
| Dimension | Synchronous | Asynchronous (Offloaded) |
|---|---|---|
| Client latency | 20-100ms (wait for DB) | 1-5ms (write to queue/buffer) |
| Consistency | Immediate (data in DB on response) | Eventual (data in DB later) |
| Throughput | Limited by DB write speed | Limited by queue capacity (much higher) |
| Spike handling | DB overwhelmed ā failures | Queue absorbs burst ā smooth processing |
| Complexity | Simple (direct write) | Higher (queue, workers, retry logic) |
| Use when | Strong consistency required | Throughput > consistency |
Event Sourcing vs Traditional DB
| Dimension | Traditional (Store State) | Event Sourcing (Store Events) |
|---|---|---|
| What's stored | Current state only | Every event that changed state |
| History | Lost (overwritten on update) | Complete (replay any point in time) |
| Write speed | Moderate (update in place) | Fast (append-only) |
| Read speed | Fast (read current state) | Slow without snapshots (replay events) |
| Storage | Compact (current state only) | Grows forever (all events) |
| Complexity | Low (standard CRUD) | High (event schema, snapshots, projections) |
Decision Framework
Need immediate consistency? → Synchronous writes. Need to handle spikes? → Message queue. Need an audit trail? → Event sourcing. Need throughput? → Batch writes. Most systems combine: a queue for spike absorption + a batch consumer for throughput + WAL in the DB for durability.
Interview Questions
Q: Why use a Write-Ahead Log?
A: WAL ensures durability without waiting for slow random I/O. The database appends changes to a sequential log file (fast) before updating the actual data files (slow). If the system crashes after the WAL write but before the data file update, the WAL is replayed on recovery, so no data is lost. WAL also enables replication: replicas receive the WAL stream and replay it to stay in sync. Every major database (PostgreSQL, MySQL, MongoDB) uses WAL internally.
Q: What is event sourcing?
A: Instead of storing current state (balance = $500), you store every event that led to that state (deposited $1000, withdrew $300, deposited $100, withdrew $300). Current state is derived by replaying events. Benefits: complete audit trail, time-travel debugging, natural fit for event-driven architectures. Costs: rebuilding state is expensive (use snapshots), event schema evolution is complex, storage grows forever. Use for: financial systems, compliance-heavy domains, event-driven architectures.
Q: Why use message queues for writes?
A: Message queues decouple write ingestion from persistence. The API publishes to the queue (2ms) and responds immediately. Workers consume from the queue and write to the DB at a sustainable pace. Benefits: (1) absorbs traffic spikes without DB overload, (2) automatic retry on failure, (3) independent scaling of producers and consumers, (4) multiple consumers can process in parallel. This is the most common write scaling pattern, used by virtually every large-scale system.
Q: When would you batch writes?
A: When you have high-volume, individually small writes: logging events, analytics clicks, IoT sensor data. One bulk INSERT of 1,000 rows is 10-50x faster than 1,000 individual INSERTs (one network round trip, one transaction commit). Combine with a message queue: Kafka absorbs the stream, a consumer reads batches of 1,000 messages and does one bulk INSERT. Trade-off: increased latency (data sits in the buffer) and a risk of data loss if the process crashes before the buffer is flushed.
Pitfalls
Ignoring failure handling in async writes
Publishing to a queue and assuming the write will succeed. The worker might crash, the DB might reject the data, the message might be malformed. Without retry logic and dead letter queues, writes silently disappear.
Implement: (1) automatic retries with exponential backoff, (2) a dead letter queue for messages that fail after N retries, (3) monitoring on queue depth and DLQ size, (4) idempotent consumers (processing the same message twice produces the same result).
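A sketch of points (1) and (2): retry with exponential backoff, then route the message to a dead letter queue. `write_to_database` and `send_to_dead_letter_queue` are hypothetical stand-ins for the real persistence and DLQ calls.

```python
import time

MAX_RETRIES = 5

def write_to_database(message: dict) -> None:
    ...  # e.g. INSERT into the target table

def send_to_dead_letter_queue(message: dict) -> None:
    ...  # e.g. publish to a DLQ topic for later inspection; alert on DLQ growth

def process_with_retries(message: dict) -> None:
    """Retry a failed write with exponential backoff, then dead-letter it."""
    for attempt in range(MAX_RETRIES):
        try:
            write_to_database(message)
            return
        except Exception:
            time.sleep(min(2 ** attempt, 30))  # 1s, 2s, 4s, ... capped at 30s
    send_to_dead_letter_queue(message)
```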
Not handling retries correctly
Retrying a failed write without idempotency. The first attempt succeeded but the acknowledgment was lost. The retry creates a duplicate order, charges the customer twice, or inserts duplicate data.
Make all write operations idempotent. Use idempotency keys: the client sends a unique request ID, and the server checks whether that ID was already processed before executing. If yes, return the previous result; if no, process the request and store the ID.
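A sketch of the idempotency-key check, using an in-memory dict where a real system would use a database table or Redis; the function names and result shape are illustrative.

```python
# idempotency_key -> previous result (use a DB table or Redis in production)
processed: dict[str, dict] = {}

def charge_and_insert_order(order: dict) -> dict:
    ...  # charge payment + INSERT order; committed together with the idempotency key
    return {"status": "created"}

def create_order(idempotency_key: str, order: dict) -> dict:
    # Duplicate request (e.g. a retry after a lost ack): return the original result.
    if idempotency_key in processed:
        return processed[idempotency_key]
    result = charge_and_insert_order(order)
    processed[idempotency_key] = result   # record the key alongside the write
    return result
```

In a real implementation the key lookup and the write should share one transaction (or the key column should carry a unique constraint), so two concurrent retries cannot both execute.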
Data loss in buffers
Buffering 10,000 events in application memory for batch writes. The process crashes and all 10,000 events are lost. For analytics, this might be acceptable. For orders or payments, it's catastrophic.
Use a durable buffer (Kafka, not in-memory). Kafka persists messages to disk before acknowledging. If the consumer crashes, messages are re-delivered from Kafka. The buffer is the queue, not application memory. For critical data, never buffer in memory without a durable backing store.
Overcomplicating with event sourcing
Using event sourcing for a simple CRUD application. The team now manages event schemas, snapshot logic, projection rebuilds, and eventual consistency, all for a system that handles 100 writes/sec and doesn't need an audit trail.
Event sourcing is for specific use cases: financial systems (audit trail required by law), event-driven architectures (events are the primary communication mechanism), or systems that need time-travel debugging. For standard web apps, a traditional database with soft deletes and an audit log table is simpler and sufficient.