Locking Strategies
Master concurrency control — optimistic vs pessimistic locking, distributed locks (Redlock), and row-level vs table-level locks. Trade-offs between performance and correctness.
The Big Picture — Why Contention Happens
When two users try to book the last concert ticket at the same millisecond, or two servers try to process the same payment simultaneously, you have contention — multiple actors competing for the same resource. Without coordination, both succeed, and you've sold one ticket twice or charged a customer double.
The Shared Document Analogy
Two people editing the same Google Doc paragraph at the same time. Without coordination, Alice types 'Hello' and Bob types 'World' in the same spot — one overwrites the other. That's a lost update. Pessimistic locking: Alice locks the paragraph. Bob sees 'locked by Alice' and waits. No conflict, but Bob is blocked. Optimistic locking: both edit freely. When Bob tries to save, the system detects Alice already changed it — Bob must merge or retry. No blocking, but conflicts need handling.
🔥 Key Insight
Locking is about choosing between blocking (wait for access) and retrying (try and handle conflicts). Blocking is safe but slow. Retrying is fast but complex. The right choice depends on how often conflicts actually happen.
Locking Overview
Two users buy the last ticket simultaneously.

Without locking:

```
User A: SELECT stock FROM tickets WHERE id=1;     → stock = 1
User B: SELECT stock FROM tickets WHERE id=1;     → stock = 1
User A: UPDATE tickets SET stock = 0 WHERE id=1;  ✅
User B: UPDATE tickets SET stock = 0 WHERE id=1;  ✅ (but stock was already 0!)
→ Two tickets sold, only one existed. Lost update.
```

With locking:

```
User A: SELECT stock FROM tickets WHERE id=1 FOR UPDATE;  → stock = 1 (row locked)
User B: SELECT stock FROM tickets WHERE id=1 FOR UPDATE;  → BLOCKED (waiting for A)
User A: UPDATE tickets SET stock = 0; COMMIT;             → lock released
User B: SELECT stock → stock = 0 → "Sold out!" → no sale
→ Correct behavior. One ticket, one sale.
```
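An alternative to the explicit lock shown above is to make the check and the decrement a single atomic statement, so the database itself rejects the second sale. A minimal sketch using Python's built-in sqlite3 (the table mirrors the example above; `buy_ticket` is an illustrative name):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (id INTEGER PRIMARY KEY, stock INTEGER)")
conn.execute("INSERT INTO tickets VALUES (1, 1)")  # one ticket left

def buy_ticket(conn, ticket_id):
    # Atomic check-and-decrement: the WHERE clause guarantees we never
    # sell a ticket that is already gone, even if two buyers race.
    cur = conn.execute(
        "UPDATE tickets SET stock = stock - 1 WHERE id = ? AND stock > 0",
        (ticket_id,),
    )
    conn.commit()
    return cur.rowcount == 1  # True = sale succeeded, False = sold out

print(buy_ticket(conn, 1))  # True  — first buyer gets the ticket
print(buy_ticket(conn, 1))  # False — second buyer sees "sold out"
```

The conditional UPDATE pushes the race into the database, where the statement executes atomically — the same idea the optimistic-locking section below generalizes with a version column.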
Lost Updates
Two transactions read the same value, both modify it, last write wins — the first update is silently lost. Classic race condition.
Race Conditions
The outcome depends on the timing of operations. Same code, same inputs, different results depending on which thread runs first.
Double Processing
Two workers pick up the same job, both process it. Customer charged twice, email sent twice, inventory decremented twice.
Optimistic vs Pessimistic Locking
Optimistic Locking — Assume No Conflict
Optimistic locking doesn't lock anything upfront. It lets everyone read and modify freely, then checks for conflicts at write time using a version number or timestamp. If someone else changed the data since you read it, your write is rejected and you retry.
```sql
-- Table has a version column
-- products: id=42, name="Laptop", price=999, version=3

-- User A reads:
SELECT id, price, version FROM products WHERE id = 42;
-- → price=999, version=3

-- User B reads (same time):
SELECT id, price, version FROM products WHERE id = 42;
-- → price=999, version=3

-- User A updates (includes version check):
UPDATE products SET price = 899, version = 4
WHERE id = 42 AND version = 3;
-- → 1 row affected ✅ (version matched, update succeeded)

-- User B tries to update (same version check):
UPDATE products SET price = 799, version = 4
WHERE id = 42 AND version = 3;
-- → 0 rows affected ❌ (version is now 4, not 3!)
-- → User B detects conflict, retries with fresh data
```
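In application code, the version check pairs with a bounded retry loop: on conflict, re-read and try again. A minimal sketch using Python's built-in sqlite3 (the `products` schema mirrors the example above; the retry limit of 3 is an arbitrary choice):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, price INTEGER, version INTEGER)"
)
conn.execute("INSERT INTO products VALUES (42, 999, 3)")

def set_price(conn, product_id, new_price, max_retries=3):
    for _ in range(max_retries):
        # Read the current data together with its version.
        row = conn.execute(
            "SELECT price, version FROM products WHERE id = ?", (product_id,)
        ).fetchone()
        if row is None:
            raise KeyError(product_id)
        _, version = row
        # Conditional write: succeeds only if nobody bumped the version since the read.
        cur = conn.execute(
            "UPDATE products SET price = ?, version = ? "
            "WHERE id = ? AND version = ?",
            (new_price, version + 1, product_id, version),
        )
        conn.commit()
        if cur.rowcount == 1:
            return True   # write accepted
        # Conflict: another writer won — loop re-reads fresh data and retries.
    return False          # gave up after max_retries conflicts

print(set_price(conn, 42, 899))  # True — version bumped from 3 to 4
```

If `set_price` returns False, the caller can fall back to pessimistic locking or surface a meaningful error, as discussed in the Pitfalls section.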
Strengths
- ✅No blocking — readers and writers don't wait for each other
- ✅High concurrency — many transactions proceed in parallel
- ✅No deadlocks (no locks held)
- ✅Simple to implement (version column + WHERE clause)
- ✅Best for: low-contention systems (conflicts are rare)
Costs
- ❌Retry overhead when conflicts occur
- ❌Under high contention: many retries → worse than pessimistic
- ❌Application must handle retry logic
- ❌Starvation possible (one transaction keeps losing to others)
- ❌Not suitable for: high-contention hot rows
Pessimistic Locking — Assume Conflict
Pessimistic locking acquires a lock before reading or writing. Other transactions that need the same resource must wait until the lock is released. No conflicts possible — but concurrency is reduced.
```sql
-- User A locks the row:
BEGIN;
SELECT * FROM products WHERE id = 42 FOR UPDATE;
-- → Row is now LOCKED. No one else can modify it.

-- User B tries to read the same row:
SELECT * FROM products WHERE id = 42 FOR UPDATE;
-- → BLOCKED. Waiting for User A to release the lock.

-- User A updates and commits:
UPDATE products SET price = 899 WHERE id = 42;
COMMIT;
-- → Lock released.

-- User B's query now proceeds:
-- → Reads the updated price (899)
-- → Can now modify and commit

-- No conflict possible. But User B waited.
```
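The same block-until-released semantics can be sketched outside the database with one mutex per row. This is a toy in-process analogue of SELECT ... FOR UPDATE (names like `RowLockManager` are illustrative; a real system would rely on the database's own row locks):

```python
import threading
from collections import defaultdict

class RowLockManager:
    """One lock per row id — an in-process analogue of SELECT ... FOR UPDATE."""
    def __init__(self):
        self._guard = threading.Lock()            # protects the lock table itself
        self._locks = defaultdict(threading.Lock)

    def lock_for(self, row_id):
        with self._guard:
            return self._locks[row_id]

stock = {"A7": 1}
manager = RowLockManager()
results = []

def book_seat(seat_id):
    with manager.lock_for(seat_id):               # second booker blocks here
        if stock[seat_id] > 0:
            stock[seat_id] -= 1
            results.append("booked")
        else:
            results.append("sold out")

threads = [threading.Thread(target=book_seat, args=("A7",)) for _ in range(2)]
for t in threads: t.start()
for t in threads: t.join()
print(sorted(results))  # ['booked', 'sold out'] — exactly one sale
```

Because each seat id gets its own lock, two users booking different seats never block each other — the row-level vs table-level trade-off discussed later in this guide.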
Strengths
- ✅Prevents all conflicts (lock before access)
- ✅No retry logic needed (conflicts can't happen)
- ✅Predictable behavior (wait, then proceed)
- ✅Best for: high-contention resources (last ticket, account balance)
Costs
- ❌Blocking — transactions wait for locks (reduced concurrency)
- ❌Deadlocks possible (A locks X, waits for Y; B locks Y, waits for X)
- ❌Lock duration matters — long transactions hold locks longer
- ❌Throughput drops under high concurrency
- ❌Not suitable for: low-contention, high-throughput systems
| Dimension | Optimistic | Pessimistic |
|---|---|---|
| Assumption | Conflicts are rare | Conflicts are likely |
| Mechanism | Version check at write time | Lock before read/write |
| Blocking | No (non-blocking) | Yes (waits for lock) |
| Conflict handling | Detect and retry | Prevent entirely |
| Deadlocks | Impossible | Possible |
| Throughput | High (when conflicts rare) | Lower (blocking) |
| Best for | Product catalog, user profiles | Ticket booking, payments, inventory |
🎯 Interview Insight
The choice depends on conflict frequency. Low contention (most web apps): optimistic locking — fast, no blocking, retry on the rare conflict. High contention (last seat, flash sale): pessimistic locking — block to prevent conflicts entirely. Say: "I'd use optimistic locking by default and switch to pessimistic for hot resources where conflicts are frequent."
Distributed Locks (Redlock)
Database locks work within a single database. But what if you need to coordinate across multiple services, each with its own database? You need a distributed lock — a lock that works across machines.
The Hotel Room Key
A hotel room key ensures only one guest can access the room at a time. The front desk (lock service) issues the key. If the guest doesn't return the key within checkout time (TTL), the desk issues a new one. If the desk has multiple locations (distributed), they must agree on who has the key — that's the hard part.
```
Acquire lock:
  SET lock:payment:order-123 "worker-A" NX EX 30
  → NX: only set if key doesn't exist (atomic)
  → EX 30: expires in 30 seconds (TTL safety net)
  → Returns OK if acquired, nil if already locked

Do work:
  Process payment for order-123...

Release lock:
  Check if we still own it (value = "worker-A")
  DEL lock:payment:order-123
```

Why TTL? If worker-A crashes without releasing the lock, the lock expires after 30 seconds and other workers can proceed. Without a TTL, the lock would be held forever — a deadlock.
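The acquire/work/release flow can be simulated in-process to make the ownership check concrete. The `ToyLockStore` below is a stand-in for Redis SET NX EX / DEL semantics, not real client code — with real Redis, the check-and-delete on release must run as a Lua script to stay atomic:

```python
import time

class ToyLockStore:
    """In-memory stand-in for Redis SET NX EX / DEL semantics."""
    def __init__(self):
        self._data = {}  # key -> (owner_value, expiry_timestamp)

    def set_nx_ex(self, key, value, ttl_seconds):
        entry = self._data.get(key)
        if entry and entry[1] > time.monotonic():
            return False                          # already locked, not expired
        self._data[key] = (value, time.monotonic() + ttl_seconds)
        return True

    def release(self, key, value):
        entry = self._data.get(key)
        if entry and entry[0] == value:           # only the owner may release
            del self._data[key]
            return True
        return False                              # someone else's lock — refuse

store = ToyLockStore()
assert store.set_nx_ex("lock:payment:order-123", "worker-A", 30)      # acquired
assert not store.set_nx_ex("lock:payment:order-123", "worker-B", 30)  # blocked
assert not store.release("lock:payment:order-123", "worker-B")        # not owner
assert store.release("lock:payment:order-123", "worker-A")            # owner releases
assert store.set_nx_ex("lock:payment:order-123", "worker-B", 30)      # now free
```

Storing a per-worker value and checking it on release is what prevents worker-B from deleting a lock that worker-A still holds.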
Redlock — Multi-Node Distributed Lock
Problem: a single Redis node is a single point of failure. If Redis crashes, the lock is lost and two workers proceed. Redlock's solution: acquire the lock on a MAJORITY of N independent Redis nodes.

```
5 independent Redis nodes: [R1, R2, R3, R4, R5]

Acquire:
  1. Try SET lock NX EX 30 on ALL 5 nodes
  2. If acquired on ≥ 3 nodes (majority) within time limit → LOCKED ✅
  3. If acquired on < 3 nodes → FAILED, release all acquired locks

Worker A acquires on R1, R2, R3 → majority (3/5) → lock granted ✅
Worker B tries → R1, R2, R3 already locked → fails → waits

Release:
  DEL lock on ALL 5 nodes (even ones where acquire failed)
```

Failure scenario: R3 crashes and restarts without the lock → Worker A now effectively holds only R1 and R2 (2/5, not a majority), and another worker could acquire R3, R4, R5 — the lock is effectively lost. This is a known Redlock limitation. For critical systems, use a consensus-based lock (Zookeeper, etcd).
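The majority rule can be sketched over N simulated nodes (each "node" here is just a dict of held keys; real Redlock additionally subtracts elapsed time and a clock-drift allowance from the lock's validity window):

```python
def acquire_redlock(nodes, key, owner):
    """Try to take the lock on every node; succeed only with a majority."""
    acquired = []
    for node in nodes:                     # node: dict mapping key -> owner
        if key not in node:
            node[key] = owner
            acquired.append(node)
    if len(acquired) >= len(nodes) // 2 + 1:
        return True                        # majority reached -> lock granted
    for node in acquired:                  # minority -> roll back what we took
        del node[key]
    return False

def release_redlock(nodes, key, owner):
    # Release on every node where we are recorded as the owner.
    for node in nodes:
        if node.get(key) == owner:
            del node[key]

nodes = [{} for _ in range(5)]
assert acquire_redlock(nodes, "lock:x", "A")      # A acquires on all 5
assert not acquire_redlock(nodes, "lock:x", "B")  # B gets 0/5 -> fails
release_redlock(nodes, "lock:x", "A")
assert acquire_redlock(nodes, "lock:x", "B")      # now B succeeds
```

Rolling back a failed minority acquisition is essential: without it, two workers could each hold a non-majority slice and block everyone forever.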
When to use distributed locks
- ✅Preventing duplicate job processing across workers
- ✅Coordinating access to external APIs (rate-limited)
- ✅Ensuring only one instance runs a scheduled task
- ✅Cross-service resource coordination
Why distributed locks are hard
- ❌Network partitions can cause split-brain (two holders)
- ❌Clock skew can cause premature TTL expiration
- ❌GC pauses can cause a holder to lose the lock unknowingly
- ❌Redlock has known theoretical issues (Kleppmann critique)
- ❌For strong guarantees: use Zookeeper or etcd (consensus-based)
🎯 Interview Insight
Distributed locks are a last resort, not a first choice. Say: "I'd first try to avoid distributed locks by using idempotent operations or partitioning work so no coordination is needed. If a lock is unavoidable, I'd use Redis with TTL for non-critical coordination, or Zookeeper/etcd for critical operations that need strong guarantees."
Row-Level vs Table-Level Locks
🔹 Row-Level Locks
- Lock only the specific row being modified
- Other rows in the same table remain accessible
- High concurrency — many transactions on different rows
- More overhead per lock (track each row individually)
- Default in PostgreSQL and MySQL/InnoDB
🔹 Table-Level Locks
- Lock the entire table
- No other transaction can read or write any row
- Low concurrency — one transaction at a time
- Less overhead (one lock for the whole table)
- Used for: schema changes, bulk operations, MyISAM
Table: orders (10M rows)

Row-level lock (PostgreSQL default):

```
Tx A: UPDATE orders SET status='shipped' WHERE id = 42;  → locks row 42 only
Tx B: UPDATE orders SET status='shipped' WHERE id = 99;  → proceeds ✅ (different row)
Tx C: SELECT * FROM orders WHERE id = 500;               → proceeds ✅ (read, no conflict)
→ High concurrency: 3 transactions run in parallel
```

Table-level lock:

```
Tx A: LOCK TABLE orders; UPDATE ... WHERE id = 42;       → locks ENTIRE table
Tx B: UPDATE orders SET status='shipped' WHERE id = 99;  → BLOCKED ❌ (table locked)
Tx C: SELECT * FROM orders WHERE id = 500;               → BLOCKED ❌ (table locked)
→ Low concurrency: only Tx A runs, B and C wait
```

For 10K concurrent transactions: with row-level locks, most proceed in parallel (different rows); with a table-level lock, all wait in a queue, one at a time.
| Dimension | Row-Level Lock | Table-Level Lock |
|---|---|---|
| Granularity | Single row | Entire table |
| Concurrency | High (parallel on different rows) | Low (one transaction at a time) |
| Overhead | Higher (per-row tracking) | Lower (one lock) |
| Deadlock risk | Higher (complex lock ordering) | Lower (coarse granularity) |
| Use case | OLTP, web apps, concurrent updates | Schema migrations, bulk imports, maintenance |
| Default in | PostgreSQL, MySQL/InnoDB | MySQL/MyISAM, explicit LOCK TABLE |
🎯 Interview Insight
Modern databases use row-level locking by default. Table-level locks are for maintenance operations (schema changes, bulk imports), not for application logic. If someone mentions table-level locking for a web app, it's a red flag — it would serialize all transactions and destroy throughput.
End-to-End Scenario
Let's design the locking strategy for a ticket booking system — the classic high-contention scenario.
🎫 Concert Ticket Booking — 50K Users, 1,000 Seats
50,000 users trying to book 1,000 seats when sales open.
Requirements: no double-booking, no overselling, fast response.
Seat selection → Pessimistic locking (SELECT FOR UPDATE)
When a user selects seat A7, the system locks that specific row: SELECT * FROM seats WHERE id = 'A7' AND status = 'available' FOR UPDATE. If another user tries to select A7 simultaneously, they wait. The first user completes the booking, the second sees 'seat taken.' Row-level lock — other seats remain bookable in parallel.
Payment processing → Distributed lock
Payment involves an external API (Stripe). Use a Redis distributed lock keyed on order_id to prevent duplicate payment calls. Lock: SET lock:payment:order-456 NX EX 60. Process payment. Release lock. If the worker crashes, the TTL releases the lock after 60 seconds.
Inventory count → Optimistic locking
The 'remaining seats' counter uses optimistic locking. Read: remaining = 847, version = 153. After booking: UPDATE events SET remaining = 846, version = 154 WHERE id = 1 AND version = 153. If another booking changed the version, retry with fresh data. The counter is a single hot row, so conflicts do happen — but each update is a fast single-statement write, so retries are cheap.
Seat map display → No locking
The seat map showing available/taken seats is read-only. No locking needed. Serve from a cache (Redis) updated on every booking. Slight staleness is acceptable — a seat might show as available for 1 second after being booked. The actual booking step (Step 1) prevents double-booking.
User selects seat A7:

```
1. Seat map (read)      → No lock (cached, read-only)
2. Reserve seat (write) → Pessimistic lock (SELECT FOR UPDATE on seat row)
3. Process payment      → Distributed lock (Redis, keyed on order_id)
4. Update counter       → Optimistic lock (version column)
5. Confirm booking      → Release all locks
```

Concurrency:

- 50K users selecting different seats → parallel (row-level locks)
- 2 users selecting the same seat → one waits (pessimistic lock)
- Payment processing → one at a time per order (distributed lock)
- Counter update → retry on conflict (optimistic lock)
Trade-offs & Decision Making
| Strategy | Blocking | Conflict Handling | Deadlock Risk | Throughput | Best For |
|---|---|---|---|---|---|
| Optimistic | No | Detect + retry | None | High (low contention) | Catalogs, profiles, settings |
| Pessimistic | Yes | Prevent entirely | Possible | Lower | Bookings, payments, inventory |
| Distributed (Redis) | Yes (with TTL) | Prevent across services | TTL prevents | Medium | Cross-service coordination |
| Distributed (Zookeeper) | Yes | Prevent with consensus | Handled | Lower | Critical coordination, leader election |
Local vs Distributed Locks
| Dimension | Local (DB Lock) | Distributed (Redis/ZK) |
|---|---|---|
| Scope | Single database | Across services and databases |
| Reliability | High (DB guarantees) | Lower (network failures, clock skew) |
| Complexity | Low (built into DB) | High (TTL, retry, split-brain) |
| Performance | Fast (local) | Slower (network round trip) |
| Use when | Contention within one DB | Contention across services |
🎯 Decision Framework
(1) Can you avoid locking entirely? Use idempotent operations or partition work. (2) Contention within one database? Use DB locks (optimistic for low contention, pessimistic for high). (3) Contention across services? Use distributed locks (Redis for best-effort, Zookeeper for strong guarantees). Always prefer the simplest option that meets your correctness requirements.
Interview Questions
Q: Optimistic vs pessimistic locking — when to use each?
A: Optimistic when conflicts are rare (most web apps): read freely, check version at write time, retry on conflict. No blocking, high throughput. Pessimistic when conflicts are frequent or consequences are severe (ticket booking, payments): lock before reading, others wait. No conflicts possible, but lower throughput. Rule of thumb: if <5% of transactions conflict → optimistic. If >20% conflict → pessimistic. In between → measure and decide.
Q: What is a distributed lock and why is it hard?
A: A distributed lock coordinates access across multiple machines. It's hard because: (1) the lock service itself can fail — if Redis crashes, the lock is lost. (2) Network partitions can cause two processes to both think they hold the lock. (3) Clock skew can cause TTL to expire prematurely. (4) GC pauses can cause a process to hold the lock past its TTL without knowing. Redlock (multi-node Redis) mitigates some issues but has theoretical limitations. For strong guarantees, use consensus-based systems (Zookeeper, etcd).
Q: Why can't a TTL alone make a distributed lock safe?
A: The fundamental issue: in a distributed system, you can't distinguish between 'the lock holder is slow' and 'the lock holder has crashed.' If you wait too long, you block progress. If you release too early (via TTL), the original holder might still be working — now two processes believe they hold the 'lock.' This is the problem fencing tokens solve: each lock acquisition gets a monotonically increasing token, and the resource rejects operations carrying old tokens. The catch is that the resource itself must participate in the locking protocol.
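A minimal model of fencing tokens: the lock service hands out a strictly increasing token with each grant, and the protected resource rejects any request carrying an older token than the newest it has seen (class names here are illustrative):

```python
class FencedLockService:
    """Issues a strictly increasing token with every lock grant."""
    def __init__(self):
        self._token = 0

    def acquire(self):
        self._token += 1
        return self._token

class FencedResource:
    """The resource participates in the protocol: it tracks the highest token."""
    def __init__(self):
        self._highest_seen = 0

    def write(self, token, payload):
        if token < self._highest_seen:
            return False              # stale holder (e.g. woke up after a GC pause)
        self._highest_seen = token
        return True                   # write accepted

lock = FencedLockService()
storage = FencedResource()

old = lock.acquire()                  # holder A gets token 1, then pauses...
new = lock.acquire()                  # TTL expired; holder B gets token 2
assert storage.write(new, "B's write")        # accepted, highest_seen = 2
assert not storage.write(old, "A's write")    # rejected: token 1 < 2
```

Even though holder A still believes it owns the lock, its stale token makes its late write harmless — correctness no longer depends on TTL timing.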
Two users try to book the last seat on a flight simultaneously
How do you prevent double-booking?
Answer: Pessimistic locking: SELECT * FROM seats WHERE flight_id = 123 AND seat = '14A' AND status = 'available' FOR UPDATE. The first transaction locks the row. The second waits. First completes the booking, second sees 'seat taken.' Alternative: optimistic locking with a version column — but for a hot resource (last seat, flash sale), pessimistic is better because the retry rate would be very high with optimistic.
A background job runs on 10 servers and must process each task exactly once
How do you prevent duplicate processing?
Answer: Option 1: Distributed lock per task. Before processing task-123, acquire lock: SET lock:task:123 NX EX 300. Only one server gets the lock. Others skip. TTL ensures the lock is released if the server crashes. Option 2 (better): Use a message queue (SQS, Kafka) with visibility timeout. Each message is delivered to one consumer. If the consumer doesn't ACK within the timeout, the message is redelivered to another consumer. No explicit locking needed — the queue handles coordination.
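Option 1 can also be expressed without a separate lock service: an atomic conditional UPDATE in the task store itself, which only one worker can win. A sketch with Python's built-in sqlite3 (the `tasks` schema and `claim_task` name are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, claimed_by TEXT)")
conn.execute("INSERT INTO tasks VALUES (123, NULL)")

def claim_task(conn, task_id, worker_id):
    # Atomic claim: succeeds only if the task is still unclaimed,
    # so exactly one of N racing workers wins.
    cur = conn.execute(
        "UPDATE tasks SET claimed_by = ? WHERE id = ? AND claimed_by IS NULL",
        (worker_id, task_id),
    )
    conn.commit()
    return cur.rowcount == 1

assert claim_task(conn, 123, "server-1")      # first worker wins the claim
assert not claim_task(conn, 123, "server-2")  # every other worker skips
```

This is the same optimistic pattern from earlier applied to work distribution; unlike a TTL lock, a claim needs a separate recovery path (e.g. re-opening claims whose worker never finished).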
Pitfalls
Overusing locks
Locking every database operation 'for safety.' Locks reduce concurrency. A system that locks every read and write serializes all transactions — throughput drops to a fraction of capacity. Most reads don't need locks. Most writes on different rows don't conflict.
✅Lock only when contention is real. Use optimistic locking as the default (no blocking). Use pessimistic locking only for hot resources where conflicts are frequent. Never lock reads unless you need repeatable-read isolation for that specific transaction.
Not handling retries in optimistic locking
Using optimistic locking (version column) but not implementing retry logic. When the version check fails, the application returns an error to the user: 'Something went wrong, try again.' The user is confused and frustrated.
✅Implement automatic retries with a limit. On version conflict: re-read the data, re-apply the change, retry the update. Limit to 3-5 retries. If still failing (extreme contention), fall back to pessimistic locking or return a meaningful error: 'This item was just updated. Please review and try again.'
Deadlocks in pessimistic locking
Transaction A locks row 1, then tries to lock row 2. Transaction B locks row 2, then tries to lock row 1. Both wait forever. The database detects the deadlock and kills one transaction — but the application must handle the error and retry.
✅Always acquire locks in a consistent order (e.g., by ID ascending). If you need rows 1 and 2, always lock 1 first, then 2. Set lock timeouts (SET lock_timeout = '5s') so transactions don't wait forever. Handle deadlock errors in application code (retry the entire transaction).
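The consistent-ordering rule can be sketched in a few lines: whatever rows a transaction needs, it acquires their locks in ascending id order, so two transactions can never each hold the lock the other needs next (a toy in-process model; `transfer` and the lock table are illustrative):

```python
import threading

row_locks = {1: threading.Lock(), 2: threading.Lock()}

def transfer(from_id, to_id, action):
    # Deadlock avoidance: always acquire in ascending id order,
    # regardless of which row is the source and which the target.
    first, second = sorted((from_id, to_id))
    with row_locks[first]:
        with row_locks[second]:
            action()

done = []
# Without the sorted() ordering, these two could deadlock:
# one holds lock 1 waiting for 2, the other holds 2 waiting for 1.
t1 = threading.Thread(target=transfer, args=(1, 2, lambda: done.append("A")))
t2 = threading.Thread(target=transfer, args=(2, 1, lambda: done.append("B")))
t1.start(); t2.start()
t1.join(); t2.join()
print(sorted(done))  # ['A', 'B'] — both complete, no deadlock
```

The same discipline applies to SQL: if a transaction updates rows 1 and 2, lock row 1 first (e.g. ORDER BY id in the SELECT ... FOR UPDATE), every time.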
Incorrect distributed lock implementation
Using a single Redis SET NX for a distributed lock without TTL. If the lock holder crashes, the lock is held forever — all other workers are permanently blocked. Or: releasing someone else's lock (not checking the lock value before DEL).
✅Always use TTL: SET lock:key value NX EX 30. Always check ownership before releasing: only DEL if the value matches your worker ID (use a Lua script for atomicity). For critical operations, use Redlock (multi-node) or Zookeeper. Monitor lock acquisition failures and TTL expirations.