Delivery Guarantees
Understand message delivery semantics — at-most-once, at-least-once, exactly-once, ordered delivery with offsets, and fan-out to subscribers.
The Big Picture — Why Delivery Is Hard
In a single process, calling a function is reliable — it either succeeds or throws an error. In a distributed system, sending a message across a network is fundamentally unreliable. The network can drop the message, deliver it twice, or deliver it out of order. The acknowledgment can be lost even if the message was delivered. This makes "did the consumer get my message?" a surprisingly hard question.
The Courier Delivery Analogy
You send a package via courier. Three things can happen: (1) The package is lost in transit — you never know if it arrived (at-most-once). (2) You send it with tracking and retry if no confirmation — the package might arrive twice if the first confirmation was lost (at-least-once). (3) You send it with tracking, retry, AND the recipient checks if they already received it before accepting — perfect delivery, but expensive and complex (exactly-once). Every messaging system faces this same trilemma: fast and simple (risk loss), reliable (risk duplicates), or perfect (expensive and complex).
🔥 Key Insight
There is no free lunch. Stronger delivery guarantees cost more in latency, complexity, and throughput. The skill is matching the guarantee to the use case: metrics can tolerate loss, payments cannot tolerate duplicates, and most systems live happily with at-least-once + idempotency.
Where Guarantees Matter
Producer → Broker (Kafka, SQS, RabbitMQ) → Consumer(s)

Delivery guarantees apply at each hop:
- Producer → Broker: Did the broker receive the message?
- Broker → Consumer: Did the consumer process the message?

Failures can happen at any point:
1. Producer sends, network drops it → message lost
2. Broker receives, crashes before storing → message lost
3. Broker delivers, consumer crashes → message not processed
4. Consumer processes, ACK is lost → broker retries → duplicate
5. Network delivers the message twice → duplicate processing
Message Queues
SQS, RabbitMQ — task distribution. Each message processed by one consumer. Delivery guarantee determines if tasks can be lost or duplicated.
Event Streaming
Kafka, Kinesis — event log. Multiple consumers read the same stream. Offsets track position. Guarantee determines replay behavior.
Pub/Sub
Redis Pub/Sub, SNS — broadcast. One message to many subscribers. Each subscriber needs its own delivery guarantee.
At-Most-Once Delivery
At-most-once means the message is delivered zero or one time. The producer sends the message and moves on — no retries, no acknowledgment tracking. If the message is lost, it's gone.
Producer → Send message → Done (no retry)

Scenario 1 (success):
  Producer sends → Broker receives → Consumer processes ✅
  Message delivered once. Perfect.

Scenario 2 (failure):
  Producer sends → Network drops message → Lost forever ❌
  No retry. The producer doesn't know it was lost.

Scenario 3 (partial failure):
  Producer sends → Broker receives → Consumer crashes before processing
  Broker doesn't retry → Message effectively lost ❌

Internal mechanism: fire-and-forget. No ACK tracking. No retry logic.
UDP-style: send and hope for the best.
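For illustration, here is a minimal fire-and-forget sketch in Python: a metric is emitted as a single UDP datagram and the sender moves on. The collector address and payload shape are assumptions for this example, not part of any particular system.

```python
import json
import socket

# Hypothetical metrics collector address -- an assumption for this sketch.
METRICS_ADDR = ("127.0.0.1", 8125)

def emit_metric(name: str, value: float) -> None:
    """At-most-once: send one datapoint and move on.

    No ACK, no retry, no dedup. If the datagram is dropped, the reading
    is simply lost, which is acceptable for high-volume, low-value data.
    """
    payload = json.dumps({"metric": name, "value": value}).encode()
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Returns as soon as the datagram is handed to the OS; delivery is never confirmed.
        sock.sendto(payload, METRICS_ADDR)
    finally:
        sock.close()

emit_metric("orders.placed", 1)
```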
Strengths
- ✅Fastest — no ACK waiting, no retry overhead
- ✅Simplest — no deduplication, no offset tracking
- ✅Lowest latency — fire and forget
- ✅No duplicates — message sent at most once
When to use
- ✅Metrics and monitoring (losing 0.1% of data points is fine)
- ✅Non-critical logging (a few missing log lines are acceptable)
- ✅Real-time sensor data (next reading replaces the lost one)
- ✅Any scenario where loss is cheaper than retry complexity
🎯 Interview Insight
At-most-once is rarely the right choice for business-critical data. But it's perfect for high-volume, low-value data where the cost of loss is negligible and the cost of retries would be prohibitive. Mention it to show you understand the full spectrum of guarantees.
At-Least-Once Delivery
At-least-once means the message is delivered one or more times. The producer retries until it gets an acknowledgment. No message is ever lost — but duplicates are possible if the ACK is lost after successful delivery.
Producer → Send message → Wait for ACK → Retry if no ACK

Scenario 1 (success):
  Producer sends → Broker ACKs → Done ✅
  Message delivered once.

Scenario 2 (retry success):
  Producer sends → Network drops message → No ACK
  Producer retries → Broker receives → ACKs → Done ✅
  Message delivered once (the retry worked).

Scenario 3 (duplicate!):
  Producer sends → Broker receives → Processes → Sends ACK
  ACK is lost in the network → Producer thinks it failed
  Producer retries → Broker receives AGAIN → Processes AGAIN ⚠️
  Message delivered TWICE. The consumer must handle the duplicate.

Internal mechanism: retry with exponential backoff until an ACK is received.
The consumer must ACK after processing (not before). If the consumer crashes
after processing but before the ACK → redelivery.
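A minimal sketch of the producer side, assuming a synchronous `send` callable that blocks until the broker ACKs and raises on failure (the callable and the `TimeoutError` it raises are stand-ins, not a specific client API):

```python
import time

def publish_at_least_once(send, message, max_attempts=5):
    """Retry with exponential backoff until the broker acknowledges.

    Because a lost ACK looks identical to a lost message, the broker may
    receive the same message twice -- the consumer must tolerate duplicates.
    """
    delay = 0.1
    for attempt in range(1, max_attempts + 1):
        try:
            send(message)      # blocks until the broker ACKs, raises on failure
            return             # ACK received: done
        except TimeoutError:   # stand-in for the client's failure exception
            if attempt == max_attempts:
                raise          # give up and surface the failure to the caller
            time.sleep(delay)  # back off before retrying
            delay *= 2
```

Capping the retries and re-raising lets the caller decide what to do if the broker stays unreachable.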
Strengths
- ✅No data loss — every message is eventually delivered
- ✅Simple to implement (retry + ACK)
- ✅Most common guarantee in production systems
- ✅Works with any message broker (Kafka, SQS, RabbitMQ)
The duplicate problem
- ❌Consumer might process the same message twice
- ❌Payment charged twice, order created twice, email sent twice
- ❌Consumer MUST be idempotent (same input → same result)
- ❌Use deduplication IDs to detect and skip duplicates
Problem: an order message is delivered twice → two orders are created.

Solution 1 — Idempotency key:
  Message: { idempotency_key: "order-abc-123", user: 42, total: 99.99 }
  Consumer: if "order-abc-123" exists in processed_keys → skip
            else → process the order, then store "order-abc-123" in processed_keys

Solution 2 — Database constraint:
  INSERT INTO orders (idempotency_key, user_id, total)
  VALUES ('order-abc-123', 42, 99.99)
  ON CONFLICT (idempotency_key) DO NOTHING;
  → The second insert is silently ignored. No duplicate order.

Solution 3 — Natural idempotency:
  "SET user balance to $500" is naturally idempotent. Running it twice produces the same result.
  "ADD $100 to balance" is NOT idempotent — running it twice adds $200.
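A sketch of Solution 1 in Python, using an in-memory SQLite table as the `processed_keys` store; the table, message fields, and `create_order` handler are illustrative, not from the article.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE processed_keys (key TEXT PRIMARY KEY)")

def create_order(msg: dict) -> None:
    print("order created for user", msg["user"])  # stand-in for the real side effect

def handle_order(msg: dict) -> None:
    key = msg["idempotency_key"]
    # A redelivery carries a key we have already recorded: skip it.
    if db.execute("SELECT 1 FROM processed_keys WHERE key = ?", (key,)).fetchone():
        return
    create_order(msg)
    db.execute("INSERT INTO processed_keys VALUES (?)", (key,))
    db.commit()
    # Note: a crash between create_order() and the INSERT still causes a reprocess
    # on redelivery; the transactional pattern in the next section closes that gap.

handle_order({"idempotency_key": "order-abc-123", "user": 42, "total": 99.99})
handle_order({"idempotency_key": "order-abc-123", "user": 42, "total": 99.99})  # duplicate: skipped
```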
🎯 Interview Insight
At-least-once + idempotent consumers is the industry standard. When asked about delivery guarantees, say: "I'd use at-least-once delivery with idempotent consumers. Each message carries a unique ID. The consumer checks if it's already processed before executing. This gives us effective exactly-once semantics without the complexity of true exactly-once."
Exactly-Once Delivery
Exactly-once means the message is delivered and processed exactly one time — no loss, no duplicates. It's the holy grail of messaging, and it's extremely hard to achieve in distributed systems. In practice, most "exactly-once" systems are actually at-least-once with deduplication.
⚠️ Reality Check
True exactly-once delivery across a network is theoretically impossible in the general case (related to the Two Generals Problem). What systems actually achieve is "effectively exactly-once" — at-least-once delivery combined with idempotent processing and transactional writes. The message might be delivered twice, but the effect happens exactly once.
Technique: at-least-once + idempotency + transactional processing

Step 1 — At-least-once delivery:
  The broker retries until the consumer ACKs. No message is lost.

Step 2 — Deduplication:
  Each message has a unique ID. The consumer checks: "Have I seen this ID before?"
  If yes → skip. If no → process.

Step 3 — Transactional processing:
  Process the message AND record the message ID in ONE transaction.
  BEGIN;
    INSERT INTO orders (...) VALUES (...);
    INSERT INTO processed_messages (id) VALUES ('msg-abc-123');
  COMMIT;
  → If the transaction fails, neither the order nor the ID is recorded.
  → On retry, the ID check prevents reprocessing.

Kafka's "exactly-once semantics" (EOS):
  Producer: idempotent producer (dedup at the broker level)
  Consumer: read-process-write inside a Kafka transaction
  → Offset commit + output write are atomic
  → If the consumer crashes, it resumes from the last committed offset
  → No duplicate processing (transactional guarantee)
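A minimal Python sketch of Step 3 using SQLite: the order row and the dedup record commit in a single transaction, so a redelivered message has no effect. The schema and message fields are assumptions for the example.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders (idempotency_key TEXT PRIMARY KEY, user_id INT, total REAL);
    CREATE TABLE processed_messages (id TEXT PRIMARY KEY);
""")

def process_once(msg: dict) -> None:
    try:
        with db:  # one transaction: both inserts commit, or neither does
            db.execute("INSERT INTO processed_messages (id) VALUES (?)", (msg["id"],))
            db.execute(
                "INSERT INTO orders VALUES (?, ?, ?)",
                (msg["id"], msg["user_id"], msg["total"]),
            )
    except sqlite3.IntegrityError:
        pass  # the message id was already recorded: a duplicate delivery, safely skipped

msg = {"id": "msg-abc-123", "user_id": 42, "total": 99.99}
process_once(msg)
process_once(msg)  # redelivery: no second order is created
```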
When you need it
- ✅Financial transactions (charge exactly once)
- ✅Inventory updates (decrement stock exactly once)
- ✅Billing systems (invoice generated exactly once)
- ✅Any operation where duplicates cause real harm
The cost
- ❌Higher latency (transactional processing, dedup checks)
- ❌More storage (deduplication table, transaction logs)
- ❌Complex implementation (transactions across systems)
- ❌Lower throughput (coordination overhead)
- ❌Often overkill for non-critical data
🎯 Interview Insight
Don't say "we'll use exactly-once delivery" without explaining how. Say: "True exactly-once is impractical across network boundaries. I'd use at-least-once delivery with idempotent consumers — each message has a unique ID, and the consumer checks for duplicates before processing. For critical operations like payments, I'd wrap the processing and dedup check in a database transaction."
Ordered Delivery with Offsets
In event streaming systems like Kafka, messages are stored in an ordered, append-only log. Each message has an offset — its position in the log. Consumers track their offset to know where they left off and resume from that point.
Kafka partition (ordered log):

  Offset:  [0]   [1]   [2]   [3]   [4]   [5]   [6]   [7]
  Data:    msg0  msg1  msg2  msg3  msg4  msg5  msg6  msg7

Consumer A reads:
  Start at offset 0 → process msg0 → commit offset 1
  Read offset 1 → process msg1 → commit offset 2
  Read offset 2 → process msg2 → commit offset 3 (crash!)
  Restart → resume from the last committed offset (3)
  Read offset 3 → process msg3 → continue...
  → No messages lost, no messages skipped

Consumer B (independent):
  Starts at offset 0 → reads the same messages independently
  → Multiple consumers can read the same log at different speeds
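A consumption-loop sketch, assuming the confluent-kafka Python client (broker address, topic, and group id are placeholders). The offset is committed only after the message is processed, which gives at-least-once; committing before `process()` would give at-most-once instead.

```python
from confluent_kafka import Consumer

def process(payload: bytes) -> None:
    print("processing", payload)  # stand-in for real business logic; may raise

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "order-processor",
    "enable.auto.commit": False,      # commit explicitly, after processing
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["order-events"])

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        process(msg.value())
        # Checkpoint only after success. A crash before this line means the
        # message is redelivered on restart (at-least-once).
        consumer.commit(message=msg, asynchronous=False)
finally:
    consumer.close()
```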
Ordering Is Partition-Scoped
Topic: "orders" with 4 partitions Partition 0: [order-1] [order-5] [order-9] → ordered within P0 Partition 1: [order-2] [order-6] [order-10] → ordered within P1 Partition 2: [order-3] [order-7] [order-11] → ordered within P2 Partition 3: [order-4] [order-8] [order-12] → ordered within P3 Ordering guarantee: ✅ order-1 is processed before order-5 (same partition) ❌ order-1 might be processed after order-2 (different partitions) How to ensure ordering for a specific entity: Partition key = user_id All orders for user_42 go to the same partition → user_42's orders are always processed in order → Different users' orders can be processed in parallel
Key concepts
- ✅Offset = position in the log (monotonically increasing)
- ✅Consumer commits offset after processing (checkpoint)
- ✅On crash: resume from last committed offset
- ✅Ordering guaranteed within a partition, not across
- ✅Partition key determines which partition a message goes to
Trade-offs
- ❌More partitions = more parallelism but no cross-partition ordering
- ❌Committing offset before processing → at-most-once (data loss on crash)
- ❌Committing offset after processing → at-least-once (duplicates on crash)
- ❌Single partition = strict ordering but no parallelism
🎯 Interview Insight
When asked "how do you ensure message ordering?" — say: "Use a partition key. All messages for the same entity (user, order) go to the same Kafka partition. Within a partition, ordering is guaranteed. Across partitions, there's no ordering — but that's fine because different entities don't need to be ordered relative to each other."
Fan-out to Subscribers
Fan-out delivers one message to multiple consumers. When a user places an order, the order event needs to reach the payment service, inventory service, notification service, and analytics — all independently.
Producer publishes: "OrderPlaced" event ┌──→ Payment Service (charge the card) │ OrderPlaced ──→ Broker ──→ Inventory Service (reduce stock) │ ├──→ Notification Service (send confirmation email) │ └──→ Analytics Service (track conversion) Each subscriber: → Receives the same event independently → Processes at its own pace → Has its own delivery guarantee → Can fail without affecting others Kafka implementation: Topic: "order-events" Consumer Group A: Payment Service (1 instance) Consumer Group B: Inventory Service (3 instances) Consumer Group C: Notification Service (2 instances) Consumer Group D: Analytics Service (5 instances) → Each group gets every message. Within a group, messages are distributed.
📡 Pub/Sub Model
- Publisher sends to a topic, not to specific consumers
- All subscribers to that topic receive the message
- Loose coupling: publisher doesn't know who subscribes
- Examples: Kafka consumer groups, SNS → SQS, Redis Pub/Sub
⚠️ Challenges
- Each subscriber needs its own delivery guarantee
- A slow subscriber must not block the others (consumption stays independent)
- Failed subscriber must retry independently
- Ordering per subscriber (not global ordering)
🎯 Interview Insight
Fan-out is the backbone of event-driven architecture. When designing a system, identify events (OrderPlaced, UserSignedUp, PaymentCompleted) and the services that need to react. Each service subscribes independently. This decouples services — adding a new subscriber doesn't require changing the publisher.
End-to-End Scenario
Let's design the delivery guarantees for an order processing system — where every guarantee type plays a role.
🛒 Order Processing — 10K Orders/sec
Services: Payment, Inventory, Notification, Analytics.
Requirements: no lost orders, no double charges, ordered per user.
At-least-once → Order event delivery
Order API publishes 'OrderPlaced' to Kafka with retries enabled: if a publish fails, it retries until it succeeds, so no order event is ever lost. Kafka acknowledges the write only after it has been replicated to all in-sync replicas (acks=all).
Idempotency → Payment service
Payment service consumes OrderPlaced events. Each order has a unique order_id. Before charging, the service checks: 'Have I already processed order_id=abc-123?' If yes → skip. If no → charge and record the ID. At-least-once delivery + idempotent consumer = effectively exactly-once payment.
Ordered delivery → Per-user ordering
Partition key = user_id. All orders for user_42 go to the same Kafka partition. Within that partition, order-1 is always processed before order-2. This ensures a user's orders are processed sequentially — no race conditions on inventory or balance.
Fan-out → Multiple services
The same OrderPlaced event fans out to 4 consumer groups: Payment (charge card), Inventory (reduce stock), Notification (send email), Analytics (track conversion). Each processes independently. If Notification is slow, Payment isn't affected.
At-most-once → Analytics
Analytics service tracks order counts and revenue. Losing 0.1% of events is acceptable — the dashboard shows approximate numbers anyway. No retry logic, no dedup — simplest consumer. Saves complexity where precision isn't critical.
Order API → Kafka (at-least-once, acks=all)
  │
  │  Topic: "order-events", partition key: user_id
  │
  ├── Payment Service      [at-least-once + idempotent]
  │     → Check dedup table → charge card → record order_id
  │
  ├── Inventory Service    [at-least-once + idempotent]
  │     → Check dedup table → decrement stock → record order_id
  │
  ├── Notification Service [at-least-once]
  │     → Send email (a duplicate email is annoying but not harmful)
  │
  └── Analytics Service    [at-most-once]
        → Increment counters (losing 0.1% is fine)

Ordering: guaranteed per user (same partition)
Fan-out: each service is an independent consumer group
Offset tracking: each service commits its own offset
Trade-offs & Decision Making
| Guarantee | Message Loss | Duplicates | Complexity | Latency | Best For |
|---|---|---|---|---|---|
| At-most-once | Possible | No | Very low | Lowest | Metrics, logs, non-critical data |
| At-least-once | No | Possible | Low-medium | Low | Most production systems (+ idempotency) |
| Exactly-once | No | No | High | Higher | Financial transactions, billing |
Choosing the Right Guarantee
| Scenario | Guarantee | Why |
|---|---|---|
| Page view tracking | At-most-once | Losing 0.1% of views doesn't affect decisions |
| Order processing | At-least-once + idempotent | No lost orders; dedup prevents double processing |
| Payment charging | Exactly-once (effective) | Double charge = refund + angry customer + legal risk |
| Email notifications | At-least-once | Duplicate email is annoying but not harmful |
| Inventory decrement | Exactly-once (effective) | Double decrement = overselling |
| Log ingestion | At-most-once | Missing a few log lines is acceptable |
🎯 The Practical Default
At-least-once + idempotent consumers is the right choice for 90% of use cases. It's simple, reliable, and handles duplicates gracefully. Reserve exactly-once for operations where duplicates cause real financial or data integrity harm. Use at-most-once only for truly disposable data.
Interview Questions
Q: At-most-once vs at-least-once — what's the difference?
A: At-most-once: send and forget. No retries. Message might be lost but never duplicated. Fast and simple. Use for metrics and logs. At-least-once: retry until acknowledged. Message is never lost but might be delivered multiple times. Consumer must handle duplicates (idempotency). Use for orders, payments, notifications. The key trade-off: loss vs duplication. Most systems choose 'no loss' (at-least-once) and handle duplicates in the consumer.
Q: Why is exactly-once delivery hard?
A: Because of the Two Generals Problem: you can never be 100% sure the other side received your message. If the consumer processes a message and sends an ACK, but the ACK is lost, the broker retries — causing a duplicate. To prevent this, you need: (1) deduplication IDs on every message, (2) idempotent processing in the consumer, (3) transactional writes (process + record dedup ID atomically). This is complex and adds latency. In practice, 'exactly-once' means 'at-least-once delivery with exactly-once processing.'
Q: How do offsets work in Kafka?
A: Each message in a Kafka partition has a sequential offset (0, 1, 2, ...). A consumer tracks its current offset — the position of the last message it processed. After processing, it commits the offset. If the consumer crashes, it restarts from the last committed offset. Commit before processing → at-most-once (might skip messages on crash). Commit after processing → at-least-once (might reprocess on crash). Ordering is guaranteed within a partition. Use partition keys to ensure related messages go to the same partition.
Q: How do you handle duplicate messages?
A: Make consumers idempotent. Three approaches: (1) Deduplication ID: each message has a unique ID. Consumer checks a 'processed_ids' table before processing. If ID exists → skip. (2) Database constraints: use UNIQUE constraints or ON CONFLICT DO NOTHING to prevent duplicate inserts. (3) Natural idempotency: design operations to be naturally idempotent — 'SET balance = 500' is idempotent, 'ADD 100 to balance' is not. For critical operations (payments), combine all three.
Pitfalls
Assuming exactly-once is easy
Saying 'we'll use exactly-once delivery' in an interview without explaining how. True exactly-once across network boundaries is impossible. What you actually implement is at-least-once + idempotent consumers + transactional processing. Claiming 'exactly-once' without this nuance shows a gap in understanding.
✅Say: 'We'll achieve effectively exactly-once by using at-least-once delivery with idempotent consumers. Each message has a unique ID. The consumer checks for duplicates before processing and wraps the operation in a transaction.' This shows you understand the reality.
Not handling duplicates
Using at-least-once delivery without making consumers idempotent. The broker retries a message, the consumer processes it twice: double payment, double order, double email. The most common production bug in messaging systems.
✅Every consumer that uses at-least-once delivery MUST be idempotent. Use deduplication IDs, database UNIQUE constraints, or naturally idempotent operations. Test by deliberately sending the same message twice and verifying the result is correct.
Ignoring ordering issues
Assuming messages arrive in order across partitions. User creates an account (partition 0) then places an order (partition 1). The order event arrives before the account event → order fails because the user doesn't exist yet.
✅Use partition keys to ensure related messages go to the same partition. All events for user_42 → same partition → processed in order. If cross-entity ordering is needed, use a saga pattern or sequence numbers to detect and handle out-of-order events.
Mismanaging offsets
Committing offsets before processing (at-most-once when you wanted at-least-once). Or never committing offsets (consumer reprocesses everything on restart). Or committing offsets for a batch where some messages failed (skipping failed messages permanently).
✅Commit offsets AFTER successful processing for at-least-once. For batch processing: only commit the offset of the last successfully processed message. Use Kafka's transactional API to atomically commit offsets + write results for exactly-once semantics.