Delivery Guarantees
Understand message delivery semantics — at-most-once, at-least-once, exactly-once, ordered delivery with offsets, and fan-out to subscribers.
The Big Picture — Why Delivery Is Hard
In a single process, calling a function is reliable — it either succeeds or throws an error. In a distributed system, sending a message across a network is fundamentally unreliable. The network can drop the message, deliver it twice, or deliver it out of order. The acknowledgment can be lost even if the message was delivered. This makes "did the consumer get my message?" a surprisingly hard question.
The Courier Delivery Analogy
You send a package via courier. Three things can happen: (1) The package is lost in transit — you never know if it arrived (at-most-once). (2) You send it with tracking and retry if no confirmation — the package might arrive twice if the first confirmation was lost (at-least-once). (3) You send it with tracking, retry, AND the recipient checks if they already received it before accepting — perfect delivery, but expensive and complex (exactly-once). Every messaging system faces this same trilemma: fast and simple (risk loss), reliable (risk duplicates), or perfect (expensive and complex).
🔥 Key Insight
There is no free lunch. Stronger delivery guarantees cost more in latency, complexity, and throughput. The skill is matching the guarantee to the use case: metrics can tolerate loss, payments cannot tolerate duplicates, and most systems live happily with at-least-once + idempotency.
Where Guarantees Matter
Producer → Broker (Kafka, SQS, RabbitMQ) → Consumer(s)

Delivery guarantees apply at each hop:
- Producer → Broker: Did the broker receive the message?
- Broker → Consumer: Did the consumer process the message?

Failures can happen at any point:
1. Producer sends, network drops it → message lost
2. Broker receives, crashes before storing → message lost
3. Broker delivers, consumer crashes → message not processed
4. Consumer processes, ACK is lost → broker retries → duplicate
5. Network delivers the message twice → duplicate processing
Message Queues
SQS, RabbitMQ — task distribution. Each message processed by one consumer. Delivery guarantee determines if tasks can be lost or duplicated.
Event Streaming
Kafka, Kinesis — event log. Multiple consumers read the same stream. Offsets track position. Guarantee determines replay behavior.
Pub/Sub
Redis Pub/Sub, SNS — broadcast. One message to many subscribers. Each subscriber needs its own delivery guarantee.
At-Most-Once Delivery
At-most-once means the message is delivered zero or one time. The producer sends the message and moves on — no retries, no acknowledgment tracking. If the message is lost, it's gone.
Producer → Send message → Done (no retry)

Scenario 1 (success):
  Producer sends → Broker receives → Consumer processes ✅
  Message delivered once. Perfect.

Scenario 2 (failure):
  Producer sends → Network drops message → Lost forever ❌
  No retry. The producer doesn't know it was lost.

Scenario 3 (partial failure):
  Producer sends → Broker receives → Consumer crashes before processing
  Broker doesn't retry → Message effectively lost ❌

Internal mechanism: fire-and-forget. No ACK tracking. No retry logic.
UDP-style: send and hope for the best.
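For illustration, here is a minimal fire-and-forget sketch in Python: a metric is emitted as a single UDP datagram and the sender moves on. The collector address and payload shape are assumptions for this example, not part of any particular system.

```python
import json
import socket

# Hypothetical metrics collector address -- an assumption for this sketch.
METRICS_ADDR = ("127.0.0.1", 8125)

def emit_metric(name: str, value: float) -> None:
    """At-most-once: send one datapoint and move on.

    No ACK, no retry, no dedup. If the datagram is dropped, the reading
    is simply lost, which is acceptable for high-volume, low-value data.
    """
    payload = json.dumps({"metric": name, "value": value}).encode()
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Returns as soon as the datagram is handed to the OS; delivery is never confirmed.
        sock.sendto(payload, METRICS_ADDR)
    finally:
        sock.close()

emit_metric("orders.placed", 1)
```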
Strengths
- ✅Fastest — no ACK waiting, no retry overhead
- ✅Simplest — no deduplication, no offset tracking
- ✅Lowest latency — fire and forget
- ✅No duplicates — message sent at most once
When to use
- ✅Metrics and monitoring (losing 0.1% of data points is fine)
- ✅Non-critical logging (a few missing log lines are acceptable)
- ✅Real-time sensor data (next reading replaces the lost one)
- ✅Any scenario where loss is cheaper than retry complexity
🎯 Interview Insight
At-most-once is rarely the right choice for business-critical data. But it's perfect for high-volume, low-value data where the cost of loss is negligible and the cost of retries would be prohibitive. Mention it to show you understand the full spectrum of guarantees.
At-Least-Once Delivery
At-least-once means the message is delivered one or more times. The producer retries until it gets an acknowledgment. No message is ever lost — but duplicates are possible if the ACK is lost after successful delivery.
Producer → Send message → Wait for ACK → Retry if no ACK

Scenario 1 (success):
  Producer sends → Broker ACKs → Done ✅
  Message delivered once.

Scenario 2 (retry success):
  Producer sends → Network drops message → No ACK
  Producer retries → Broker receives → ACKs → Done ✅
  Message delivered once (the retry worked).

Scenario 3 (duplicate!):
  Producer sends → Broker receives → Processes → Sends ACK
  ACK is lost in the network → Producer thinks it failed
  Producer retries → Broker receives AGAIN → Processes AGAIN ⚠️
  Message delivered TWICE. The consumer must handle the duplicate.

Internal mechanism: retry with exponential backoff until an ACK is received.
The consumer must ACK after processing (not before). If the consumer crashes
after processing but before the ACK → redelivery.
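A minimal sketch of the producer side, assuming a synchronous `send` callable that blocks until the broker ACKs and raises on failure (the callable and the `TimeoutError` it raises are stand-ins, not a specific client API):

```python
import time

def publish_at_least_once(send, message, max_attempts=5):
    """Retry with exponential backoff until the broker acknowledges.

    Because a lost ACK looks identical to a lost message, the broker may
    receive the same message twice -- the consumer must tolerate duplicates.
    """
    delay = 0.1
    for attempt in range(1, max_attempts + 1):
        try:
            send(message)      # blocks until the broker ACKs, raises on failure
            return             # ACK received: done
        except TimeoutError:   # stand-in for the client's failure exception
            if attempt == max_attempts:
                raise          # give up and surface the failure to the caller
            time.sleep(delay)  # back off before retrying
            delay *= 2
```

Capping the retries and re-raising lets the caller decide what to do if the broker stays unreachable.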
Strengths
- ✅No data loss — every message is eventually delivered
- ✅Simple to implement (retry + ACK)
- ✅Most common guarantee in production systems
- ✅Works with any message broker (Kafka, SQS, RabbitMQ)
The duplicate problem
- ❌Consumer might process the same message twice
- ❌Payment charged twice, order created twice, email sent twice
- ❌Consumer MUST be idempotent (same input → same result)
- ❌Use deduplication IDs to detect and skip duplicates
Problem: an order message is delivered twice → two orders are created.

Solution 1 — Idempotency key:
  Message: { idempotency_key: "order-abc-123", user: 42, total: 99.99 }
  Consumer: if "order-abc-123" exists in processed_keys → skip
            else → process the order, then store "order-abc-123" in processed_keys

Solution 2 — Database constraint:
  INSERT INTO orders (idempotency_key, user_id, total)
  VALUES ('order-abc-123', 42, 99.99)
  ON CONFLICT (idempotency_key) DO NOTHING;
  → The second insert is silently ignored. No duplicate order.

Solution 3 — Natural idempotency:
  "SET user balance to $500" is naturally idempotent. Running it twice produces the same result.
  "ADD $100 to balance" is NOT idempotent — running it twice adds $200.
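A sketch of Solution 1 in Python, using an in-memory SQLite table as the `processed_keys` store; the table, message fields, and `create_order` handler are illustrative, not from the article.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE processed_keys (key TEXT PRIMARY KEY)")

def create_order(msg: dict) -> None:
    print("order created for user", msg["user"])  # stand-in for the real side effect

def handle_order(msg: dict) -> None:
    key = msg["idempotency_key"]
    # A redelivery carries a key we have already recorded: skip it.
    if db.execute("SELECT 1 FROM processed_keys WHERE key = ?", (key,)).fetchone():
        return
    create_order(msg)
    db.execute("INSERT INTO processed_keys VALUES (?)", (key,))
    db.commit()
    # Note: a crash between create_order() and the INSERT still causes a reprocess
    # on redelivery; the transactional pattern in the next section closes that gap.

handle_order({"idempotency_key": "order-abc-123", "user": 42, "total": 99.99})
handle_order({"idempotency_key": "order-abc-123", "user": 42, "total": 99.99})  # duplicate: skipped
```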
🎯 Interview Insight
At-least-once + idempotent consumers is the industry standard. When asked about delivery guarantees, say: "I'd use at-least-once delivery with idempotent consumers. Each message carries a unique ID. The consumer checks if it's already processed before executing. This gives us effective exactly-once semantics without the complexity of true exactly-once."
Exactly-Once Delivery
Exactly-once means the message is delivered and processed exactly one time — no loss, no duplicates. It's the holy grail of messaging, and it's extremely hard to achieve in distributed systems. In practice, most "exactly-once" systems are actually at-least-once with deduplication.
⚠️ Reality Check
True exactly-once delivery across a network is theoretically impossible in the general case (related to the Two Generals Problem). What systems actually achieve is "effectively exactly-once" — at-least-once delivery combined with idempotent processing and transactional writes. The message might be delivered twice, but the effect happens exactly once.
Technique: at-least-once + idempotency + transactional processing

Step 1 — At-least-once delivery:
  The broker retries until the consumer ACKs. No message is lost.

Step 2 — Deduplication:
  Each message has a unique ID. The consumer checks: "Have I seen this ID before?"
  If yes → skip. If no → process.

Step 3 — Transactional processing:
  Process the message AND record the message ID in ONE transaction.
  BEGIN;
    INSERT INTO orders (...) VALUES (...);
    INSERT INTO processed_messages (id) VALUES ('msg-abc-123');
  COMMIT;
  → If the transaction fails, neither the order nor the ID is recorded.
  → On retry, the ID check prevents reprocessing.

Kafka's "exactly-once semantics" (EOS):
  Producer: idempotent producer (dedup at the broker level)
  Consumer: read-process-write inside a Kafka transaction
  → Offset commit + output write are atomic
  → If the consumer crashes, it resumes from the last committed offset
  → No duplicate processing (transactional guarantee)
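A minimal Python sketch of Step 3 using SQLite: the order row and the dedup record commit in a single transaction, so a redelivered message has no effect. The schema and message fields are assumptions for the example.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders (idempotency_key TEXT PRIMARY KEY, user_id INT, total REAL);
    CREATE TABLE processed_messages (id TEXT PRIMARY KEY);
""")

def process_once(msg: dict) -> None:
    try:
        with db:  # one transaction: both inserts commit, or neither does
            db.execute("INSERT INTO processed_messages (id) VALUES (?)", (msg["id"],))
            db.execute(
                "INSERT INTO orders VALUES (?, ?, ?)",
                (msg["id"], msg["user_id"], msg["total"]),
            )
    except sqlite3.IntegrityError:
        pass  # the message id was already recorded: a duplicate delivery, safely skipped

msg = {"id": "msg-abc-123", "user_id": 42, "total": 99.99}
process_once(msg)
process_once(msg)  # redelivery: no second order is created
```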
When you need it
- ✅Financial transactions (charge exactly once)
- ✅Inventory updates (decrement stock exactly once)
- ✅Billing systems (invoice generated exactly once)
- ✅Any operation where duplicates cause real harm
The cost
- ❌Higher latency (transactional processing, dedup checks)
- ❌More storage (deduplication table, transaction logs)
- ❌Complex implementation (transactions across systems)
- ❌Lower throughput (coordination overhead)
- ❌Often overkill for non-critical data
🎯 Interview Insight
Don't say "we'll use exactly-once delivery" without explaining how. Say: "True exactly-once is impractical across network boundaries. I'd use at-least-once delivery with idempotent consumers — each message has a unique ID, and the consumer checks for duplicates before processing. For critical operations like payments, I'd wrap the processing and dedup check in a database transaction."
Ordered Delivery with Offsets
In event streaming systems like Kafka, messages are stored in an ordered, append-only log. Each message has an offset — its position in the log. Consumers track their offset to know where they left off and resume from that point.
Kafka partition (ordered log):

  Offset:  [0]   [1]   [2]   [3]   [4]   [5]   [6]   [7]
  Data:    msg0  msg1  msg2  msg3  msg4  msg5  msg6  msg7

Consumer A reads:
  Start at offset 0 → process msg0 → commit offset 1
  Read offset 1 → process msg1 → commit offset 2
  Read offset 2 → process msg2 → commit offset 3 (crash!)
  Restart → resume from the last committed offset (3)
  Read offset 3 → process msg3 → continue...
  → No messages lost, no messages skipped

Consumer B (independent):
  Starts at offset 0 → reads the same messages independently
  → Multiple consumers can read the same log at different speeds
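A consumption-loop sketch, assuming the confluent-kafka Python client (broker address, topic, and group id are placeholders). The offset is committed only after the message is processed, which gives at-least-once; committing before `process()` would give at-most-once instead.

```python
from confluent_kafka import Consumer

def process(payload: bytes) -> None:
    print("processing", payload)  # stand-in for real business logic; may raise

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "order-processor",
    "enable.auto.commit": False,      # commit explicitly, after processing
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["order-events"])

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        process(msg.value())
        # Checkpoint only after success. A crash before this line means the
        # message is redelivered on restart (at-least-once).
        consumer.commit(message=msg, asynchronous=False)
finally:
    consumer.close()
```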
Ordering Is Partition-Scoped
Topic: "orders" with 4 partitions Partition 0: [order-1] [order-5] [order-9] → ordered within P0 Partition 1: [order-2] [order-6] [order-10] → ordered within P1 Partition 2: [order-3] [order-7] [order-11] → ordered within P2 Partition 3: [order-4] [order-8] [order-12] → ordered within P3 Ordering guarantee: ✅ order-1 is processed before order-5 (same partition) ❌ order-1 might be processed after order-2 (different partitions) How to ensure ordering for a specific entity: Partition key = user_id All orders for user_42 go to the same partition → user_42's orders are always processed in order → Different users' orders can be processed in parallel
Key concepts
- ✅Offset = position in the log (monotonically increasing)
- ✅Consumer commits offset after processing (checkpoint)
- ✅On crash: resume from last committed offset
- ✅Ordering guaranteed within a partition, not across
- ✅Partition key determines which partition a message goes to
Trade-offs
- ❌More partitions = more parallelism but no cross-partition ordering
- ❌Committing offset before processing → at-most-once (data loss on crash)
- ❌Committing offset after processing → at-least-once (duplicates on crash)
- ❌Single partition = strict ordering but no parallelism
🎯 Interview Insight
When asked "how do you ensure message ordering?" — say: "Use a partition key. All messages for the same entity (user, order) go to the same Kafka partition. Within a partition, ordering is guaranteed. Across partitions, there's no ordering — but that's fine because different entities don't need to be ordered relative to each other."
Fan-out to Subscribers
Fan-out delivers one message to multiple consumers. When a user places an order, the order event needs to reach the payment service, inventory service, notification service, and analytics — all independently.
Producer publishes: "OrderPlaced" event ┌──→ Payment Service (charge the card) │ OrderPlaced ──→ Broker ──→ Inventory Service (reduce stock) │ ├──→ Notification Service (send confirmation email) │ └──→ Analytics Service (track conversion) Each subscriber: → Receives the same event independently → Processes at its own pace → Has its own delivery guarantee → Can fail without affecting others Kafka implementation: Topic: "order-events" Consumer Group A: Payment Service (1 instance) Consumer Group B: Inventory Service (3 instances) Consumer Group C: Notification Service (2 instances) Consumer Group D: Analytics Service (5 instances) → Each group gets every message. Within a group, messages are distributed.
📡 Pub/Sub Model
- Publisher sends to a topic, not to specific consumers
- All subscribers to that topic receive the message
- Loose coupling: publisher doesn't know who subscribes
- Examples: Kafka consumer groups, SNS → SQS, Redis Pub/Sub
⚠️ Challenges
- Each subscriber needs its own delivery guarantee
- A slow subscriber must not block the others (consumption stays independent)
- Failed subscriber must retry independently
- Ordering per subscriber (not global ordering)
🎯 Interview Insight
Fan-out is the backbone of event-driven architecture. When designing a system, identify events (OrderPlaced, UserSignedUp, PaymentCompleted) and the services that need to react. Each service subscribes independently. This decouples services — adding a new subscriber doesn't require changing the publisher.
End-to-End Scenario
Let's design the delivery guarantees for an order processing system — where every guarantee type plays a role.
🛒 Order Processing — 10K Orders/sec
Services: Payment, Inventory, Notification, Analytics.
Requirements: no lost orders, no double charges, ordered per user.
At-least-once → Order event delivery
Order API publishes 'OrderPlaced' to Kafka with retries enabled: if a publish fails, it retries until it succeeds, so no order event is ever lost. Kafka acknowledges the write only after it has been replicated to all in-sync replicas (acks=all).
Idempotency → Payment service
Payment service consumes OrderPlaced events. Each order has a unique order_id. Before charging, the service checks: 'Have I already processed order_id=abc-123?' If yes → skip. If no → charge and record the ID. At-least-once delivery + idempotent consumer = effectively exactly-once payment.
Ordered delivery → Per-user ordering
Partition key = user_id. All orders for user_42 go to the same Kafka partition. Within that partition, order-1 is always processed before order-2. This ensures a user's orders are processed sequentially — no race conditions on inventory or balance.
Fan-out → Multiple services
The same OrderPlaced event fans out to 4 consumer groups: Payment (charge card), Inventory (reduce stock), Notification (send email), Analytics (track conversion). Each processes independently. If Notification is slow, Payment isn't affected.
At-most-once → Analytics
Analytics service tracks order counts and revenue. Losing 0.1% of events is acceptable — the dashboard shows approximate numbers anyway. No retry logic, no dedup — simplest consumer. Saves complexity where precision isn't critical.
Order API → Kafka (at-least-once, acks=all)
  │
  │  Topic: "order-events", partition key: user_id
  │
  ├── Payment Service      [at-least-once + idempotent]
  │     → Check dedup table → charge card → record order_id
  │
  ├── Inventory Service    [at-least-once + idempotent]
  │     → Check dedup table → decrement stock → record order_id
  │
  ├── Notification Service [at-least-once]
  │     → Send email (a duplicate email is annoying but not harmful)
  │
  └── Analytics Service    [at-most-once]
        → Increment counters (losing 0.1% is fine)

Ordering: guaranteed per user (same partition)
Fan-out: each service is an independent consumer group
Offset tracking: each service commits its own offset
Trade-offs & Decision Making
| Guarantee | Message Loss | Duplicates | Complexity | Latency | Best For |
|---|---|---|---|---|---|
| At-most-once | Possible | No | Very low | Lowest | Metrics, logs, non-critical data |
| At-least-once | No | Possible | Low-medium | Low | Most production systems (+ idempotency) |
| Exactly-once | No | No | High | Higher | Financial transactions, billing |
Choosing the Right Guarantee
| Scenario | Guarantee | Why |
|---|---|---|
| Page view tracking | At-most-once | Losing 0.1% of views doesn't affect decisions |
| Order processing | At-least-once + idempotent | No lost orders; dedup prevents double processing |
| Payment charging | Exactly-once (effective) | Double charge = refund + angry customer + legal risk |
| Email notifications | At-least-once | Duplicate email is annoying but not harmful |
| Inventory decrement | Exactly-once (effective) | Double decrement = overselling |
| Log ingestion | At-most-once | Missing a few log lines is acceptable |
🎯 The Practical Default
At-least-once + idempotent consumers is the right choice for 90% of use cases. It's simple, reliable, and handles duplicates gracefully. Reserve exactly-once for operations where duplicates cause real financial or data integrity harm. Use at-most-once only for truly disposable data.
Interview Questions
Q: At-most-once vs at-least-once — what's the difference?
A: At-most-once: send and forget. No retries. Message might be lost but never duplicated. Fast and simple. Use for metrics and logs. At-least-once: retry until acknowledged. Message is never lost but might be delivered multiple times. Consumer must handle duplicates (idempotency). Use for orders, payments, notifications. The key trade-off: loss vs duplication. Most systems choose 'no loss' (at-least-once) and handle duplicates in the consumer.
Q: Why is exactly-once delivery hard?
A: Because of the Two Generals Problem: you can never be 100% sure the other side received your message. If the consumer processes a message and sends an ACK, but the ACK is lost, the broker retries — causing a duplicate. To prevent this, you need: (1) deduplication IDs on every message, (2) idempotent processing in the consumer, (3) transactional writes (process + record dedup ID atomically). This is complex and adds latency. In practice, 'exactly-once' means 'at-least-once delivery with exactly-once processing.'
Q: How do offsets work in Kafka?
A: Each message in a Kafka partition has a sequential offset (0, 1, 2, ...). A consumer tracks its current offset — the position of the last message it processed. After processing, it commits the offset. If the consumer crashes, it restarts from the last committed offset. Commit before processing → at-most-once (might skip messages on crash). Commit after processing → at-least-once (might reprocess on crash). Ordering is guaranteed within a partition. Use partition keys to ensure related messages go to the same partition.
Q: How do you handle duplicate messages?
A: Make consumers idempotent. Three approaches: (1) Deduplication ID: each message has a unique ID. Consumer checks a 'processed_ids' table before processing. If ID exists → skip. (2) Database constraints: use UNIQUE constraints or ON CONFLICT DO NOTHING to prevent duplicate inserts. (3) Natural idempotency: design operations to be naturally idempotent — 'SET balance = 500' is idempotent, 'ADD 100 to balance' is not. For critical operations (payments), combine all three.
Pitfalls
Assuming exactly-once is easy
Saying 'we'll use exactly-once delivery' in an interview without explaining how. True exactly-once across network boundaries is impossible. What you actually implement is at-least-once + idempotent consumers + transactional processing. Claiming 'exactly-once' without this nuance shows a gap in understanding.
✅Say: 'We'll achieve effectively exactly-once by using at-least-once delivery with idempotent consumers. Each message has a unique ID. The consumer checks for duplicates before processing and wraps the operation in a transaction.' This shows you understand the reality.
Not handling duplicates
Using at-least-once delivery without making consumers idempotent. The broker retries a message, the consumer processes it twice: double payment, double order, double email. The most common production bug in messaging systems.
✅Every consumer that uses at-least-once delivery MUST be idempotent. Use deduplication IDs, database UNIQUE constraints, or naturally idempotent operations. Test by deliberately sending the same message twice and verifying the result is correct.
Ignoring ordering issues
Assuming messages arrive in order across partitions. User creates an account (partition 0) then places an order (partition 1). The order event arrives before the account event → order fails because the user doesn't exist yet.
✅Use partition keys to ensure related messages go to the same partition. All events for user_42 → same partition → processed in order. If cross-entity ordering is needed, use a saga pattern or sequence numbers to detect and handle out-of-order events.
Mismanaging offsets
Committing offsets before processing (at-most-once when you wanted at-least-once). Or never committing offsets (consumer reprocesses everything on restart). Or committing offsets for a batch where some messages failed (skipping failed messages permanently).
✅Commit offsets AFTER successful processing for at-least-once. For batch processing: only commit the offset of the last successfully processed message. Use Kafka's transactional API to atomically commit offsets + write results for exactly-once semantics.