
Delivery Guarantees

Understand message delivery semantics — at-most-once, at-least-once, exactly-once, ordered delivery with offsets, and fan-out to subscribers.

01

The Big Picture — Why Delivery Is Hard

In a single process, calling a function is reliable — it either succeeds or throws an error. In a distributed system, sending a message across a network is fundamentally unreliable. The network can drop the message, deliver it twice, or deliver it out of order. The acknowledgment can be lost even if the message was delivered. This makes "did the consumer get my message?" a surprisingly hard question.

📦

The Courier Delivery Analogy

You send a package via courier. Three things can happen: (1) The package is lost in transit — you never know if it arrived (at-most-once). (2) You send it with tracking and retry if no confirmation — the package might arrive twice if the first confirmation was lost (at-least-once). (3) You send it with tracking, retry, AND the recipient checks if they already received it before accepting — perfect delivery, but expensive and complex (exactly-once). Every messaging system faces this same trilemma: fast and simple (risk loss), reliable (risk duplicates), or perfect (expensive and complex).

🔥 Key Insight

There is no free lunch. Stronger delivery guarantees cost more in latency, complexity, and throughput. The skill is matching the guarantee to the use case: metrics can tolerate loss, payments cannot tolerate duplicates, and most systems live happily with at-least-once + idempotency.

02

Where Guarantees Matter

The Messaging Pipeline
Producer → Broker (Kafka, SQS, RabbitMQ) → Consumer(s)

Delivery guarantee applies at each hop:
  Producer → Broker:  Did the broker receive the message?
  Broker → Consumer:  Did the consumer process the message?

Failures can happen at any point:
  1. Producer sends, network drops it → message lost
  2. Broker receives, crashes before storing → message lost
  3. Broker delivers, consumer crashes → message not processed
  4. Consumer processes, ACK is lost → broker retries → duplicate
  5. Network delivers message twice → duplicate processing
📬

Message Queues

SQS, RabbitMQ — task distribution. Each message processed by one consumer. Delivery guarantee determines if tasks can be lost or duplicated.

📜

Event Streaming

Kafka, Kinesis — event log. Multiple consumers read the same stream. Offsets track position. Guarantee determines replay behavior.

📡

Pub/Sub

Redis Pub/Sub, SNS — broadcast. One message to many subscribers. Each subscriber needs its own delivery guarantee.

03

At-Most-Once Delivery

At-most-once means the message is delivered zero or one time. The producer sends the message and moves on — no retries, no acknowledgment tracking. If the message is lost, it's gone.

At-Most-Once — How It Works
Producer → Send message → Done (no retry)

Scenario 1 (success):
  Producer sends → Broker receives → Consumer processes
  Message delivered once. Perfect.

Scenario 2 (failure):
  Producer sends → Network drops message → Lost forever
  No retry. Producer doesn't know it was lost.

Scenario 3 (partial failure):
  Producer sends → Broker receives → Consumer crashes before processing
  Broker doesn't retry → Message effectively lost

Internal mechanism:
  Fire-and-forget. No ACK tracking. No retry logic.
  UDP-style: send and hope for the best.

Strengths

  • Fastest — no ACK waiting, no retry overhead
  • Simplest — no deduplication, no offset tracking
  • Lowest latency — fire and forget
  • No duplicates — message sent at most once

When to use

  • Metrics and monitoring (losing 0.1% of data points is fine)
  • Non-critical logging (a few missing log lines are acceptable)
  • Real-time sensor data (next reading replaces the lost one)
  • Any scenario where loss is cheaper than retry complexity

🎯 Interview Insight

At-most-once is rarely the right choice for business-critical data. But it's perfect for high-volume, low-value data where the cost of loss is negligible and the cost of retries would be prohibitive. Mention it to show you understand the full spectrum of guarantees.

04

At-Least-Once Delivery

At-least-once means the message is delivered one or more times. The producer retries until it gets an acknowledgment. No message is ever lost — but duplicates are possible if the ACK is lost after successful delivery.

At-Least-Once — How It Works
Producer → Send message → Wait for ACK → Retry if no ACK

Scenario 1 (success):
  Producer sends → Broker ACKs → Done
  Message delivered once.

Scenario 2 (retry success):
  Producer sends → Network drops message → No ACK
  Producer retries → Broker receives → ACKs → Done
  Message delivered once (retry worked).

Scenario 3 (duplicate!):
  Producer sends → Broker receives → Processes → Sends ACK
  ACK is lost in network → Producer thinks it failed
  Producer retries → Broker receives AGAIN → Processes AGAIN ⚠️
  Message delivered TWICE. Consumer must handle the duplicate.

Internal mechanism:
  Retry with exponential backoff until ACK received.
  Consumer must ACK after processing (not before).
  If consumer crashes after processing but before ACK → redelivery.

Strengths

  • No data loss — every message is eventually delivered
  • Simple to implement (retry + ACK)
  • Most common guarantee in production systems
  • Works with any message broker (Kafka, SQS, RabbitMQ)

The duplicate problem

  • Consumer might process the same message twice
  • Payment charged twice, order created twice, email sent twice
  • Consumer MUST be idempotent (same input → same result)
  • Use deduplication IDs to detect and skip duplicates
Making Consumers Idempotent
Problem: order message delivered twice → two orders created

Solution 1 — Idempotency key:
  Message: { idempotency_key: "order-abc-123", user: 42, total: 99.99 }
  Consumer:
    if (db.exists("processed_keys", "order-abc-123")) → skip
    else → process order, store "order-abc-123" in processed_keys

Solution 2 — Database constraint:
  INSERT INTO orders (idempotency_key, user_id, total)
  VALUES ('order-abc-123', 42, 99.99)
  ON CONFLICT (idempotency_key) DO NOTHING;
  Second insert silently ignored. No duplicate order.

Solution 3 — Natural idempotency:
  "SET user balance to $500" is naturally idempotent.
  Running it twice produces the same result.
  "ADD $100 to balance" is NOT idempotent → running twice adds $200.

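Solution 1 can be sketched in a few lines of Python. This is a minimal in-memory sketch: `processed_keys` stands in for a durable "processed_keys" table, and `handle_order` is a hypothetical consumer, not a real framework API.

```python
# Minimal idempotency-key sketch (in-memory; production needs a durable store).
processed_keys = set()  # stands in for a "processed_keys" table

def handle_order(message):
    key = message["idempotency_key"]
    if key in processed_keys:
        return "skipped"          # duplicate delivery: no second order created
    # ... create the order here ...
    processed_keys.add(key)       # record the key only after processing succeeds
    return "processed"

msg = {"idempotency_key": "order-abc-123", "user": 42, "total": 99.99}
first = handle_order(msg)         # "processed"
second = handle_order(msg)        # same message redelivered -> "skipped"
```

Note that the key is recorded after processing: if the consumer crashes mid-way, the message is redelivered and processed again, which is exactly the at-least-once behavior you want.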
🎯 Interview Insight

At-least-once + idempotent consumers is the industry standard. When asked about delivery guarantees, say: "I'd use at-least-once delivery with idempotent consumers. Each message carries a unique ID. The consumer checks if it's already processed before executing. This gives us effective exactly-once semantics without the complexity of true exactly-once."

05

Exactly-Once Delivery

Exactly-once means the message is delivered and processed exactly one time — no loss, no duplicates. It's the holy grail of messaging, and it's extremely hard to achieve in distributed systems. In practice, most "exactly-once" systems are actually at-least-once with deduplication.

⚠️ Reality Check

True exactly-once delivery across a network is theoretically impossible in the general case (related to the Two Generals Problem). What systems actually achieve is "effectively exactly-once" — at-least-once delivery combined with idempotent processing and transactional writes. The message might be delivered twice, but the effect happens exactly once.

Exactly-Once — How It's Achieved in Practice
Technique: At-least-once + Idempotency + Transactional processing

Step 1 — At-least-once delivery:
  Broker retries until consumer ACKs. No message lost.

Step 2 — Deduplication:
  Each message has a unique ID.
  Consumer checks: "Have I seen this ID before?"
  If yes → skip. If no → process.

Step 3 — Transactional processing:
  Process the message AND record the message ID in ONE transaction.
  BEGIN;
    INSERT INTO orders (...) VALUES (...);
    INSERT INTO processed_messages (id) VALUES ('msg-abc-123');
  COMMIT;
  If the transaction fails, neither the order nor the ID is recorded.
  On retry, the ID check prevents reprocessing.

Kafka's "exactly-once semantics" (EOS):
  Producer: idempotent producer (dedup at broker level)
  Consumer: read-process-write in a Kafka transaction
  Offset commit + output write are atomic
  If consumer crashes, it resumes from the last committed offset
  No duplicate processing (transactional guarantee)
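Step 3 can be sketched with SQLite, which makes the transaction boundary concrete: the order insert and the dedup record commit together or not at all. Table names mirror the example above; a real system would use its own schema and database.

```python
import sqlite3

# Sketch of "process + record message ID in one transaction".
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, total REAL)")
db.execute("CREATE TABLE processed_messages (id TEXT PRIMARY KEY)")

def process_once(msg_id, total):
    already = db.execute(
        "SELECT 1 FROM processed_messages WHERE id = ?", (msg_id,)
    ).fetchone()
    if already:
        return "duplicate-skipped"
    with db:  # one transaction: both inserts commit, or neither does
        db.execute("INSERT INTO orders VALUES (?, ?)", (msg_id, total))
        db.execute("INSERT INTO processed_messages VALUES (?)", (msg_id,))
    return "processed"

process_once("msg-abc-123", 99.99)   # first delivery: order created
process_once("msg-abc-123", 99.99)   # redelivery: skipped, still one order
```

In Python's `sqlite3`, `with db:` commits on success and rolls back on exception, which is what keeps the order and the dedup record in lockstep.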

When you need it

  • Financial transactions (charge exactly once)
  • Inventory updates (decrement stock exactly once)
  • Billing systems (invoice generated exactly once)
  • Any operation where duplicates cause real harm

The cost

  • Higher latency (transactional processing, dedup checks)
  • More storage (deduplication table, transaction logs)
  • Complex implementation (transactions across systems)
  • Lower throughput (coordination overhead)
  • Often overkill for non-critical data

🎯 Interview Insight

Don't say "we'll use exactly-once delivery" without explaining how. Say: "True exactly-once is impractical across network boundaries. I'd use at-least-once delivery with idempotent consumers — each message has a unique ID, and the consumer checks for duplicates before processing. For critical operations like payments, I'd wrap the processing and dedup check in a database transaction."

06

Ordered Delivery with Offsets

In event streaming systems like Kafka, messages are stored in an ordered, append-only log. Each message has an offset — its position in the log. Consumers track their offset to know where they left off and resume from that point.

Offsets — How They Work
Kafka Partition (ordered log):
  Offset: [0]  [1]  [2]  [3]  [4]  [5]  [6]  [7]
  Data:   msg0 msg1 msg2 msg3 msg4 msg5 msg6 msg7

Consumer A reads:
  Start at offset 0 → process msg0 → commit offset 1
  Read offset 1 → process msg1 → commit offset 2
  Read offset 2 → process msg2 → commit offset 3
  (crash!)

  Restart → resume from last committed offset (3)
  Read offset 3 → process msg3 → continue...
  No messages lost, no messages skipped

Consumer B (independent):
  Start at offset 0 → reads the same messages independently
  Multiple consumers can read the same log at different speeds
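The resume-from-offset behavior can be modeled in a few lines. This is a toy in-memory sketch, not a real Kafka client; `ToyConsumer` and `crash_after` are invented names for illustration.

```python
# Toy model of a partition log and committed offsets (not a Kafka client).
log = ["msg0", "msg1", "msg2", "msg3", "msg4"]

class ToyConsumer:
    def __init__(self):
        self.committed = 0                  # next offset to read after restart

    def run(self, crash_after=None):
        processed = []
        offset = self.committed             # resume from last committed offset
        while offset < len(log):
            processed.append(log[offset])   # process the message...
            offset += 1
            self.committed = offset         # ...then commit the new offset
            if crash_after is not None and len(processed) == crash_after:
                break                       # simulate a crash mid-stream
        return processed

c = ToyConsumer()
c.run(crash_after=3)   # processes msg0..msg2, "crashes" after committing offset 3
c.run()                # restart: resumes at offset 3, processes msg3 and msg4
```

Because the commit happens after processing, a crash between the two would cause that message to be reprocessed on restart: at-least-once.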

Ordering Is Partition-Scoped

Ordering Guarantee — Within Partition Only
Topic: "orders" with 4 partitions

Partition 0: [order-1] [order-5] [order-9]   → ordered within P0
Partition 1: [order-2] [order-6] [order-10]  → ordered within P1
Partition 2: [order-3] [order-7] [order-11]  → ordered within P2
Partition 3: [order-4] [order-8] [order-12]  → ordered within P3

Ordering guarantee:
  order-1 is processed before order-5 (same partition)
  order-1 might be processed after order-2 (different partitions)

How to ensure ordering for a specific entity:
  Partition key = user_id
  All orders for user_42 go to the same partition
  → user_42's orders are always processed in order
  → Different users' orders can be processed in parallel
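Partition-key routing is just a stable hash modulo the partition count. A sketch, with `zlib.crc32` standing in for Kafka's murmur2 hash (an assumption for illustration; the mapping idea is the same):

```python
import zlib

NUM_PARTITIONS = 4

def partition_for(key: str) -> int:
    # Stable hash of the key -> partition index. Same key, same partition.
    return zlib.crc32(key.encode()) % NUM_PARTITIONS

# All of user_42's events land on one partition, preserving their order;
# different users spread across partitions for parallelism.
p42 = partition_for("user_42")
assert all(partition_for("user_42") == p42 for _ in range(100))
```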

Key concepts

  • Offset = position in the log (monotonically increasing)
  • Consumer commits offset after processing (checkpoint)
  • On crash: resume from last committed offset
  • Ordering guaranteed within a partition, not across
  • Partition key determines which partition a message goes to

Trade-offs

  • More partitions = more parallelism but no cross-partition ordering
  • Committing offset before processing → at-most-once (data loss on crash)
  • Committing offset after processing → at-least-once (duplicates on crash)
  • Single partition = strict ordering but no parallelism

🎯 Interview Insight

When asked "how do you ensure message ordering?" — say: "Use a partition key. All messages for the same entity (user, order) go to the same Kafka partition. Within a partition, ordering is guaranteed. Across partitions, there's no ordering — but that's fine because different entities don't need to be ordered relative to each other."

07

Fan-out to Subscribers

Fan-out delivers one message to multiple consumers. When a user places an order, the order event needs to reach the payment service, inventory service, notification service, and analytics — all independently.

Fan-out — How It Works
Producer publishes: "OrderPlaced" event

                          ┌──→ Payment Service      (charge the card)
                          │
  OrderPlaced ──→ Broker ─┼──→ Inventory Service    (reduce stock)
                          │
                          ├──→ Notification Service (send confirmation email)
                          │
                          └──→ Analytics Service    (track conversion)

Each subscriber:
  • Receives the same event independently
  • Processes at its own pace
  • Has its own delivery guarantee
  • Can fail without affecting others

Kafka implementation:
  Topic: "order-events"
  Consumer Group A: Payment Service (1 instance)
  Consumer Group B: Inventory Service (3 instances)
  Consumer Group C: Notification Service (2 instances)
  Consumer Group D: Analytics Service (5 instances)
  Each group gets every message. Within a group, messages are distributed.
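The "each group gets every message" behavior reduces to a simple data structure. A toy pub/sub sketch, with invented names (`topic`, `subscribe`, `publish`) and no broker, retries, or persistence:

```python
from collections import defaultdict

# Toy topic: publishing appends the event to every subscriber group's own
# queue, so each group receives every message and consumes independently.
topic = defaultdict(list)          # group name -> that group's queue

def subscribe(group):
    return topic[group]

def publish(event):
    for queue in topic.values():   # fan-out: one publish, N deliveries
        queue.append(event)

payment = subscribe("payment")
inventory = subscribe("inventory")
analytics = subscribe("analytics")

publish({"type": "OrderPlaced", "order_id": "abc-123"})
# each group now holds its own copy and drains it at its own pace
```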

📡 Pub/Sub Model

  • Publisher sends to a topic, not to specific consumers
  • All subscribers to that topic receive the message
  • Loose coupling: publisher doesn't know who subscribes
  • Examples: Kafka consumer groups, SNS → SQS, Redis Pub/Sub

⚠️ Challenges

  • Each subscriber needs its own delivery guarantee
  • Slow subscriber doesn't block others (independent consumption)
  • Failed subscriber must retry independently
  • Ordering per subscriber (not global ordering)

🎯 Interview Insight

Fan-out is the backbone of event-driven architecture. When designing a system, identify events (OrderPlaced, UserSignedUp, PaymentCompleted) and the services that need to react. Each service subscribes independently. This decouples services — adding a new subscriber doesn't require changing the publisher.

08

End-to-End Scenario

Let's design the delivery guarantees for an order processing system — where every guarantee type plays a role.

🛒 Order Processing — 10K Orders/sec

Services: Payment, Inventory, Notification, Analytics.

Requirements: no lost orders, no double charges, ordered per user.

1

At-least-once → Order event delivery

Order API publishes 'OrderPlaced' to Kafka with retries enabled. If the publish fails, it retries. No order event is ever lost. With acks=all, Kafka acknowledges only after the write is replicated to all in-sync replicas, not just the partition leader.

2

Idempotency → Payment service

Payment service consumes OrderPlaced events. Each order has a unique order_id. Before charging, the service checks: 'Have I already processed order_id=abc-123?' If yes → skip. If no → charge and record the ID. At-least-once delivery + idempotent consumer = effectively exactly-once payment.

3

Ordered delivery → Per-user ordering

Partition key = user_id. All orders for user_42 go to the same Kafka partition. Within that partition, order-1 is always processed before order-2. This ensures a user's orders are processed sequentially — no race conditions on inventory or balance.

4

Fan-out → Multiple services

The same OrderPlaced event fans out to 4 consumer groups: Payment (charge card), Inventory (reduce stock), Notification (send email), Analytics (track conversion). Each processes independently. If Notification is slow, Payment isn't affected.

5

At-most-once → Analytics

Analytics service tracks order counts and revenue. Losing 0.1% of events is acceptable — the dashboard shows approximate numbers anyway. No retry logic, no dedup — simplest consumer. Saves complexity where precision isn't critical.

Architecture — Delivery Guarantees Per Service
Order API → Kafka (at-least-once, acks=all)

Topic: "order-events", partition key: user_id

  ├── Payment Service      [at-least-once + idempotent]
  │   → Check dedup table → charge card → record order_id
  │
  ├── Inventory Service    [at-least-once + idempotent]
  │   → Check dedup → decrement stock → record order_id
  │
  ├── Notification Service [at-least-once]
  │   → Send email (duplicate email is annoying but not harmful)
  │
  └── Analytics Service    [at-most-once]
      → Increment counters (losing 0.1% is fine)

Ordering: guaranteed per user (same partition)
Fan-out: each service is an independent consumer group
Offset tracking: each service commits its own offset
09

Trade-offs & Decision Making

Guarantee     | Message Loss | Duplicates | Complexity | Latency | Best For
At-most-once  | Possible     | No         | Very low   | Lowest  | Metrics, logs, non-critical data
At-least-once | No           | Possible   | Low-medium | Low     | Most production systems (+ idempotency)
Exactly-once  | No           | No         | High       | Higher  | Financial transactions, billing

Choosing the Right Guarantee

Scenario            | Guarantee                  | Why
Page view tracking  | At-most-once               | Losing 0.1% of views doesn't affect decisions
Order processing    | At-least-once + idempotent | No lost orders; dedup prevents double processing
Payment charging    | Exactly-once (effective)   | Double charge = refund + angry customer + legal risk
Email notifications | At-least-once              | Duplicate email is annoying but not harmful
Inventory decrement | Exactly-once (effective)   | Double decrement = overselling
Log ingestion       | At-most-once               | Missing a few log lines is acceptable

🎯 The Practical Default

At-least-once + idempotent consumers is the right choice for 90% of use cases. It's simple, reliable, and handles duplicates gracefully. Reserve exactly-once for operations where duplicates cause real financial or data integrity harm. Use at-most-once only for truly disposable data.

10

Interview Questions

Q: At-most-once vs at-least-once — what's the difference?

A: At-most-once: send and forget. No retries. Message might be lost but never duplicated. Fast and simple. Use for metrics and logs. At-least-once: retry until acknowledged. Message is never lost but might be delivered multiple times. Consumer must handle duplicates (idempotency). Use for orders, payments, notifications. The key trade-off: loss vs duplication. Most systems choose 'no loss' (at-least-once) and handle duplicates in the consumer.

Q: Why is exactly-once delivery hard?

A: Because of the Two Generals Problem: you can never be 100% sure the other side received your message. If the consumer processes a message and sends an ACK, but the ACK is lost, the broker retries — causing a duplicate. To prevent this, you need: (1) deduplication IDs on every message, (2) idempotent processing in the consumer, (3) transactional writes (process + record dedup ID atomically). This is complex and adds latency. In practice, 'exactly-once' means 'at-least-once delivery with exactly-once processing.'

Q: How do offsets work in Kafka?

A: Each message in a Kafka partition has a sequential offset (0, 1, 2, ...). A consumer tracks its current offset — the position of the last message it processed. After processing, it commits the offset. If the consumer crashes, it restarts from the last committed offset. Commit before processing → at-most-once (might skip messages on crash). Commit after processing → at-least-once (might reprocess on crash). Ordering is guaranteed within a partition. Use partition keys to ensure related messages go to the same partition.

Q: How do you handle duplicate messages?

A: Make consumers idempotent. Three approaches: (1) Deduplication ID: each message has a unique ID. Consumer checks a 'processed_ids' table before processing. If ID exists → skip. (2) Database constraints: use UNIQUE constraints or ON CONFLICT DO NOTHING to prevent duplicate inserts. (3) Natural idempotency: design operations to be naturally idempotent — 'SET balance = 500' is idempotent, 'ADD 100 to balance' is not. For critical operations (payments), combine all three.

11

Pitfalls

🎯

Assuming exactly-once is easy

Saying 'we'll use exactly-once delivery' in an interview without explaining how. True exactly-once across network boundaries is impossible. What you actually implement is at-least-once + idempotent consumers + transactional processing. Claiming 'exactly-once' without this nuance shows a gap in understanding.

Say: 'We'll achieve effectively exactly-once by using at-least-once delivery with idempotent consumers. Each message has a unique ID. The consumer checks for duplicates before processing and wraps the operation in a transaction.' This shows you understand the reality.

🔁

Not handling duplicates

Using at-least-once delivery without making consumers idempotent. The broker retries a message, the consumer processes it twice: double payment, double order, double email. The most common production bug in messaging systems.

Every consumer that uses at-least-once delivery MUST be idempotent. Use deduplication IDs, database UNIQUE constraints, or naturally idempotent operations. Test by deliberately sending the same message twice and verifying the result is correct.
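The "deliberately send the same message twice" test is cheap to write. A sketch against a hypothetical idempotent `handle` consumer (invented names; your real test would call your real consumer):

```python
# After a deliberate duplicate delivery, exactly one order must exist.
orders = []
seen = set()

def handle(msg):
    if msg["id"] in seen:
        return                  # duplicate: ignore
    seen.add(msg["id"])
    orders.append(msg)          # the real side effect

msg = {"id": "order-1", "total": 50}
handle(msg)
handle(msg)                     # deliberate duplicate delivery
assert len(orders) == 1         # idempotency holds
```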

🔀

Ignoring ordering issues

Assuming messages arrive in order across partitions. User creates an account (partition 0) then places an order (partition 1). The order event arrives before the account event → order fails because the user doesn't exist yet.

Use partition keys to ensure related messages go to the same partition. All events for user_42 → same partition → processed in order. If cross-entity ordering is needed, use a saga pattern or sequence numbers to detect and handle out-of-order events.

📍

Mismanaging offsets

Committing offsets before processing (at-most-once when you wanted at-least-once). Or never committing offsets (consumer reprocesses everything on restart). Or committing offsets for a batch where some messages failed (skipping failed messages permanently).

Commit offsets AFTER successful processing for at-least-once. For batch processing: only commit the offset of the last successfully processed message. Use Kafka's transactional API to atomically commit offsets + write results for exactly-once semantics.