Building Resilient Event Pipelines with Azure Service Bus
When building distributed systems on Azure, handling message failures gracefully isn't optional — it's essential. In this post, I'll walk through the patterns and practices I've used to build resilient event pipelines with Azure Service Bus, based on real production systems processing millions of messages per day.
The Problem With Naive Messaging
Most tutorials show you how to send and receive messages. Few show you what happens when things go wrong — and in distributed systems, things *will* go wrong.
// this looks fine until your handler throws
processor.ProcessMessageAsync += async args =>
{
    var order = args.Message.Body.ToObjectFromJson<OrderEvent>();
    await _orderService.Process(order);
    await args.CompleteMessageAsync(args.Message);
};

The moment `_orderService.Process` throws, you need a strategy. Do you retry? How many times? What if the message itself is malformed?
Retry Patterns That Actually Work
Azure Service Bus has built-in retry via the `MaxDeliveryCount` property on queues and subscriptions. But the default of 10 delivery attempts with no backoff between them is rarely what you want.
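As a sketch of what tuning this looks like (the queue name `orders` and the connection string are placeholders), `MaxDeliveryCount` can be set at creation time through the administration client:

```csharp
using Azure.Messaging.ServiceBus.Administration;

// Placeholder connection string; use your namespace's credentials.
var admin = new ServiceBusAdministrationClient(connectionString);

var options = new CreateQueueOptions("orders")
{
    // Fewer built-in redeliveries, since we implement backoff ourselves.
    MaxDeliveryCount = 5,
    // Send expired messages to the dead-letter sub-queue too.
    DeadLetteringOnMessageExpiration = true
};

await admin.CreateQueueAsync(options);
```

The same properties exist on `CreateSubscriptionOptions` for topic subscriptions.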
Deferred Retry With Scheduled Messages
For transient failures (network blips, temporary downstream outages), I use scheduled message re-enqueue:
public async Task HandleWithRetry(ProcessMessageEventArgs args)
{
    // DeliveryCount resets to 1 when we re-enqueue a clone, so the
    // retry counter has to travel in an application property instead.
    var attempt = args.Message.ApplicationProperties
        .TryGetValue("retry-attempt", out var value) ? (int)value : 0;
    try
    {
        await ProcessMessage(args.Message);
        await args.CompleteMessageAsync(args.Message);
    }
    catch (TransientException) when (attempt < 5)
    {
        var delay = TimeSpan.FromSeconds(Math.Pow(2, attempt));
        var clone = new ServiceBusMessage(args.Message);
        clone.ApplicationProperties["retry-attempt"] = attempt + 1;
        await _sender.ScheduleMessageAsync(clone,
            DateTimeOffset.UtcNow.Add(delay));
        await args.CompleteMessageAsync(args.Message);
    }
    catch (Exception)
    {
        // Unexpected failure: abandon so Service Bus redelivers until
        // MaxDeliveryCount moves the message to the dead-letter queue.
        await args.AbandonMessageAsync(args.Message);
    }
}

The key insight: complete the original message and schedule a new one. This gives you exponential backoff without blocking the processor. Because the clone is a brand-new message, its `DeliveryCount` starts over, which is why the attempt counter is carried in an application property rather than read from `DeliveryCount`.
Dead-Letter Queues Are Your Safety Net
Every queue in Service Bus has a dead-letter sub-queue. Messages that exceed `MaxDeliveryCount` land here automatically. But you should also dead-letter explicitly when a message is permanently invalid:
catch (ValidationException ex)
{
await args.DeadLetterMessageAsync(args.Message,
deadLetterReason: "ValidationFailed",
deadLetterErrorDescription: ex.Message);
}

Processing Dead Letters
Don't just let dead letters pile up. Build a dead-letter processor that:
- Alerts on new dead-letter messages via Application Insights
- Provides a dashboard for manual inspection
- Supports replaying messages after fixes are deployed
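The replay piece can be sketched as follows. The dead-letter sub-queue is addressed via `SubQueue.DeadLetter` on the receiver options; the queue name `orders` and the decision of when to replay are placeholders here, since in practice you would gate replay behind the inspection step above:

```csharp
using Azure.Messaging.ServiceBus;

await using var client = new ServiceBusClient(connectionString);

// Receive from the dead-letter sub-queue of the main queue.
var dlqReceiver = client.CreateReceiver("orders",
    new ServiceBusReceiverOptions { SubQueue = SubQueue.DeadLetter });
var sender = client.CreateSender("orders");

ServiceBusReceivedMessage msg;
while ((msg = await dlqReceiver.ReceiveMessageAsync(TimeSpan.FromSeconds(5))) != null)
{
    // Inspect why it was dead-lettered before deciding to replay.
    Console.WriteLine($"{msg.MessageId}: {msg.DeadLetterReason}");

    // Replay: clone back onto the main queue, then remove from the DLQ.
    await sender.SendMessageAsync(new ServiceBusMessage(msg));
    await dlqReceiver.CompleteMessageAsync(msg);
}
```

`ReceiveMessageAsync` returns null once the wait time elapses with no messages, which ends the drain loop.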
Monitoring Your Pipelines
Without observability, you're flying blind. I instrument every pipeline with:
- **Message throughput** — messages processed per minute per topic
- **Processing latency** — time from enqueue to completion
- **Failure rate** — percentage of messages that need retry
- **Dead-letter depth** — number of messages in DLQ per queue
using var activity = ActivitySource.StartActivity("ProcessMessage");
activity?.SetTag("messaging.system", "servicebus");
activity?.SetTag("messaging.destination", topicName);
var stopwatch = Stopwatch.StartNew();
await ProcessMessage(message);
stopwatch.Stop();
_telemetry.TrackMetric("message.processing.duration",
    stopwatch.ElapsedMilliseconds);

Key Takeaways
- Never rely on default retry settings — configure `MaxDeliveryCount` and implement backoff
- Use scheduled messages for deferred retry instead of `Thread.Sleep`
- Dead-letter explicitly for poison messages, don't wait for max delivery count
- Instrument everything — you can't fix what you can't see
- Test failure scenarios in staging before they surprise you in production
About the Author
Georg is a senior solution architect specializing in .NET, Azure, and Dynamics 365. He helps organizations design and build scalable, maintainable enterprise systems. When he's not writing code, he's writing about it here.