azure-sdk-for-net: ServiceBusReceiver gets stuck

Library name and version

Azure.Messaging.ServiceBus 7.8.1

Query/Question

I read messages through ServiceBusReceiver:

var messages = await _receiver.ReceiveMessagesAsync(_config.MaxMessages, _config.MaxWaitTimeout);

I want to read messages in batches, and you can see above that I pass the relevant parameters to the ReceiveMessagesAsync call. In my case “MaxMessages” = 500 and “MaxWaitTimeout” = “00:00:03”.
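For context, the surrounding receive loop looks roughly like this (a minimal sketch; the stoppingToken, the processing code, and the CompleteMessageAsync call are assumptions, while the _config values are the ones above):

while (!stoppingToken.IsCancellationRequested)
{
	var stopwatch = Stopwatch.StartNew();

	// Returns up to MaxMessages messages, waiting at most MaxWaitTimeout for the batch.
	var messages = await _receiver.ReceiveMessagesAsync(_config.MaxMessages, _config.MaxWaitTimeout, stoppingToken);

	foreach (var message in messages)
	{
		// ... process the message ...
		await _receiver.CompleteMessageAsync(message, stoppingToken);
	}
}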

It works for a while, but a few times the receiver has gotten stuck there. I log the result right after the ReceiveMessagesAsync call:

_logger.LogDebug(
	"Topic {topic}, subscription {subscription}, have read {count} messages, maxMessages {maxMessages}; it took {totalMilliseconds}, maxWaitTimeout {maxWaitTimeout}",
	_config.Topic, _subscription, messages.Count, _config.MaxMessages, stopwatch.Elapsed.TotalMilliseconds, _config.MaxWaitTimeout
);

When the receiver is stuck, I see in the logs that it receives 0 messages for 3 seconds at a time, although there are messages in the subscription. Here is a graph of active messages in the topic:

[Image: graph of the active message count in the topic]

You can see that, starting from some point, the number of messages keeps increasing. Only after I restart the app does the receiver start receiving messages again. From the logs and other metrics I don’t see any anomalies.

What can cause the receiver to get stuck, and how can I fix it?

Environment

Azure AppService on Linux

Host: Version 6.0.12, Architecture x64, Commit 02e45a41b7

.NET SDKs installed: No SDKs were found.

.NET runtimes installed:
  • Microsoft.AspNetCore.App 6.0.12 [/usr/share/dotnet/shared/Microsoft.AspNetCore.App]
  • Microsoft.NETCore.App 6.0.12 [/usr/share/dotnet/shared/Microsoft.NETCore.App]

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Comments: 17 (7 by maintainers)

Most upvoted comments

Thanks for the update, @verdysh. We’ll leave this in a pending state for a bit longer to see if things repro.

Looking at the logs, there are some errors and warnings. Most are warnings; only a few are errors. Since I updated the package, the processing has not gotten stuck. I’ll continue to observe.

Removing stale tagging; log capture is still ongoing for an intermittent issue.

As an FYI, there is a recent service fix that is being rolled out over the next few weeks that may address this issue.

Now I’m observing; this issue had occurred about once every few weeks.

I’ve updated Azure.Messaging.ServiceBus to version 7.12.0 and created the following logger class:

using System.Diagnostics.Tracing;
using System.Linq;
using Azure.Core.Diagnostics;
using Microsoft.Extensions.Logging;

public class AzureEventSourceLogger
{
	// Event ids of interest: receive start/complete/error, link lifecycle, and run-operation errors.
	private readonly int[] _eventIds = new[] { 7, 8, 9, 37, 38, 39, 40, 41, 43, 101, 114 };

	private readonly ILogger<AzureEventSourceLogger> _logger;
	private AzureEventSourceListener _azureEventSourceListener;

	public AzureEventSourceLogger(ILogger<AzureEventSourceLogger> logger)
	{
		_logger = logger;
		TurnOn();
	}

	public void TurnOn()
	{
		if (_azureEventSourceListener is not null)
			return;

		_azureEventSourceListener = new AzureEventSourceListener((args, message) =>
		{
			if (!(args.Level == EventLevel.Error || args.Level == EventLevel.Warning))
				return;

			if (!_eventIds.Contains(args.EventId))
				return;

			if(args.Level == EventLevel.Error)
				_logger.LogError(message);
			else if (args.Level == EventLevel.Warning)
				_logger.LogWarning(message);

		}, EventLevel.LogAlways);
	}

	public void TurnOff()
	{
		if (_azureEventSourceListener is null)
			return;

		_azureEventSourceListener.Dispose();
		_azureEventSourceListener = null;
	}
}
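A minimal sketch of wiring it up via DI (the registration and the startup resolution shown here are illustrative, not taken from my actual app):

public void ConfigureServices(IServiceCollection services)
{
	// One listener for the app's lifetime; the constructor calls TurnOn().
	services.AddSingleton<AzureEventSourceLogger>();
}

// Resolve it once at startup so the listener is actually created, e.g.:
// app.ApplicationServices.GetRequiredService<AzureEventSourceLogger>();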

I will watch and let you know if the receiver gets stuck or if I see anything interesting in the logs.

Thank you for your time!

It should be good for getting started. My working assumption is that the receive loop is still in a good state but is not getting messages returned; we want to see what’s eating up the time.

One thing that I should have noticed earlier - you’re using a very old version of the library that is missing some key fixes. Of note is a bump to our AMQP transport dependency, which contains a fix that resolves a connection getting into an undetectable bad state.

Before we capture logs, we should bump the version of the Service Bus package that you’re using to the latest.

We need only a 5 minute slice of logs, and it would be best if we could collect all Verbose logs for that interval so that we can understand the client perspective. If you’re worried about volume in Application Insights, I’d suggest disposing the listener after that time, which will stop log collection even if your application is still running (see the sketch after the list below). If you feel like you must filter, the most important things we’re after are:

  • All warnings
  • All errors
  • Id: 7 (ReceiveMessageStart)
  • Id: 8 (ReceiveMessageComplete)
  • Id: 9 (ReceiveMessageError)
  • Id: 37 (LinkStateLost)
  • Id: 38 (ReceiveLinkClosed)
  • Id: 39 (AmqpLinkRefreshStart)
  • Id: 40 (AmqpLinkRefreshComplete)
  • Id: 41 (AmqpLinkRefreshError)
  • Id: 43 (RunOperationError)
  • Id: 101 (ManagementLinkClosed)
  • Id: 114 (RunOperationVerboseError)
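Time-boxing the capture could look something like this (a rough sketch built on the TurnOn/TurnOff methods of the class above; the 5-minute delay is just an example):

// Capture roughly a 5 minute slice of verbose logs, then stop collection
// while the application keeps running.
public async Task CaptureSliceAsync(AzureEventSourceLogger azureEventSourceLogger)
{
	azureEventSourceLogger.TurnOn();

	await Task.Delay(TimeSpan.FromMinutes(5));

	// Disposes the underlying AzureEventSourceListener, which stops log collection.
	azureEventSourceLogger.TurnOff();
}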

When using the Extensions package, the listener is automatically attached for all Azure SDK libraries and forwards to the ILogger instance. To set up forwarding without using the client registration for DI, you’d do something like:

public void ConfigureServices(IServiceCollection services)
{
    // ... <SNIP> ...
    services.AddAzureClientsCore(enableLogForwarding: true);
    // ... <SNIP> ...
}
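With forwarding enabled, the events should show up under ILogger categories derived from the event source names (for example, Azure.Messaging.ServiceBus for the Service Bus client events), so the normal logging level filters apply to them as well.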