azure-sdk-for-net: Azure service bus .NET sdk raises OperationCanceledException from time to time

Hi all,

I’m working with a .NET core 3.1 worker service application. We are referencing the Microsoft.Azure.ServiceBus nuget package version 5.1.0. The application is deployed as a pod inside a kubernetes cluster. We use Azure Kubernetes Service (AKS) to manage our cluster.

This application is meant to listen to messages published to an Azure Service Bus topic by another application. When the application starts up a subscription is created on the existing Azure Service Bus topic and a SubscriptionClient instance is created in order to listen to incoming messages.

Our subscription client is configured as follows:

  • the receive mode is ReceiveMode.ReceiveAndDelete
  • the retry policy is RetryPolicy.Default
  • we register a message handler by using the SubscriptionClient.RegisterMessageHandler method
  • we set AutoComplete = true in the MessageHandlerOptions
  • we set a default value of 10 for the MaxConcurrentCalls property in the MessageHandlerOptions
  • we provide an exceptionReceivedHandler in the constructor of class MessageHandlerOptions which simply logs a warning log message each time an error occurs while receiving messages.

We noticed that, from time to time, our application raises some OperationCanceledException while receiving messages from the Azure Service Bus topic.

In order to better understand what’s going on I followed this documentation to add additional application-level logs in order to better understand the issue.

First of all I noticed that, before throwing the OperationCanceledException, the Microsoft.Azure.ServiceBus.Receive operation works fine every time it is performed. This is a sample log that our application writes before the error appears (this is one of the additional logs that I added):

Operation “Microsoft.Azure.ServiceBus.Receive” is finished, Duration=00:00:59.9991789, Status=RanToCompletion, Id=“00-9cd1f0d0190fb34381fe58b8e06191ec-00ac27ca7a505a4b-00”, StartTime=“2021-01-17T13:39:51.4380719Z”

Suddenly the error occurs and we get the following error log (this is one of the additional logs that I added):

Operation “Microsoft.Azure.ServiceBus.Receive” with id “00-fe45ce4ba807fb45af16386641cd86ce-2786cf7eab4ba24c-00” started at “2021-01-17T13:39:56.4920237Z” raised an error: “The operation was canceled.Reference: 790ae7c96ae64a65861fbc662626fb07_G8, 01/17/2021 13:40:54”

Microsoft.Azure.ServiceBus.ServiceBusCommunicationException: The operation was canceled.Reference: 790ae7c96ae64a65861fbc662626fb07_G8, 01/17/2021 13:40:54 —> System.OperationCanceledException: The operation was canceled. at Microsoft.Azure.Amqp.AsyncResult.End[TAsyncResult](IAsyncResult result) at Microsoft.Azure.Amqp.ReceivingAmqpLink.EndReceiveMessages(IAsyncResult result, IEnumerable1& messages) at System.Threading.Tasks.TaskFactory1.FromAsyncCoreLogic(IAsyncResult iar, Func2 endFunction, Action1 endAction, Task1 promise, Boolean requiresSynchronization) --- End of stack trace from previous location where exception was thrown --- at Microsoft.Azure.ServiceBus.Core.MessageReceiver.OnReceiveAsync(Int32 maxMessageCount, TimeSpan serverWaitTime) --- End of inner exception stack trace --- at Microsoft.Azure.ServiceBus.Core.MessageReceiver.OnReceiveAsync(Int32 maxMessageCount, TimeSpan serverWaitTime) at Microsoft.Azure.ServiceBus.Core.MessageReceiver.<>c__DisplayClass65_0.<<ReceiveAsync>b__0>d.MoveNext() --- End of stack trace from previous location where exception was thrown --- at Microsoft.Azure.ServiceBus.RetryPolicy.RunOperation(Func1 operation, TimeSpan operationTimeout) at Microsoft.Azure.ServiceBus.RetryPolicy.RunOperation(Func`1 operation, TimeSpan operationTimeout) at Microsoft.Azure.ServiceBus.Core.MessageReceiver.ReceiveAsync(Int32 maxMessageCount, TimeSpan operationTimeout) at Microsoft.Azure.ServiceBus.Core.MessageReceiver.ReceiveAsync(TimeSpan operationTimeout) at Microsoft.Azure.ServiceBus.MessageReceivePump.<MessagePumpTaskAsync>b__12_0()

The exact same error is also logged as a warning by the exceptionReceivedHandler delegate that we pass to the constructor of the MessageHandlerOptions class for our subscription client message handler:

Azure Notification Bus Message handler encountered an exception Exception context for troubleshooting:

  • Endpoint: ***** REDACTED *****
  • Entity Path: ***** REDACTED *****
  • Executing Action: Receive

Microsoft.Azure.ServiceBus.ServiceBusCommunicationException: The operation was canceled.Reference: 790ae7c96ae64a65861fbc662626fb07_G8, 01/17/2021 13:40:54 —> System.OperationCanceledException: The operation was canceled. at Microsoft.Azure.Amqp.AsyncResult.End[TAsyncResult](IAsyncResult result) at Microsoft.Azure.Amqp.ReceivingAmqpLink.EndReceiveMessages(IAsyncResult result, IEnumerable1& messages) at System.Threading.Tasks.TaskFactory1.FromAsyncCoreLogic(IAsyncResult iar, Func2 endFunction, Action1 endAction, Task1 promise, Boolean requiresSynchronization) --- End of stack trace from previous location where exception was thrown --- at Microsoft.Azure.ServiceBus.Core.MessageReceiver.OnReceiveAsync(Int32 maxMessageCount, TimeSpan serverWaitTime) --- End of inner exception stack trace --- at Microsoft.Azure.ServiceBus.Core.MessageReceiver.OnReceiveAsync(Int32 maxMessageCount, TimeSpan serverWaitTime) at Microsoft.Azure.ServiceBus.Core.MessageReceiver.<>c__DisplayClass65_0.<<ReceiveAsync>b__0>d.MoveNext() --- End of stack trace from previous location where exception was thrown --- at Microsoft.Azure.ServiceBus.RetryPolicy.RunOperation(Func1 operation, TimeSpan operationTimeout) at Microsoft.Azure.ServiceBus.RetryPolicy.RunOperation(Func`1 operation, TimeSpan operationTimeout) at Microsoft.Azure.ServiceBus.Core.MessageReceiver.ReceiveAsync(Int32 maxMessageCount, TimeSpan operationTimeout) at Microsoft.Azure.ServiceBus.Core.MessageReceiver.ReceiveAsync(TimeSpan operationTimeout) at Microsoft.Azure.ServiceBus.MessageReceivePump.<MessagePumpTaskAsync>b__12_0()

At the this point we have another log, which notifies us that this time the Microsoft.Azure.ServiceBus.Receive has failed (this is one of the additional logs that I added):

Operation “Microsoft.Azure.ServiceBus.Receive” is finished, Duration=00:00:57.7255408, Status=Faulted, Id=“00-fe45ce4ba807fb45af16386641cd86ce-2786cf7eab4ba24c-00”, StartTime=“2021-01-17T13:39:56.4920237Z”

After a couple of failed Microsoft.Azure.ServiceBus.Receive operations (both of them creates the exact logs sequence that I have described above) things work properly again. The following Microsoft.Azure.ServiceBus.Receive operations complete without errors:

Operation “Microsoft.Azure.ServiceBus.Receive” is finished, Duration=00:01:00.1531469, Status=RanToCompletion, Id=“00-487f0fa5352c6545a7938da9a13c405a-4c77146b1efa4646-00”, StartTime=“2021-01-17T13:39:56.5046654Z”

We have noticed that this kind of errors are much more frequent when we deploy our application inside of kubernetes. For some other customers the same application is deployed as a web job inside an Azure App Service: in that case this kind of errors are much less frequent.

Usually in the .NET world OperationCanceledException means a timeout due to a canceled CancellationToken. This seems to be contradicted by the above log messages: as you can see the duration of the faulted Microsoft.Azure.ServiceBus.Receive operation (00:00:57.7255408) is shorter than the duration of the successful one (00:00:59.9991789 and 00:01:00.1531469). This seems really strange to me.

Can you please help us to understand what is the root cause of this errors ?

Kind regards.

Enrico.

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Comments: 18 (2 by maintainers)

Most upvoted comments

@sksk571 hi, we have solved the issue by adoting the new Azure.Messaging.ServiceBus SDK as a replacement for Microsoft.Azure.ServiceBus version 5.1.0.

By adopting the Azure.Messaging.ServiceBus package we get rid of all the errors.

Thank you for your feedback. Tagging and routing to the team best able to assist.