azure-sdk-for-net: Azure service bus .NET sdk raises OperationCanceledException from time to time
Hi all,
I’m working with a .NET core 3.1 worker service application. We are referencing the Microsoft.Azure.ServiceBus nuget package version 5.1.0. The application is deployed as a pod inside a kubernetes cluster. We use Azure Kubernetes Service (AKS) to manage our cluster.
This application is meant to listen to messages published to an Azure Service Bus topic by another application. When the application starts up a subscription is created on the existing Azure Service Bus topic and a SubscriptionClient
instance is created in order to listen to incoming messages.
Our subscription client is configured as follows:
- the receive mode is
ReceiveMode.ReceiveAndDelete
- the retry policy is
RetryPolicy.Default
- we register a message handler by using the
SubscriptionClient.RegisterMessageHandler
method - we set
AutoComplete = true
in theMessageHandlerOptions
- we set a default value of
10
for theMaxConcurrentCalls
property in theMessageHandlerOptions
- we provide an
exceptionReceivedHandler
in the constructor of classMessageHandlerOptions
which simply logs a warning log message each time an error occurs while receiving messages.
We noticed that, from time to time, our application raises some OperationCanceledException
while receiving messages from the Azure Service Bus topic.
In order to better understand what’s going on I followed this documentation to add additional application-level logs in order to better understand the issue.
First of all I noticed that, before throwing the OperationCanceledException
, the Microsoft.Azure.ServiceBus.Receive
operation works fine every time it is performed. This is a sample log that our application writes before the error appears (this is one of the additional logs that I added):
Operation “Microsoft.Azure.ServiceBus.Receive” is finished, Duration=00:00:59.9991789, Status=RanToCompletion, Id=“00-9cd1f0d0190fb34381fe58b8e06191ec-00ac27ca7a505a4b-00”, StartTime=“2021-01-17T13:39:51.4380719Z”
Suddenly the error occurs and we get the following error log (this is one of the additional logs that I added):
Operation “Microsoft.Azure.ServiceBus.Receive” with id “00-fe45ce4ba807fb45af16386641cd86ce-2786cf7eab4ba24c-00” started at “2021-01-17T13:39:56.4920237Z” raised an error: “The operation was canceled.Reference: 790ae7c96ae64a65861fbc662626fb07_G8, 01/17/2021 13:40:54”
Microsoft.Azure.ServiceBus.ServiceBusCommunicationException: The operation was canceled.Reference: 790ae7c96ae64a65861fbc662626fb07_G8, 01/17/2021 13:40:54 —> System.OperationCanceledException: The operation was canceled. at Microsoft.Azure.Amqp.AsyncResult.End[TAsyncResult](IAsyncResult result) at Microsoft.Azure.Amqp.ReceivingAmqpLink.EndReceiveMessages(IAsyncResult result, IEnumerable
1& messages) at System.Threading.Tasks.TaskFactory
1.FromAsyncCoreLogic(IAsyncResult iar, Func2 endFunction, Action
1 endAction, Task1 promise, Boolean requiresSynchronization) --- End of stack trace from previous location where exception was thrown --- at Microsoft.Azure.ServiceBus.Core.MessageReceiver.OnReceiveAsync(Int32 maxMessageCount, TimeSpan serverWaitTime) --- End of inner exception stack trace --- at Microsoft.Azure.ServiceBus.Core.MessageReceiver.OnReceiveAsync(Int32 maxMessageCount, TimeSpan serverWaitTime) at Microsoft.Azure.ServiceBus.Core.MessageReceiver.<>c__DisplayClass65_0.<<ReceiveAsync>b__0>d.MoveNext() --- End of stack trace from previous location where exception was thrown --- at Microsoft.Azure.ServiceBus.RetryPolicy.RunOperation(Func
1 operation, TimeSpan operationTimeout) at Microsoft.Azure.ServiceBus.RetryPolicy.RunOperation(Func`1 operation, TimeSpan operationTimeout) at Microsoft.Azure.ServiceBus.Core.MessageReceiver.ReceiveAsync(Int32 maxMessageCount, TimeSpan operationTimeout) at Microsoft.Azure.ServiceBus.Core.MessageReceiver.ReceiveAsync(TimeSpan operationTimeout) at Microsoft.Azure.ServiceBus.MessageReceivePump.<MessagePumpTaskAsync>b__12_0()
The exact same error is also logged as a warning by the exceptionReceivedHandler
delegate that we pass to the constructor of the MessageHandlerOptions
class for our subscription client message handler:
Azure Notification Bus Message handler encountered an exception Exception context for troubleshooting:
- Endpoint: ***** REDACTED *****
- Entity Path: ***** REDACTED *****
- Executing Action: Receive
Microsoft.Azure.ServiceBus.ServiceBusCommunicationException: The operation was canceled.Reference: 790ae7c96ae64a65861fbc662626fb07_G8, 01/17/2021 13:40:54 —> System.OperationCanceledException: The operation was canceled. at Microsoft.Azure.Amqp.AsyncResult.End[TAsyncResult](IAsyncResult result) at Microsoft.Azure.Amqp.ReceivingAmqpLink.EndReceiveMessages(IAsyncResult result, IEnumerable
1& messages) at System.Threading.Tasks.TaskFactory
1.FromAsyncCoreLogic(IAsyncResult iar, Func2 endFunction, Action
1 endAction, Task1 promise, Boolean requiresSynchronization) --- End of stack trace from previous location where exception was thrown --- at Microsoft.Azure.ServiceBus.Core.MessageReceiver.OnReceiveAsync(Int32 maxMessageCount, TimeSpan serverWaitTime) --- End of inner exception stack trace --- at Microsoft.Azure.ServiceBus.Core.MessageReceiver.OnReceiveAsync(Int32 maxMessageCount, TimeSpan serverWaitTime) at Microsoft.Azure.ServiceBus.Core.MessageReceiver.<>c__DisplayClass65_0.<<ReceiveAsync>b__0>d.MoveNext() --- End of stack trace from previous location where exception was thrown --- at Microsoft.Azure.ServiceBus.RetryPolicy.RunOperation(Func
1 operation, TimeSpan operationTimeout) at Microsoft.Azure.ServiceBus.RetryPolicy.RunOperation(Func`1 operation, TimeSpan operationTimeout) at Microsoft.Azure.ServiceBus.Core.MessageReceiver.ReceiveAsync(Int32 maxMessageCount, TimeSpan operationTimeout) at Microsoft.Azure.ServiceBus.Core.MessageReceiver.ReceiveAsync(TimeSpan operationTimeout) at Microsoft.Azure.ServiceBus.MessageReceivePump.<MessagePumpTaskAsync>b__12_0()
At the this point we have another log, which notifies us that this time the Microsoft.Azure.ServiceBus.Receive
has failed (this is one of the additional logs that I added):
Operation “Microsoft.Azure.ServiceBus.Receive” is finished, Duration=00:00:57.7255408, Status=Faulted, Id=“00-fe45ce4ba807fb45af16386641cd86ce-2786cf7eab4ba24c-00”, StartTime=“2021-01-17T13:39:56.4920237Z”
After a couple of failed Microsoft.Azure.ServiceBus.Receive
operations (both of them creates the exact logs sequence that I have described above) things work properly again. The following Microsoft.Azure.ServiceBus.Receive
operations complete without errors:
Operation “Microsoft.Azure.ServiceBus.Receive” is finished, Duration=00:01:00.1531469, Status=RanToCompletion, Id=“00-487f0fa5352c6545a7938da9a13c405a-4c77146b1efa4646-00”, StartTime=“2021-01-17T13:39:56.5046654Z”
We have noticed that this kind of errors are much more frequent when we deploy our application inside of kubernetes. For some other customers the same application is deployed as a web job inside an Azure App Service: in that case this kind of errors are much less frequent.
Usually in the .NET world OperationCanceledException
means a timeout due to a canceled CancellationToken
. This seems to be contradicted by the above log messages: as you can see the duration of the faulted Microsoft.Azure.ServiceBus.Receive
operation (00:00:57.7255408
) is shorter than the duration of the successful one (00:00:59.9991789
and 00:01:00.1531469
). This seems really strange to me.
Can you please help us to understand what is the root cause of this errors ?
Kind regards.
Enrico.
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 18 (2 by maintainers)
@sksk571 hi, we have solved the issue by adoting the new Azure.Messaging.ServiceBus SDK as a replacement for Microsoft.Azure.ServiceBus version 5.1.0.
By adopting the Azure.Messaging.ServiceBus package we get rid of all the errors.
Thank you for your feedback. Tagging and routing to the team best able to assist.