azure-sdk-for-net: [BUG] Service Bus reconnection not behaving as expected

Library name and version

Azure.Messaging.ServiceBus package 7.8.1

Describe the bug

We have experienced issues similar to this issue, where the problem presents itself intermittently in a test suite.

The scenario is as follows: we have a resource page that lists the queues belonging to that context. We add messages to the relevant queue, check that the queue holds the correct number of messages, and then do a peek. The peek is where we get the following error.

Bad (Error getting queue status: The link 'G26:RR:304280244:637927926679470000:nfieldpurpleserbus:Queue:aaa-web-driver-test-sb-queue$management:56:sender' is force detached. Code: ServerError. Details: AmqpControlProtocolClient.Fault. TrackingId:10288625-1353-4faa-b78a-54c70e68cec6_B8, SystemTracker:nfieldpurpleserbus:Queue:aaa-web-driver-test-sb-queue, Timestamp:2022-07-07T12:34:35 Reference:276649e8-fb30-4238-a6d6-8def97115a6b, TrackingId:24da0460-4a03-47ca-a90d-fe87584c7a31_G26, SystemTracker:NoSystemTracker, Timestamp:2022-07-07T12:34:35 (GeneralError)

Expected behavior

After server failure, client library should reconnect

Actual behavior

After server failure, client library does not reconnect

Reproduction Steps

The problem seems intermittent and appears to be caused by something on the server side.

In our codebase we lazily create the ServiceBusClient and cache the receiver to avoid exhausting the available sockets. We tried to reproduce the problem in a small test project that mirrors the code flow, but we haven’t managed to reproduce it outside of this particular test suite.

Program.txt

Environment

Framework: net462, Visual Studio 17.2.5

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 21 (9 by maintainers)

Most upvoted comments

Even though those are two different issues (queue moved for load balancing, queue deleted and re-created), both of them have the same root cause. Fixing that bug will address both issues. Having said that, I must say that neither of them is a normal scenario.

  1. We very rarely move queues or topics for load balancing. To give some context, we have done that in only one cluster in the last 2 years.
  2. Deleting and recreating the same queue while still using the cached receiver is not normal, in my opinion. In this case, the service closes the send and receive links, so reusing senders and receivers to perform sends and receives will still work fine even after deleting and re-creating the queue. But Peek is a special kind of operation that is impacted by this bug. The ScheduleMessageAsync method will also have the same problem.

We will be fixing this issue in the next service deployment.
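
For clients that cache a receiver as described in the report, one possible interim mitigation is to drop the cached instance when an operation fails and retry once on a freshly created one. This was not proposed or confirmed in this thread, and it is untested against this particular bug, so treat the following as a minimal sketch only. `RecreatingCache<T>` and its callback parameters are hypothetical names, not part of Azure.Messaging.ServiceBus.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of Azure.Messaging.ServiceBus.
// Lazily creates a resource, caches it, and rebuilds it once when an
// operation fails with an exception the caller marks as recoverable.
public sealed class RecreatingCache<T> where T : class
{
    private readonly Func<T> _create;
    private readonly Func<T, Task> _dispose;
    private readonly Func<Exception, bool> _shouldRecreate;
    private readonly SemaphoreSlim _gate = new SemaphoreSlim(1, 1);
    private T _cached;

    public RecreatingCache(
        Func<T> create,
        Func<T, Task> dispose,
        Func<Exception, bool> shouldRecreate)
    {
        _create = create ?? throw new ArgumentNullException(nameof(create));
        _dispose = dispose ?? throw new ArgumentNullException(nameof(dispose));
        _shouldRecreate = shouldRecreate ?? throw new ArgumentNullException(nameof(shouldRecreate));
    }

    public async Task<TResult> ExecuteAsync<TResult>(Func<T, Task<TResult>> operation)
    {
        T resource = await GetAsync().ConfigureAwait(false);
        try
        {
            return await operation(resource).ConfigureAwait(false);
        }
        catch (Exception ex) when (_shouldRecreate(ex))
        {
            // Drop the faulted instance and retry exactly once on a new one.
            // A second failure propagates to the caller.
            await InvalidateAsync(resource).ConfigureAwait(false);
            T fresh = await GetAsync().ConfigureAwait(false);
            return await operation(fresh).ConfigureAwait(false);
        }
    }

    private async Task<T> GetAsync()
    {
        await _gate.WaitAsync().ConfigureAwait(false);
        try
        {
            // Lazy creation: the factory runs only when nothing is cached.
            return _cached ?? (_cached = _create());
        }
        finally
        {
            _gate.Release();
        }
    }

    private async Task InvalidateAsync(T faulted)
    {
        bool removed = false;
        await _gate.WaitAsync().ConfigureAwait(false);
        try
        {
            // Another caller may already have replaced the instance.
            if (ReferenceEquals(_cached, faulted))
            {
                _cached = null;
                removed = true;
            }
        }
        finally
        {
            _gate.Release();
        }

        if (removed)
        {
            // Best-effort cleanup; a dispose failure should not mask the retry.
            try { await _dispose(faulted).ConfigureAwait(false); }
            catch (Exception) { }
        }
    }
}
```

With the SDK, the factory could be `() => client.CreateReceiver(queueName)`, the dispose callback `r => r.DisposeAsync().AsTask()`, the filter `ex => ex is ServiceBusException`, and a peek would become `cache.ExecuteAsync(r => r.PeekMessageAsync())`. Here `client` and `queueName` are placeholders, and catching every `ServiceBusException` is an assumption; a narrower filter may be more appropriate.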