azure-sdk-for-js: [Service Bus]: Unable to create the amqp sender *senderID* due to operation timeout.
- Package Name: @azure/service-bus
- Package Version: 1.1.2
- Operating system: node:10.15.3-jessie (docker image)
- nodejs
- version: 10.15.3
- Is the bug related to documentation in
- README.md
- source code documentation
- SDK API docs on https://docs.microsoft.com
Describe the bug We have several services which communicate with each other through Service Bus Topic. But after a while we caught the following error: Unable to create the amqp sender senderID due to operation timeout..
And right after this error we caught next uncaughtException: Cannot read property ‘_process’ of undefined. Seems these errors are connected:
Logs:
2020-01-04T04:05:26.159862808Z 1/4/2020, 04:05:26 - [31merror[39m: uncaughtException: Cannot read property '_process' of undefined date=Sat Jan 04 2020 04:05:26 GMT+0000 (Coordinated Universal Time), pid=17, uid=0, gid=0, cwd=/usr/src/server, execPath=/usr/local/bin/node, version=v10.15.3, argv=[/usr/local/bin/node, /usr/src/server/app], rss=72617984, heapTotal=48832512, heapUsed=40250808, external=18664439, loadavg=[12.9677734375, 13.07177734375, 12.94970703125], uptime=1848134, trace=[column=39, file=/usr/src/server/node_modules/rhea-promise/node_modules/rhea/lib/connection.js, function=Connection._process, line=708, method=_process, native=false, column=11, file=internal/process/next_tick.js, function=process._tickCallback, line=61, method=_tickCallback, native=false], stack=[TypeError: Cannot read property '_process' of undefined, at Connection._process (/usr/src/server/node_modules/rhea-promise/node_modules/rhea/lib/connection.js:708:39), at process._tickCallback (internal/process/next_tick.js:61:11)] 2020-01-04T04:05:26.162240653Z 1/4/2020, 04:05:26 - [31merror[39m: [uncaughtException] There was an uncaught exception: Cannot read property '_process' of undefined, TypeError: Cannot read property '_process' of undefined 2020-01-04T04:05:26.162256553Z at Connection._process (/usr/src/server/node_modules/rhea-promise/node_modules/rhea/lib/connection.js:708:39) 2020-01-04T04:05:26.162262053Z at process._tickCallback (internal/process/next_tick.js:61:11)
And this behavior happens on each service
In order to fix this issue, we have to restart each service.
To Reproduce Steps to reproduce the behavior:
- We did not find a way how to reproduce it quickly, just to start-up services and wait
Expected behavior
We set up Service Bus Topic receivers in the following way:
And we expect that it will work without any errors as it mentions
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Reactions: 5
- Comments: 37 (12 by maintainers)
Thanks for sharing @MolochkoVitaly
Based on your logs, it does look like the AMQP connection in rhea (which we take a dependency on) lands in a bad state after encountering the error ERR_STREAM_WRITE_AFTER_END. None of the recovery attempts after this point is succeeding.
Link recovery includes creating a new session object (which represents an AMQP session) and then a new link object (which represents an AMQP link: sender or receiver based on what you are working with) in rhea, but on the same connection object (which represents an AMQP connection). The rhea library then revives the connection if it was broken. But, it looks like there are cases where we should ditch the connection object and create a new one.
We are looking into alternatives on how best to handle this and we should have a solution out this week.
@MolochkoVitaly Thank you very much for providing these logs! I’ll look into these and let you know what I can find.
Hey @chradek we’ve recently faced this issue again and now we have more logs. Here you can find it: https://github.com/MolochkoVitaly/Azure-Service-Bus-Logs/blob/master/2020_01_31_RD00155D87AFC7_default_docker.log.txt
Same issue in our production environment
Hello all,
Thanks for your patience.
We just released an update to the
@azure/service-buspackage (version 1.1.6) where we have improved the way we recover from broken connections. It should help with the cases seen in this issue where we encounter the ERR_STREAM_WRITE_AFTER_END error which results in the connection entering a bad state and all operations timing out.Please give it a try and let us know if you have any issues.
Thanks for reporting!
Thanks for the logs @MolochkoVitaly
While we take a look at them, can you reach out to Azure support to see if there were any service side problems or upgrades happening during this time?
We throw the
OperationTimeoutError: Unable to create the amqp session due to operation timeouterror when we are trying to establish a session with the service and the service does not respond within 60 seconds. So, I have to wonder if there has been issues with the service during this time.There is a log file with errors Lib version 1.1.5 @chradek could you please take a look?
This bug is super annoying
@Pruxis and @KevinDenys Do you have logging enabled for service-bus (setting DEBUG=azure:*error would be useful to start)? It would be very useful to see what’s happening in the library leading up to the operation timeout (e.g. is there a temporary network issue).
After you receive this error do you need to restart your receiver before getting more messages?
@chradek @ramya-rao-a any updates on this? This is a really annoying error…