azure-sdk-for-python: Service Bus deadlocks in Azure function
- Package Name: azure-servicebus
- Package Version: 7.1.0
- Operating System: macOS 11.2.3
- Python Version: 3.8.7
Describe the bug
When posting to a Service Bus queue from code running in an Azure function, a deadlock frequently occurs, particularly when using the debugger.
Here are the call stacks:
select (selectors.py:558)
_run_once (base_events.py:1823)
run_forever (base_events.py:570)
run_until_complete (base_events.py:603)
run (runners.py:44)
main (main.py:47)
<module> (worker.py:86)
_run_code (runpy.py:87)
_run_module_code (runpy.py:97)
run_path (runpy.py:265)
_run_code (runpy.py:87)
_run_module_as_main (runpy.py:194)
_wait_once (_common.py:106)
wait (_common.py:140)
_next (_channel.py:694)
__next__ (_channel.py:416)
__poll_grpc (dispatcher.py:549)
run (threading.py:870)
_bootstrap_inner (threading.py:932)
_bootstrap (threading.py:890)
_wait_once (_common.py:106)
wait (_common.py:140)
_next (_channel.py:694)
__next__ (_channel.py:416)
__poll_grpc (dispatcher.py:549)
run (threading.py:870)
_bootstrap_inner (threading.py:932)
_bootstrap (threading.py:890)
__enter__ (threading.py:247)
get (queue.py:164)
gen (dispatcher.py:541)
consume_request_iterator (_channel.py:195)
run (threading.py:870)
_bootstrap_inner (threading.py:932)
_bootstrap (threading.py:890)
__enter__ (threading.py:247)
put (queue.py:132)
put_nowait (queue.py:190)
on_logging (dispatcher.py:198)
emit (dispatcher.py:572)
handle (__init__.py:950)
callHandlers (__init__.py:1657)
handle (__init__.py:1595)
_log (__init__.py:1585)
debug (__init__.py:1430)
notify (threading.py:354)
put (queue.py:151)
put_nowait (queue.py:190)
on_logging (dispatcher.py:198)
emit (dispatcher.py:572)
handle (__init__.py:950)
callHandlers (__init__.py:1657)
handle (__init__.py:1595)
_log (__init__.py:1585)
debug (__init__.py:1430)
_add (/Users/dalemy/Microsoft/build_metrics/.env/lib/python3.8/site-packages/azure/servicebus/_common/message.py:586)
add_message (/Users/dalemy/Microsoft/build_metrics/.env/lib/python3.8/site-packages/azure/servicebus/_common/message.py:575)
main (/Users/dalemy/Microsoft/build_metrics/LogsUploaded/__init__.py:269)
__run_sync_func (dispatcher.py:511)
run (thread.py:57)
_worker (thread.py:80)
run (threading.py:870)
_bootstrap_inner (threading.py:932)
_bootstrap (threading.py:890)
acquire (__init__.py:899)
handle (__init__.py:948)
callHandlers (__init__.py:1657)
handle (__init__.py:1595)
_log (__init__.py:1585)
debug (__init__.py:1430)
_keep_alive (/Users/dalemy/Microsoft/build_metrics/.env/lib/python3.8/site-packages/uamqp/client.py:165)
run (threading.py:870)
_bootstrap_inner (threading.py:932)
_bootstrap (threading.py:890)
At a glance, it looks like the thread pool executor thread and thread 11 are blocked at the same place in the queue module. However, they should be operating on separate queue objects with separate locks, so it’s not entirely clear what is happening here. The debugger is dead at this point, so I can’t get any more information about which objects are which.
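For context, a deadlock does not require two threads to contend on the same lock object: it is enough that each thread holds one lock and waits for the other, with the two locks acquired in opposite orders. The toy sketch below (my own illustration, not the SDK's or the functions worker's code) shows that pattern with stand-ins for a logging handler lock and a queue's internal mutex, which is one way to read the stacks above:

    import threading
    import time

    # Stand-ins for two *separate* locks: a logging.Handler's lock and a
    # queue.Queue's internal mutex. Neither is shared between the threads,
    # yet opposite acquisition order still deadlocks.
    handler_lock = threading.Lock()
    queue_lock = threading.Lock()

    def logging_path():
        # Mirrors Handler.handle -> emit -> queue.put: handler lock first, then queue lock.
        with handler_lock:
            time.sleep(0.2)  # widen the race window so the demo deadlocks reliably
            with queue_lock:
                pass

    def consumer_path():
        # Mirrors a consumer that holds the queue's mutex and then logs:
        # queue lock first, then handler lock.
        with queue_lock:
            time.sleep(0.2)
            with handler_lock:
                pass

    t1 = threading.Thread(target=logging_path, daemon=True)
    t2 = threading.Thread(target=consumer_path, daemon=True)
    t1.start(); t2.start()
    t1.join(timeout=2); t2.join(timeout=2)
    print("deadlocked" if t1.is_alive() or t2.is_alive() else "completed")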
To Reproduce
Steps to reproduce the behavior:
- Run code like the following:
from azure.servicebus import ServiceBusClient, ServiceBusMessageBatch
from azure.servicebus.exceptions import MessageSizeExceededError

messages = [some long list of messages]  # placeholder for the real list of ServiceBusMessage objects

servicebus_client = ServiceBusClient.from_connection_string(
    conn_str="[Insert connection string]", logging_enable=False
)
with servicebus_client:
    sender = servicebus_client.get_queue_sender(queue_name="[queue name]")
    with sender:
        current_batch = ServiceBusMessageBatch(262144)
        for index, message in enumerate(messages):
            try:
                current_batch.add_message(message)
            except MessageSizeExceededError:
                # Current batch is full: send it and start a new one.
                sender.send_messages(current_batch)
                current_batch = ServiceBusMessageBatch(262144)
                current_batch.add_message(message)
        sender.send_messages(current_batch)  # send the final partial batch
This is usually enough to fail on its own, but inserting breakpoints and then continuing when they are hit seems to reduce the time it takes to hit the deadlock.
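As a side note, the same loop can be expressed with the sender's create_message_batch() helper so the 262144 limit does not have to be hard-coded; this is just a sketch of the 7.x API and not the code the failure was observed with:

    from azure.servicebus.exceptions import MessageSizeExceededError

    def send_in_batches(sender, messages):
        # create_message_batch() sizes the batch from the limit negotiated with the service.
        batch = sender.create_message_batch()
        for message in messages:
            try:
                batch.add_message(message)
            except MessageSizeExceededError:
                # Batch is full: flush it and start a new one with the current message.
                sender.send_messages(batch)
                batch = sender.create_message_batch()
                batch.add_message(message)
        if len(batch) > 0:
            sender.send_messages(batch)  # send whatever is left over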
Expected behavior
The program will continue iterating until all messages are exhausted.
Additional context
I’m a Microsoft employee. I can provide the repo if you email me internally. My alias is dalemy.
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 15 (8 by maintainers)
I’ve been testing for the last few days and have yet to see this issue, so I think we can say that it is resolved.
I did not, however, see that particular exception at any point.
You may need significantly more events; I don’t usually see it until at least halfway through. I have up to 30,000 messages, each at least a few hundred bytes and going up to ~100 KB.
(But yes, I can try in the morning.)
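In case it helps to reproduce that volume, a sketch that generates synthetic payloads in roughly those sizes (my own stand-in data, not the real build metrics) could look like this:

    import random
    import string

    from azure.servicebus import ServiceBusMessage

    def make_messages(count=30000, min_bytes=300, max_bytes=100 * 1024):
        # Build `count` messages with bodies from a few hundred bytes up to ~100 KB.
        messages = []
        for _ in range(count):
            size = random.randint(min_bytes, max_bytes)
            body = "".join(random.choices(string.ascii_letters, k=size))
            messages.append(ServiceBusMessage(body))
        return messages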
The repro steps are fairly simple. Basically, set up an Azure function (I think this is running on Python 3.9) with a trigger that calls this code; I’ve reproduced this with both an HTTP trigger and a Service Bus trigger (different queue and even a different resource group).
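For reference, a minimal HTTP-triggered function along those lines (the v1 Python programming model, with a hypothetical send_all_messages() standing in for the batching loop above, plus the usual function.json with an httpTrigger binding) would look roughly like this:

    import azure.functions as func

    def send_all_messages():
        # Placeholder for the ServiceBusClient batching loop from the repro steps.
        ...

    def main(req: func.HttpRequest) -> func.HttpResponse:
        # Runs synchronously on the worker's thread pool executor, alongside the
        # gRPC dispatcher and logging threads visible in the stacks above.
        send_all_messages()
        return func.HttpResponse("messages sent", status_code=200)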
Running it outside of an Azure function appears to work just fine; there I see only two threads, one created by the library and the main thread.
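To confirm which threads exist (and to capture stacks like the ones at the top of this issue without relying on the debugger), a generic snippet such as this can be dropped into the function under test:

    import sys
    import threading
    import traceback

    # Print the name and current stack of every Python thread in the process.
    for thread_id, frame in sys._current_frames().items():
        thread = next((t for t in threading.enumerate() if t.ident == thread_id), None)
        print(f"--- {thread.name if thread else thread_id} ---")
        traceback.print_stack(frame)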