azure-sdk-for-python: Service Bus deadlocks in Azure function

  • Package Name: azure-servicebus
  • Package Version: 7.1.0
  • Operating System: macOS 11.2.3
  • Python Version: 3.8.7

Describe the bug When posting to a Service Bus queue from code running in an Azure Function, a deadlock frequently occurs, particularly when using the debugger.

Here are the call stacks:

MainThread
select (selectors.py:558)
_run_once (base_events.py:1823)
run_forever (base_events.py:570)
run_until_complete (base_events.py:603)
run (runners.py:44)
main (main.py:47)
<module> (worker.py:86)
_run_code (runpy.py:87)
_run_module_code (runpy.py:97)
run_path (runpy.py:265)
_run_code (runpy.py:87)
_run_module_as_main (runpy.py:194)

grpc-thread
_wait_once (_common.py:106)
wait (_common.py:140)
_next (_channel.py:694)
__next__ (_channel.py:416)
__poll_grpc (dispatcher.py:549)
run (threading.py:870)
_bootstrap_inner (threading.py:932)
_bootstrap (threading.py:890)

Thread-10
_wait_once (_common.py:106)
wait (_common.py:140)
_next (_channel.py:694)
__next__ (_channel.py:416)
__poll_grpc (dispatcher.py:549)
run (threading.py:870)
_bootstrap_inner (threading.py:932)
_bootstrap (threading.py:890)

Thread-11
__enter__ (threading.py:247)
get (queue.py:164)
gen (dispatcher.py:541)
consume_request_iterator (_channel.py:195)
run (threading.py:870)
_bootstrap_inner (threading.py:932)
_bootstrap (threading.py:890)

ThreadPoolExecutor-0_0
__enter__ (threading.py:247)
put (queue.py:132)
put_nowait (queue.py:190)
on_logging (dispatcher.py:198)
emit (dispatcher.py:572)
handle (__init__.py:950)
callHandlers (__init__.py:1657)
handle (__init__.py:1595)
_log (__init__.py:1585)
debug (__init__.py:1430)
notify (threading.py:354)
put (queue.py:151)
put_nowait (queue.py:190)
on_logging (dispatcher.py:198)
emit (dispatcher.py:572)
handle (__init__.py:950)
callHandlers (__init__.py:1657)
handle (__init__.py:1595)
_log (__init__.py:1585)
debug (__init__.py:1430)
_add (/Users/dalemy/Microsoft/build_metrics/.env/lib/python3.8/site-packages/azure/servicebus/_common/message.py:586)
add_message (/Users/dalemy/Microsoft/build_metrics/.env/lib/python3.8/site-packages/azure/servicebus/_common/message.py:575)
main (/Users/dalemy/Microsoft/build_metrics/LogsUploaded/__init__.py:269)
__run_sync_func (dispatcher.py:511)
run (thread.py:57)
_worker (thread.py:80)
run (threading.py:870)
_bootstrap_inner (threading.py:932)
_bootstrap (threading.py:890)

Thread-12
acquire (__init__.py:899)
handle (__init__.py:948)
callHandlers (__init__.py:1657)
handle (__init__.py:1595)
_log (__init__.py:1585)
debug (__init__.py:1430)
_keep_alive (/Users/dalemy/Microsoft/build_metrics/.env/lib/python3.8/site-packages/uamqp/client.py:165)
run (threading.py:870)
_bootstrap_inner (threading.py:932)
_bootstrap (threading.py:890)

At a glance, it looks like the thread pool executor worker and Thread-11 are blocked at the same place in the queue module. However, they should be separate queue objects with separate locks, so it's not entirely clear what is happening here. The debugger is dead at this point, so I can't get any more information about which objects are which.
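If the cause really is SDK and uamqp log records being funnelled through the worker's logging queue (which is what the ThreadPoolExecutor-0_0 and Thread-12 frames suggest), one possible mitigation, sketched here as an assumption rather than a confirmed fix, is to raise the level of those loggers so the DEBUG calls visible in message.py and uamqp/client.py never produce records to handle:

    import logging

    # Assumed workaround, not a verified fix: silence the DEBUG records that the
    # stacks show coming from azure.servicebus (_add in message.py) and uamqp
    # (_keep_alive in client.py) so they never reach the Functions worker's
    # logging handler.
    logging.getLogger("azure.servicebus").setLevel(logging.WARNING)
    logging.getLogger("uamqp").setLevel(logging.WARNING)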

To Reproduce Steps to reproduce the behavior:

  1. Run code like the following:
    from azure.servicebus import ServiceBusClient, ServiceBusMessageBatch
    from azure.servicebus.exceptions import MessageSizeExceededError

    messages = [...]  # [some long list of messages]
    servicebus_client = ServiceBusClient.from_connection_string(
        conn_str="[Insert connection string]", logging_enable=False
    )

    with servicebus_client:
        sender = servicebus_client.get_queue_sender(queue_name="[queue name]")

        with sender:

            current_batch = ServiceBusMessageBatch(262144)
            for index, message in enumerate(messages):
                try:
                    current_batch.add_message(message)
                except MessageSizeExceededError:
                    # Batch is full: send it, then start a new one with this message.
                    sender.send_messages(current_batch)
                    current_batch = ServiceBusMessageBatch(262144)
                    current_batch.add_message(message)

            sender.send_messages(current_batch)

This is usually enough to fail on its own, but inserting breakpoints and then continuing once they've been hit seems to reduce the time it takes to reproduce.
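For comparison, the body of the with sender: block can also be written with the sender's own batch factory rather than constructing ServiceBusMessageBatch directly; a minimal sketch, assuming the same sender and messages as above and the azure-servicebus 7.x API:

    from azure.servicebus.exceptions import MessageSizeExceededError

    # Alternative body for the `with sender:` block in the repro snippet above.
    current_batch = sender.create_message_batch()  # sized from the service-advertised limit
    for message in messages:
        try:
            current_batch.add_message(message)
        except MessageSizeExceededError:
            sender.send_messages(current_batch)
            current_batch = sender.create_message_batch()
            current_batch.add_message(message)
    sender.send_messages(current_batch)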

Expected behavior The program will continue iterating until all messages are exhausted.

Additional context I’m a Microsoft employee. I can provide the repo if you email me internally. My alias is dalemy.

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Comments: 15 (8 by maintainers)

Most upvoted comments

I’ve been testing for the last few days and have yet to see this issue. I think we can say that it is resolved.

I did not, however, see that particular exception at any point.

On Tue, 20 Apr 2021 at 19:02, Adam Ling (MSFT) wrote:

@dalemyers https://github.com/dalemyers how's your investigation going? Also, I'm wondering: if the size of the message is 100-200MB, then the SDK would throw MessageSizeExceededError.


You may need significantly more events. I don't usually see it until at least halfway through. I have up to 30,000 messages, each at least a few hundred bytes and going up to ~100KB.

(But yes, I can try in the morning)
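For what it's worth, a quick way to sanity-check the size question is to print the batch limits while adding a handful of these messages; a small sketch, assuming the sender and messages from the repro steps and the size_in_bytes / max_size_in_bytes properties of the 7.x batch object:

    from azure.servicebus.exceptions import MessageSizeExceededError

    probe_batch = sender.create_message_batch()
    print("max batch size (bytes):", probe_batch.max_size_in_bytes)
    for message in messages[:10]:  # probe a handful of messages
        try:
            probe_batch.add_message(message)
        except MessageSizeExceededError:
            print("batch filled after", len(probe_batch), "messages")
            break
        print("batch size so far (bytes):", probe_batch.size_in_bytes)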

The repro steps are fairly simple. Basically, set up an Azure Function (I think this is running on Python 3.9) and have a trigger that calls this code; I've reproduced it with both an HTTP trigger and a Service Bus trigger (different queue and even a different resource group).
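A minimal sketch of the HTTP-triggered entry point, assuming the v1 Python programming model (function.json with an httpTrigger binding) and app settings named SERVICEBUS_CONNECTION_STRING and QUEUE_NAME, none of which come from the original report; the real repro runs the full batching loop, a single message is just enough to show the wiring:

    # __init__.py of the function folder
    import os

    import azure.functions as func
    from azure.servicebus import ServiceBusClient, ServiceBusMessage


    def main(req: func.HttpRequest) -> func.HttpResponse:
        client = ServiceBusClient.from_connection_string(
            conn_str=os.environ["SERVICEBUS_CONNECTION_STRING"],  # assumed setting name
            logging_enable=False,
        )
        with client:
            sender = client.get_queue_sender(queue_name=os.environ["QUEUE_NAME"])  # assumed setting name
            with sender:
                # The real repro runs the batching loop from the repro steps here.
                batch = sender.create_message_batch()
                batch.add_message(ServiceBusMessage("test payload"))
                sender.send_messages(batch)
        return func.HttpResponse("sent", status_code=200)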

Running it outside of an Azure Function appears to work just fine. I see two threads: one created by the library, and the main thread.