pika: Stack overflow using Asynchronous Consumer Example
I’m testing the pika/examples/asynchronous_consumer_example.py and deliberately have my RabbitMQ server offline to see how it handles the connection-errors.
I updated the self._connection.ioloop.call_later(5, self.reconnect) in the on_connection_open_error method from 5 seconds to 0 seconds, because in a previous version of this async consumer I found an issue that after many attempts, I got a recursion error (https://github.com/pika/pika/issues/399) and the following PR (https://github.com/pika/pika/pull/710) was only for producers, not consumers.
Creating the consumer and calling run(), directly fails (as predicted) that it cannot connect to the broker (which on purpose does not exist), but after a very short while (approx 1 sec) it makes an ugly crash with the following:
Fatal Python error: Cannot recover from stack overflow.
Current thread 0x00007fa412e1a700 (most recent call first):
File "/usr/local/lib/python3.6/site-packages/pika-1.0.0b1-py3.6.egg/pika/adapters/base_connection.py", line 113 in __repr__
File "/usr/local/lib/python3.6/logging/__init__.py", line 338 in getMessage
File "/usr/local/lib/python3.6/logging/__init__.py", line 575 in format
File "/usr/local/lib/python3.6/logging/__init__.py", line 838 in format
File "/usr/local/lib/python3.6/logging/__init__.py", line 992 in emit
File "/usr/local/lib/python3.6/logging/__init__.py", line 863 in handle
File "/usr/local/lib/python3.6/logging/__init__.py", line 1514 in callHandlers
File "/usr/local/lib/python3.6/logging/__init__.py", line 1452 in handle
File "/usr/local/lib/python3.6/logging/__init__.py", line 1442 in _log
File "/usr/local/lib/python3.6/logging/__init__.py", line 1306 in info
File "/usr/src/app/consumer/async_consumer.py", line 106 in on_connection_error
File "/usr/local/lib/python3.6/site-packages/pika-1.0.0b1-py3.6.egg/pika/callback.py", line 236 in process
File "/usr/local/lib/python3.6/site-packages/pika-1.0.0b1-py3.6.egg/pika/callback.py", line 92 in wrapper
File "/usr/local/lib/python3.6/site-packages/pika-1.0.0b1-py3.6.egg/pika/callback.py", line 60 in wrapper
File "/usr/local/lib/python3.6/site-packages/pika-1.0.0b1-py3.6.egg/pika/connection.py", line 2172 in _on_stream_terminated
File "/usr/local/lib/python3.6/site-packages/pika-1.0.0b1-py3.6.egg/pika/adapters/base_connection.py", line 375 in _handle_connection_workflow_failure
File "/usr/local/lib/python3.6/site-packages/pika-1.0.0b1-py3.6.egg/pika/adapters/base_connection.py", line 352 in _on_connection_workflow_done
File "/usr/local/lib/python3.6/site-packages/pika-1.0.0b1-py3.6.egg/pika/adapters/base_connection.py", line 276 in _unshim_connection_workflow_callback
File "/usr/local/lib/python3.6/site-packages/pika-1.0.0b1-py3.6.egg/pika/adapters/utils/connection_workflow.py", line 753 in _report_completion_and_cleanup
File "/usr/local/lib/python3.6/site-packages/pika-1.0.0b1-py3.6.egg/pika/adapters/utils/connection_workflow.py", line 770 in _start_new_cycle_async
File "/usr/local/lib/python3.6/site-packages/pika-1.0.0b1-py3.6.egg/pika/adapters/utils/connection_workflow.py", line 824 in _on_getaddrinfo_async_done
File "/usr/local/lib/python3.6/site-packages/pika-1.0.0b1-py3.6.egg/pika/adapters/utils/selector_ioloop_adapter.py", line 610 in _dispatch_result
File "/usr/local/lib/python3.6/site-packages/pika-1.0.0b1-py3.6.egg/pika/adapters/select_connection.py", line 457 in process_timeouts
File "/usr/local/lib/python3.6/site-packages/pika-1.0.0b1-py3.6.egg/pika/adapters/select_connection.py", line 769 in start
File "/usr/local/lib/python3.6/site-packages/pika-1.0.0b1-py3.6.egg/pika/adapters/select_connection.py", line 508 in start
File "/usr/src/app/consumer/async_consumer.py", line 143 in reconnect
File "/usr/local/lib/python3.6/site-packages/pika-1.0.0b1-py3.6.egg/pika/adapters/select_connection.py", line 297 in process_timeouts
File "/usr/local/lib/python3.6/site-packages/pika-1.0.0b1-py3.6.egg/pika/adapters/select_connection.py", line 459 in process_timeouts
File "/usr/local/lib/python3.6/site-packages/pika-1.0.0b1-py3.6.egg/pika/adapters/select_connection.py", line 769 in start
File "/usr/local/lib/python3.6/site-packages/pika-1.0.0b1-py3.6.egg/pika/adapters/select_connection.py", line 508 in start
File "/usr/src/app/consumer/async_consumer.py", line 143 in reconnect
File "/usr/local/lib/python3.6/site-packages/pika-1.0.0b1-py3.6.egg/pika/adapters/select_connection.py", line 297 in process_timeouts
File "/usr/local/lib/python3.6/site-packages/pika-1.0.0b1-py3.6.egg/pika/adapters/select_connection.py", line 459 in process_timeouts
File "/usr/local/lib/python3.6/site-packages/pika-1.0.0b1-py3.6.egg/pika/adapters/select_connection.py", line 769 in start
File "/usr/local/lib/python3.6/site-packages/pika-1.0.0b1-py3.6.egg/pika/adapters/select_connection.py", line 508 in start
File "/usr/src/app/consumer/async_consumer.py", line 143 in reconnect
The last 5 lines are repeated a number of times and then finishes with
...
Aborted
> pip show pika
Name: pika
Version: 1.0.0b1
Summary: Pika Python AMQP Client Library
Home-page: https://pika.readthedocs.io
Author: None
Author-email: None
License: BSD
Location: /usr/local/lib/python3.6/site-packages/pika-1.0.0b1-py3.6.egg
Requires:
Required-by:
> python --version
Python 3.6.5
About this issue
- Original URL
- State: closed
- Created 6 years ago
- Reactions: 2
- Comments: 15 (11 by maintainers)
@Jacobh2 please see PR #1173, which fixes the reconnect example.
In short, calling back into the consumer class via
call_lateris fraught with the potential for a stack overflow. There is no way around this, of course, because the interaction between the ioloop and callbacks will always result in this scenario ifcall_lateris used.reconnectsuffered from this issue because failed reconnect attempts would then call back into the sameExampleConsumerinstance over and over.So, It all depends on how
add_callback_threadsafeis used. 99% of the time this method will be used to acknowledge a message and nothing else, so when the callback is done, it will return and be done.If you have a specific example you’d like me to review I would be happy to. I will review all of the example code to ensure it does not suffer from the issue reported here.
I honestly don’t remember the exact timeframe, but I remember it didn’t happen directly. Instead I found it after at least a couple of days.
The original problem happened on a IoT device running with faulty configured network, where I set it to never stop trying to reconnect. The amount of times the reconnect had to be called until it crashed I don’t know, so not completely sure of the timeframe, guessing it also have to do with what I set the delay to be.
When I saw the logs (as posted above), I narrow it down to the fact that I can very simply reproduce this with no server online at all and with zero as delay.