pika: Stack overflow using Asynchronous Consumer Example

I’m testing the pika/examples/asynchronous_consumer_example.py and deliberately have my RabbitMQ server offline to see how it handles the connection-errors.

I updated the self._connection.ioloop.call_later(5, self.reconnect) in the on_connection_open_error method from 5 seconds to 0 seconds, because in a previous version of this async consumer I found an issue that after many attempts, I got a recursion error (https://github.com/pika/pika/issues/399) and the following PR (https://github.com/pika/pika/pull/710) was only for producers, not consumers.

Creating the consumer and calling run(), directly fails (as predicted) that it cannot connect to the broker (which on purpose does not exist), but after a very short while (approx 1 sec) it makes an ugly crash with the following:

Fatal Python error: Cannot recover from stack overflow.

Current thread 0x00007fa412e1a700 (most recent call first):
  File "/usr/local/lib/python3.6/site-packages/pika-1.0.0b1-py3.6.egg/pika/adapters/base_connection.py", line 113 in __repr__
  File "/usr/local/lib/python3.6/logging/__init__.py", line 338 in getMessage
  File "/usr/local/lib/python3.6/logging/__init__.py", line 575 in format
  File "/usr/local/lib/python3.6/logging/__init__.py", line 838 in format
  File "/usr/local/lib/python3.6/logging/__init__.py", line 992 in emit
  File "/usr/local/lib/python3.6/logging/__init__.py", line 863 in handle
  File "/usr/local/lib/python3.6/logging/__init__.py", line 1514 in callHandlers
  File "/usr/local/lib/python3.6/logging/__init__.py", line 1452 in handle
  File "/usr/local/lib/python3.6/logging/__init__.py", line 1442 in _log
  File "/usr/local/lib/python3.6/logging/__init__.py", line 1306 in info
  File "/usr/src/app/consumer/async_consumer.py", line 106 in on_connection_error
  File "/usr/local/lib/python3.6/site-packages/pika-1.0.0b1-py3.6.egg/pika/callback.py", line 236 in process
  File "/usr/local/lib/python3.6/site-packages/pika-1.0.0b1-py3.6.egg/pika/callback.py", line 92 in wrapper
  File "/usr/local/lib/python3.6/site-packages/pika-1.0.0b1-py3.6.egg/pika/callback.py", line 60 in wrapper
  File "/usr/local/lib/python3.6/site-packages/pika-1.0.0b1-py3.6.egg/pika/connection.py", line 2172 in _on_stream_terminated
  File "/usr/local/lib/python3.6/site-packages/pika-1.0.0b1-py3.6.egg/pika/adapters/base_connection.py", line 375 in _handle_connection_workflow_failure
  File "/usr/local/lib/python3.6/site-packages/pika-1.0.0b1-py3.6.egg/pika/adapters/base_connection.py", line 352 in _on_connection_workflow_done
  File "/usr/local/lib/python3.6/site-packages/pika-1.0.0b1-py3.6.egg/pika/adapters/base_connection.py", line 276 in _unshim_connection_workflow_callback
  File "/usr/local/lib/python3.6/site-packages/pika-1.0.0b1-py3.6.egg/pika/adapters/utils/connection_workflow.py", line 753 in _report_completion_and_cleanup
  File "/usr/local/lib/python3.6/site-packages/pika-1.0.0b1-py3.6.egg/pika/adapters/utils/connection_workflow.py", line 770 in _start_new_cycle_async
  File "/usr/local/lib/python3.6/site-packages/pika-1.0.0b1-py3.6.egg/pika/adapters/utils/connection_workflow.py", line 824 in _on_getaddrinfo_async_done
  File "/usr/local/lib/python3.6/site-packages/pika-1.0.0b1-py3.6.egg/pika/adapters/utils/selector_ioloop_adapter.py", line 610 in _dispatch_result
  File "/usr/local/lib/python3.6/site-packages/pika-1.0.0b1-py3.6.egg/pika/adapters/select_connection.py", line 457 in process_timeouts
  File "/usr/local/lib/python3.6/site-packages/pika-1.0.0b1-py3.6.egg/pika/adapters/select_connection.py", line 769 in start
  File "/usr/local/lib/python3.6/site-packages/pika-1.0.0b1-py3.6.egg/pika/adapters/select_connection.py", line 508 in start
  File "/usr/src/app/consumer/async_consumer.py", line 143 in reconnect
  File "/usr/local/lib/python3.6/site-packages/pika-1.0.0b1-py3.6.egg/pika/adapters/select_connection.py", line 297 in process_timeouts
  File "/usr/local/lib/python3.6/site-packages/pika-1.0.0b1-py3.6.egg/pika/adapters/select_connection.py", line 459 in process_timeouts
  File "/usr/local/lib/python3.6/site-packages/pika-1.0.0b1-py3.6.egg/pika/adapters/select_connection.py", line 769 in start
  File "/usr/local/lib/python3.6/site-packages/pika-1.0.0b1-py3.6.egg/pika/adapters/select_connection.py", line 508 in start
  File "/usr/src/app/consumer/async_consumer.py", line 143 in reconnect
  File "/usr/local/lib/python3.6/site-packages/pika-1.0.0b1-py3.6.egg/pika/adapters/select_connection.py", line 297 in process_timeouts
  File "/usr/local/lib/python3.6/site-packages/pika-1.0.0b1-py3.6.egg/pika/adapters/select_connection.py", line 459 in process_timeouts
  File "/usr/local/lib/python3.6/site-packages/pika-1.0.0b1-py3.6.egg/pika/adapters/select_connection.py", line 769 in start
  File "/usr/local/lib/python3.6/site-packages/pika-1.0.0b1-py3.6.egg/pika/adapters/select_connection.py", line 508 in start
  File "/usr/src/app/consumer/async_consumer.py", line 143 in reconnect

The last 5 lines are repeated a number of times and then finishes with

  ...
Aborted
> pip show pika
Name: pika
Version: 1.0.0b1
Summary: Pika Python AMQP Client Library
Home-page: https://pika.readthedocs.io
Author: None
Author-email: None
License: BSD
Location: /usr/local/lib/python3.6/site-packages/pika-1.0.0b1-py3.6.egg
Requires:
Required-by:
> python --version
Python 3.6.5

About this issue

  • Original URL
  • State: closed
  • Created 6 years ago
  • Reactions: 2
  • Comments: 15 (11 by maintainers)

Most upvoted comments

@Jacobh2 please see PR #1173, which fixes the reconnect example.

In short, calling back into the consumer class via call_later is fraught with the potential for a stack overflow. There is no way around this, of course, because the interaction between the ioloop and callbacks will always result in this scenario if call_later is used.

reconnect suffered from this issue because failed reconnect attempts would then call back into the same ExampleConsumer instance over and over.

So, It all depends on how add_callback_threadsafe is used. 99% of the time this method will be used to acknowledge a message and nothing else, so when the callback is done, it will return and be done.

If you have a specific example you’d like me to review I would be happy to. I will review all of the example code to ensure it does not suffer from the issue reported here.

I honestly don’t remember the exact timeframe, but I remember it didn’t happen directly. Instead I found it after at least a couple of days.

The original problem happened on a IoT device running with faulty configured network, where I set it to never stop trying to reconnect. The amount of times the reconnect had to be called until it crashed I don’t know, so not completely sure of the timeframe, guessing it also have to do with what I set the delay to be.

When I saw the logs (as posted above), I narrow it down to the fact that I can very simply reproduce this with no server online at all and with zero as delay.