aws-iot-device-sdk-python-v2: Publish future raises AWS_ERROR_MQTT_CONNECTION_DESTROYED

Describe the bug

In our design I add a callback to the future returned by the publish method:

publish_future, pkt_id = self._mqtt_connection.publish(topic, pkt[1], mqtt.QoS.AT_LEAST_ONCE, retain=False)
publish_future.add_done_callback(lambda x: self._publish_future_callback(x, pkt, topic))   

This normally works fine, but I have been testing the code for robustness by repeatedly cycling in and out of connected mode, and on one occasion I got this exception: AWS_ERROR_MQTT_CONNECTION_DESTROYED

After this, the SDK still appears to respond correctly but never publishes any packets. It even indicates it is resuming a connection, but I don’t think it really is.

I have added a workaround that recreates the connection after I get this exception, but I’d like to understand why it occurs.

Expected Behavior

This exception should not be thrown. If it is thrown, other library methods should then fail instead of appearing to work.

Current Behavior

The SDK occasionally throws AWS_ERROR_MQTT_CONNECTION_DESTROYED and then fails to publish afterwards, although the library seems to respond as if it is working:

exception calling callback for <Future at 0xffff74059160 state=finished raised AwsCrtError>
Traceback (most recent call last):
  File "/bba_app/aws_iot_core.py", line 154, in _publish_future_callback
    result = publish_future.result(timeout=0.0)
  File "/usr/lib/python3.9/concurrent/futures/_base.py", line 433, in result
    return self.__get_result()
  File "/usr/lib/python3.9/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
awscrt.exceptions.AwsCrtError: AWS_ERROR_MQTT_CONNECTION_DESTROYED: Connection has started destroying process, all uncompleted requests will fail.
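
The traceback above shows the error being re-raised inside the done callback by publish_future.result(). A minimal sketch of a helper that inspects the finished future with exception() instead, so the failure is classified rather than raised from the callback. The helper name publish_outcome is hypothetical, and reading a name attribute from the error object is an assumption about the error type, not something shown in this issue:

```python
from concurrent.futures import Future


def publish_outcome(publish_future: Future) -> str:
    """Classify a finished publish future without re-raising its exception."""
    if publish_future.cancelled():
        return "cancelled"
    # Future.exception() returns the stored exception (or None) instead of raising it.
    exc = publish_future.exception(timeout=0.0)
    if exc is None:
        return "ok"
    # Assumption: the error object exposes the error name as a `name` attribute.
    if getattr(exc, "name", "") == "AWS_ERROR_MQTT_CONNECTION_DESTROYED":
        return "connection_destroyed"
    return "failed"
```

A callback could branch on the returned string, for example to keep the packet for a later retry when the outcome is "connection_destroyed".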

Reproduction Steps

  1. Create connection
  2. Start connection
  3. Publish every 10 seconds for 5 minutes
  4. Shutdown connection
  5. Idle 2 minutes
  6. Repeat

Possible Solution

Are the AWSCRT objects persistent even with create context calls?

Additional Information/Context

Can someone explain what this message means and what can cause it? If there is an error in how I am using the SDK, then perhaps that will make it clear.

SDK version used

1.11.3

Environment details (OS name and version, etc.)

Debian, Python 3.8

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 15 (6 by maintainers)

Most upvoted comments

Hi @TwistedTwigleg, thanks so much for the detailed response, that’s very helpful. I think I arrived at your suggestions independently, and my new build is working well. For the crash, I had to get the destroyed message a lot, 100+ times over a few days, before it occurred. I added a catch for the destroyed message and rescue the packet like this: [image]. I no longer see this message, though, because I no longer recreate the connection when waking from sleep. I agree it just seems unnecessary.

Thanks @edcloudcycle! I will try and see if I can reproduce this issue on my end.

In the meantime, I would highly recommend persisting the MqttConnection until all the in-flight/pending messages are processed before reassigning the MqttConnection to avoid this issue. 👍
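
One way to follow this recommendation is to track every publish future and wait for the outstanding ones before the connection is disconnected or replaced. This is a sketch only: the PendingPublishes class and its method names are hypothetical and not part of the SDK, and it relies solely on the publish call returning a standard concurrent.futures.Future, as in the snippet at the top of the issue.

```python
import threading
from concurrent.futures import Future, wait


class PendingPublishes:
    """Tracks publish futures that have not completed yet (hypothetical helper)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = set()

    def track(self, publish_future: Future) -> Future:
        # Register first, then attach the callback; if the future is already
        # done, the callback runs immediately and removes it again.
        with self._lock:
            self._pending.add(publish_future)
        publish_future.add_done_callback(self._discard)
        return publish_future

    def _discard(self, publish_future: Future) -> None:
        with self._lock:
            self._pending.discard(publish_future)

    def drain(self, timeout=None) -> bool:
        """Wait for tracked futures; return True if none are still pending."""
        with self._lock:
            snapshot = list(self._pending)
        _done, not_done = wait(snapshot, timeout=timeout)
        return not not_done
```

In the application, the future returned by self._mqtt_connection.publish(...) would be passed to track(), and drain() would be called before disconnecting or assigning a new connection object. If drain() returns False, some publishes are still in flight and replacing the connection at that point risks the destroyed error.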

Hi @TwistedTwigleg,

  • The Python application stops all its threads and powers down associated hardware. It then calls the OS sleep and wakes up on an RTC alarm. When it wakes, all threads are restarted and retain some state from before, as required.
  • Reassigning: see the code below. The self._mqtt_connection object persists during sleep.
  • I do disconnect before sleeping, just by calling disconnect().
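
For the disconnect step in the last bullet, a possible refinement is to wait for the disconnect to finish before the device sleeps. In the awscrt Python bindings, disconnect() returns a future, so the sketch below waits on it with a timeout and logs any failure. The function name disconnect_and_wait, the logger name, and the timeout value are illustrative assumptions:

```python
import logging
from concurrent.futures import Future, TimeoutError as FutureTimeoutError

log = logging.getLogger("mqtt_shutdown")  # hypothetical logger name


def disconnect_and_wait(connection, timeout: float = 10.0) -> bool:
    """Return True if the disconnect future completed cleanly within `timeout`."""
    try:
        disconnect_future: Future = connection.disconnect()
        disconnect_future.result(timeout=timeout)
        return True
    except FutureTimeoutError:
        log.warning("disconnect did not complete within %.1f s", timeout)
    except Exception:
        # Log the failure so it is visible during testing.
        log.exception("disconnect failed")
    return False
```

A False return value tells the caller that the old connection may still be shutting down, which is useful to know before a new connection is created on wake-up.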

This function was being called when waking from sleep and caused the destroyed message when a packet was in flight. Posted as an image as the code formatting seems to mangle Python. [image]

Hi,

The message occurs during the creation of a connection. I am shutting down using:

try:
    self._mqtt_connection.disconnect()
except:
    pass

Perhaps I should add some debug here just to check if it does throw.

It occurs to me that a good use of this exception would be to put the unsent packet back on the queue of packets for use on reconnection.
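
The re-queue idea could look roughly like the sketch below. RetryQueue and its methods are hypothetical names, and the publish argument to flush() stands in for whatever function the application uses to send a packet and get a future back. This is not the code from the original screenshots.

```python
import threading
from collections import deque
from concurrent.futures import Future


class RetryQueue:
    """Holds (topic, payload) pairs whose publish did not complete."""

    def __init__(self):
        self._lock = threading.Lock()
        self._packets = deque()

    def on_publish_done(self, publish_future: Future, topic, payload) -> None:
        # Done callback: keep the packet if the publish was cancelled or failed.
        failed = publish_future.cancelled() or publish_future.exception() is not None
        if failed:
            with self._lock:
                self._packets.append((topic, payload))

    def flush(self, publish) -> int:
        """Resend queued packets; `publish(topic, payload)` must return a Future."""
        with self._lock:
            batch = list(self._packets)
            self._packets.clear()
        # Publish outside the lock: a callback may fire at once and re-queue.
        for topic, payload in batch:
            future = publish(topic, payload)
            future.add_done_callback(
                lambda f, t=topic, p=payload: self.on_publish_done(f, t, p)
            )
        return len(batch)

    def __len__(self) -> int:
        with self._lock:
            return len(self._packets)
```

With the SDK, flush() could be called after a successful reconnect with something like lambda t, p: self._mqtt_connection.publish(t, p, mqtt.QoS.AT_LEAST_ONCE, retain=False)[0], since publish returns a (future, packet id) pair as shown in the issue description.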

It looks like the symptom of packets not being sent might be caused by my code falling out of its correct state machine path and not recovering after the exception. I have tightened that up and will start testing again now. Thanks very much for your response. Ed