aws-iot-device-sdk-python-v2: After Linux suspend->resume aws data are not send to cloud. Socket error after ~20min
Confirm by changing [ ] to [x] below to ensure that it’s a bug:
- I’ve searched for previous similar issues and didn’t find any solution
Known Issue
- [x I’m using ATS data type endpoint: the endpoint should look like
<prefix>-ats.iot.<region>.amazonaws.com
Describe the bug Hi guys. We are running aws app on imx7ulp from NXP with LTE modem. Once we will suspend Linux and resume again the LTE interface will disappear for a while. But it will appear again with connection to the internet. But aws will be not able to sent any messages after internet is up again. It looks like no callbacks like: on_connection_interrupted and on_connection_resumed are called.
With eth0 it is happing too but not as often as with LTE.
Based on observation it looks like AWS_IO_SOCKET_TIMEOUT is thrown after ~20min. could be this Socket timeout edited to lower value? Also based on erros it looks like there are some issues to create socket, but at the end there is some created and used.
Before suspend 18 messages were sent. And after resume 257 message was sent. (256-18)*5s(sleep between msgs)/60 = ~20min
Interface and internet is up in 50s after resume.
SDK version number 1.3.0
Platform/OS/Device Linux kernel 4.19 - imx7ulp Arm® Cortex®-A7
To Reproduce (observed behavior) `
provLoc = "/prov"
crtLoc = "/certs"
AWS_CONNECT_DELAY_SECS = 10
AWS_KEEP_ALIVE_SECS = 3*60*60 # 3hours
client_id = "abc"
endpoint = "....-ats.iot.us-east-2.amazonaws.com"
aws_topic = "/d/"+client_id
# setup AWS connection
io.init_logging(io.LogLevel.Warn, 'stderr')
event_loop_group = io.EventLoopGroup(1)
host_resolver = io.DefaultHostResolver(event_loop_group)
client_bootstrap = io.ClientBootstrap(event_loop_group, host_resolver)
aws_mqtt = mqtt_connection_builder.mtls_from_path(
endpoint=endpoint,
cert_filepath="%s/device.crt" % (provLoc),
pri_key_filepath="%s/device.key" % (provLoc),
client_bootstrap=client_bootstrap,
ca_filepath="%s/AmazonRootCA1.pem" % (crtLoc),
on_connection_interrupted=aws_on_connection_interrupted,
on_connection_resumed=aws_on_connection_resumed,
client_id=client_id,
clean_session=False,
keep_alive_secs=AWS_KEEP_ALIVE_SECS)
# connect loop
connected = False
while not connected:
try:
print("[aws] Connecting to %s as %s ..." % (endpoint, client_id))
connect_future = aws_mqtt.connect()
connect_future.result()
connected = True
except:
print("[aws] Failed connection: %s" % (sys.exc_info()[0]))
sleep(AWS_CONNECT_DELAY_SECS)
print("[aws] Connected!")
# subscribe to AWS topic
print("[aws] Subscribing to topic '%s'..." % (aws_topic))
subscribe_future, packet_id = aws_mqtt.subscribe(
topic=aws_topic,
qos=mqtt.QoS.AT_LEAST_ONCE,
callback=aws_on_message_received)
subscribe_result = subscribe_future.result()
print("[aws] Subscribed with %s" % (str(subscribe_result['qos'])))
i = 0
while True:
LAST_PUBLISH_RESULT, _ = aws_mqtt.publish(topic=aws_topic,
payload=str(i),
qos=mqtt.QoS.AT_LEAST_ONCE)
sleep(5)
if not LAST_PUBLISH_RESULT.done():
LAST_PUBLISH_RESULT.cancel()
print("Cancel aws send request. Timeout.")
else:
print("Done...")
i += 1
`
Expected behavior On laptop: When i disable interface and enable again it will send all messages which were not sent successfully.
Logs/output If applicable, add logs or error output. `
[aws] Connecting to …iot.us-east-2.amazonaws.com as abc … [WARN ] [2020-07-14T19:19:21Z] [b51fe460] [socket] - id=0xb4809ba8 fd=11: setsockopt() for NO_SIGNAL failed with errno 92. If you are having SIGPIPE signals thrown, you may want to install a signal trap in your application layer. [ERROR] [2020-07-14T19:19:21Z] [b51fe460] [socket] - id=0xb4809ba8 fd=11: connect failed with error code 101. [ERROR] [2020-07-14T19:19:21Z] [b51fe460] [channel-bootstrap] - id=0x1b22ec0: failed to create socket with error 1049 [WARN ] [2020-07-14T19:19:21Z] [b51fe460] [socket] - id=0xb4809ba8 fd=11: setsockopt() for NO_SIGNAL failed with errno 92. If you are having SIGPIPE signals thrown, you may want to install a signal trap in your application layer. [aws] Connected! [aws] Subscribing to topic ‘aws-data’… [aws] Subscribed with QoS.AT_LEAST_ONCE [aws] Received message from topic ‘aws-data’: b’0’ Done… [aws] Received message from topic ‘aws-data’: b’1’ Done… … [aws] Received message from topic ‘aws-data’: b’18’ s[ 1004.892006] PM: suspend entry (deep) [ 1004.903644] PM: Syncing filesystems … done. [ 1004.942160] Freezing user space processes … (elapsed 0.005 seconds) done. [ 1004.955471] OOM killer disabled. [ 1004.958729] Freezing remaining freezable tasks … (elapsed 0.002 seconds) done. [ 1004.968407] Suspending console(s) (use no_console_suspend to debug) [ 1005.140539] fsl-lpuart 402e0000.serial: ttyLP1: Unable to drain transmitter [ 1005.141117] PM: suspend devices took 0.170 seconds [ 1005.245416] PM: resume devices took 0.090 seconds [ 1005.262232] OOM killer enabled. [ 1005.265421] Restarting tasks … done. [ 1005.360987] PM: suspend exit [ERROR] [2020-07-14T19:19:48Z] [b473f460] [dns] - static: getaddrinfo failed with error_code -3 [ERROR] [2020-07-14T19:19:49Z] [b473f460] [dns] - static: getaddrinfo failed with error_code -3 Done… … [ERROR] [2020-07-14T19:19:51Z] [b473f460] [dns] - static: getaddrinfo failed with error_code -3 [ERROR] [2020-07-14T19:19:52Z] [b473f460] [dns] - static: getaddrinfo failed with error_code -3 [ERROR] [2020-07-14T19:19:53Z] [b473f460] [dns] - static: getaddrinfo failed with error_code -3 [ERROR] [2020-07-14T19:19:54Z] [b473f460] [dns] - static: getaddrinfo failed with error_code -3 Cancel aws send request. Timeout. Cancel aws send request. Timeout. Cancel aws send request. Timeout. … [ERROR] [2020-07-14T19:39:29Z] [b51fe460] [socket] - id=0xb4809ba8 fd=11: socket timed out. [aws] Connection interrupted. error: AwsCrtError(name=‘AWS_IO_SOCKET_TIMEOUT’, message=‘socket operation timed out.’, code=1048) [WARN ] [2020-07-14T19:39:30Z] [b51fe460] [socket] - id=0xb4809ba8 fd=11: setsockopt() for NO_SIGNAL failed with errno 92. If you are having SIGPIPE signals thrown, you may want to install a signal trap in your application layer. [ERROR] [2020-07-14T19:39:30Z] [b51fe460] [socket] - id=0xb4809ba8 fd=11: connect failed with error code 101. [ERROR] [2020-07-14T19:39:30Z] [b51fe460] [channel-bootstrap] - id=0x1b22ec0: failed to create socket with error 1049 [WARN ] [2020-07-14T19:39:30Z] [b51fe460] [socket] - id=0xb4809ba8 fd=11: setsockopt() for NO_SIGNAL failed with errno 92. If you are having SIGPIPE signals thrown, you may want to install a signal trap in your application layer. [aws] Connection resumed. return_code: ConnectReturnCode.ACCEPTED session_present: True Done… [aws] Received message from topic ‘aws-data’: b’257’ Done…
`
Additional context: On laptop with Kubuntu 20.04 i have same output during connect (aws sdk 1.3.1): WARN ] [2020-07-14T20:17:30Z] [00007fca11ace700] [socket] - id=0x7fca04004ca0 fd=6: setsockopt() for NO_SIGNAL failed with errno 92. If you are having SIGPIPE signals thrown, you may want to install a signal trap in your application layer. [ERROR] [2020-07-14T20:17:30Z] [00007fca11ace700] [socket] - id=0x7fca04004ca0 fd=6: connect failed with error code 101. [ERROR] [2020-07-14T20:17:30Z] [00007fca11ace700] [channel-bootstrap] - id=0x23e51b0: failed to create socket with error 1049 [WARN ] [2020-07-14T20:17:30Z] [00007fca11ace700] [socket] - id=0x7fca04004ca0 fd=6: setsockopt() for NO_SIGNAL failed with errno 92. If you are having SIGPIPE signals thrown, you may want to install a signal trap in your application layer. [aws] Connected!
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 18 (6 by maintainers)
I’m commenting based on an issue I raised here for the AWS C++ IoT SDK.
With regard to @graebm’s comment:
A couple of points to make around data consumption.
In a nutshell we are currently using TCP keep-alive over a cellular connection and it has significantly reduced our data consumption, around 30 to 40% reduction in data consumption, and the same keep-alive.
Hi @rccarper , i was able to try your suggestion. Still same behavior. Based on datetimestamp socket error came after 22min: