iotedge: Restarting a custom module has a chance to trigger a fatal failure inside edgeHub 1.3.0/1.4.0

Expected Behavior

edgeHub doesn’t spin out of control after restarting another module

Current Behavior

Restarting a custom module that is connected to edgeHub via MQTT has a chance to trigger some sort of failure inside edgeHub causing it to allocate all available system memory. This leads to oom-killer killing the edgeHub process.

Steps to Reproduce

Provide a detailed set of steps to reproduce the bug.

  1. Deploy a custom module that connects to edgeHub via MQTT
gwa@gwa-prototyping-vm1:~$ iotedge list
NAME                  STATUS           DESCRIPTION      Config
edgeAgent             running          Up an hour       mcr.microsoft.com/azureiotedge-agent:1.4
edgeHub               running          Up 13 minutes    mcr.microsoft.com/azureiotedge-hub:1.4
gwa-iot-edge-adapter  running          Up 14 minutes    gwaiotedge.azurecr.io/gwa_azure_iot_edge_adapter:0.3.0-amd64
gwa-iotedge-broker    running          Up an hour       gwaiotedge.azurecr.io/gwa_iotedge_broker:0.2.0-amd64

The gwa-iot-edge-adapter connects to the edgeHub, retrieves it’s module twin and then subscribes to the module-twin updates and module method topics. After that it does basically nothing. The gwa-iotedge-broker is not important in this case as it is a mosquitto broker and it doesn’t interact with edgeHub in any way. 2. Start issuing the restart command for the module:

iotedge restart gwa-iot-edge-adapter
  1. Every other restart you can observe the edgeHub freaking out and eating all available memory: image

oom-killer getting involved: dmesg.log

Context (Environment)

Output of iotedge check

Click here
Configuration checks (aziot-identity-service)
---------------------------------------------
√ keyd configuration is well-formed - OK
√ certd configuration is well-formed - OK
√ tpmd configuration is well-formed - OK
√ identityd configuration is well-formed - OK
√ daemon configurations up-to-date with config.toml - OK
√ identityd config toml file specifies a valid hostname - OK
√ aziot-identity-service package is up-to-date - OK
√ host time is close to reference time - OK
√ preloaded certificates are valid - OK
√ keyd is running - OK
√ certd is running - OK
√ identityd is running - OK
√ read all preloaded certificates from the Certificates Service - OK
√ read all preloaded key pairs from the Keys Service - OK
√ check all EST server URLs utilize HTTPS - OK
√ ensure all preloaded certificates match preloaded private keys with the same ID - OK

Connectivity checks (aziot-identity-service)
--------------------------------------------
√ host can connect to and perform TLS handshake with iothub AMQP port - OK
√ host can connect to and perform TLS handshake with iothub HTTPS / WebSockets port - OK
√ host can connect to and perform TLS handshake with iothub MQTT port - OK

Configuration checks
--------------------
√ aziot-edged configuration is well-formed - OK
√ configuration up-to-date with config.toml - OK
√ container engine is installed and functional - OK
√ configuration has correct URIs for daemon mgmt endpoint - OK
√ aziot-edge package is up-to-date - OK
√ container time is close to host time - OK
‼ DNS server - Warning
    Container engine is not configured with DNS server setting, which may impact connectivity to IoT Hub.
    Please see https://aka.ms/iotedge-prod-checklist-dns for best practices.
    You can ignore this warning if you are setting DNS server per module in the Edge deployment.
√ production readiness: logs policy - OK
‼ production readiness: Edge Agent's storage directory is persisted on the host filesystem - Warning
    The edgeAgent module is not configured to persist its /tmp/edgeAgent directory on the host filesystem.
    Data might be lost if the module is deleted or updated.
    Please see https://aka.ms/iotedge-storage-host for best practices.
‼ production readiness: Edge Hub's storage directory is persisted on the host filesystem - Warning
    The edgeHub module is not configured to persist its /tmp/edgeHub directory on the host filesystem.
    Data might be lost if the module is deleted or updated.
    Please see https://aka.ms/iotedge-storage-host for best practices.
√ Agent image is valid and can be pulled from upstream - OK
√ proxy settings are consistent in aziot-edged, aziot-identityd, moby daemon and config.toml - OK

Connectivity checks
-------------------
√ container on the default network can connect to upstream AMQP port - OK
√ container on the default network can connect to upstream HTTPS / WebSockets port - OK
√ container on the IoT Edge module network can connect to upstream AMQP port - OK
√ container on the IoT Edge module network can connect to upstream HTTPS / WebSockets port - OK
32 check(s) succeeded.
3 check(s) raised warnings. Re-run with --verbose for more details.
2 check(s) were skipped due to errors from other checks. Re-run with --verbose for more details.

Device Information

  • Host OS [e.g. Ubuntu 18.04, Windows Server IoT 2019]: Ubuntu 18.04.6 LTS
  • Architecture [e.g. amd64, arm32, arm64]: adm64
  • Container OS [e.g. Linux containers, Windows containers]: Standard linux edgeHub and edgeAgent containers, the custom containers are based on debian:buster-slim

Runtime Versions

  • aziot-edged [run iotedge version]: iotedge 1.3.0 & iotedge 1.4.0
  • Edge Agent [image tag (e.g. 1.0.0)]: 1.3 & 1.4
  • Edge Hub [image tag (e.g. 1.0.0)]: 1.3 & 1.4
  • Docker/Moby [run docker version]: 20.10.17+azure-1

Note: when using Windows containers on Windows, run docker -H npipe:////./pipe/iotedge_moby_engine version instead

Logs

aziot-edged logs: aziot-edged.log

edge-agent logs
<6> 2022-08-29 09:10:52.694 +00:00 [INF] - Updated reported properties                                                                                                                                                                               [0/9214]
<6> 2022-08-29 09:11:03.569 +00:00 [INF] - Updated reported properties                                                                                                                                                                                       
<6> 2022-08-29 09:11:09.396 +00:00 [INF] - Updated reported properties                                                                                                                                                                                       
<6> 2022-08-29 09:11:35.273 +00:00 [INF] - Updated reported properties                                                                                                                                                                                       
<6> 2022-08-29 09:11:46.121 +00:00 [INF] - Updated reported properties                                                                                                                                                                                       
<6> 2022-08-29 09:11:57.040 +00:00 [INF] - Updated reported properties                                                                                                                                                                                       
<6> 2022-08-29 09:12:08.838 +00:00 [INF] - Updated reported properties                                                                                                                                                                                       
<6> 2022-08-29 09:12:19.164 +00:00 [INF] - Module 'edgeHub' scheduled to restart after 40s (36s left).                                                                                                                                                       
<6> 2022-08-29 09:12:19.292 +00:00 [INF] - Updated reported properties                                                                                                                                                                                       
<6> 2022-08-29 09:12:24.306 +00:00 [INF] - Module 'edgeHub' scheduled to restart after 40s (31s left).                                                                                                                                                       
<6> 2022-08-29 09:12:24.307 +00:00 [INF] - Module 'gwa-iot-edge-adapter' scheduled to restart after 10s (06s left).                                                                                                                                          
<6> 2022-08-29 09:12:24.431 +00:00 [INF] - Updated reported properties                                                                                                                                                                                       
<6> 2022-08-29 09:12:29.440 +00:00 [INF] - Module 'edgeHub' scheduled to restart after 40s (26s left).                                                                                                                                                       
<6> 2022-08-29 09:12:29.440 +00:00 [INF] - Module 'gwa-iot-edge-adapter' scheduled to restart after 10s (01s left).                                                                                                                                          
<6> 2022-08-29 09:12:35.362 +00:00 [INF] - Module 'edgeHub' scheduled to restart after 40s (20s left).                                                                                                                                                       
<6> 2022-08-29 09:12:35.363 +00:00 [INF] - Plan execution started for deployment 223                                                                                                                                                                         
<6> 2022-08-29 09:12:35.363 +00:00 [INF] - Executing command: "Command Group: (\n  [Stop module gwa-iot-edge-adapter]\n  [Start module gwa-iot-edge-adapter]\n  [Saving gwa-iot-edge-adapter to store]\n)"                                                   
<6> 2022-08-29 09:12:35.364 +00:00 [INF] - Executing command: "Stop module gwa-iot-edge-adapter"                                                                                                                                                             
<6> 2022-08-29 09:12:35.366 +00:00 [INF] - Executing command: "Start module gwa-iot-edge-adapter"                                                                                                                                                            
<6> 2022-08-29 09:12:35.891 +00:00 [INF] - Executing command: "Saving gwa-iot-edge-adapter to store"                                                                                                                                                         
<6> 2022-08-29 09:12:35.896 +00:00 [INF] - Plan execution ended for deployment 223                                                                                                                                                                           
<6> 2022-08-29 09:12:40.934 +00:00 [INF] - Module 'edgeHub' scheduled to restart after 40s (14s left).                                                                                                                                                       
<6> 2022-08-29 09:12:41.207 +00:00 [INF] - Updated reported properties                                                                                                                                                                                       
<6> 2022-08-29 09:12:46.216 +00:00 [INF] - Module 'edgeHub' scheduled to restart after 40s (09s left).                                                                                                                                                       
<6> 2022-08-29 09:12:46.216 +00:00 [INF] - Module 'gwa-iot-edge-adapter' scheduled to restart after 20s (15s left).                                                                                                                                          
<6> 2022-08-29 09:12:46.338 +00:00 [INF] - Updated reported properties                                                                                                                                                                                       
<6> 2022-08-29 09:12:51.347 +00:00 [INF] - Module 'edgeHub' scheduled to restart after 40s (04s left).                                                                                                                                                       
<6> 2022-08-29 09:12:51.347 +00:00 [INF] - Module 'gwa-iot-edge-adapter' scheduled to restart after 20s (09s left).                                                                                                                                          
<6> 2022-08-29 09:12:56.355 +00:00 [INF] - Module 'gwa-iot-edge-adapter' scheduled to restart after 20s (04s left).                                                                                                                                          
<6> 2022-08-29 09:12:56.355 +00:00 [INF] - Plan execution started for deployment 223                                                                                                                                                                         
<6> 2022-08-29 09:12:56.355 +00:00 [INF] - Executing command: "Command Group: (\n  [Stop module edgeHub]\n  [Start module edgeHub]\n  [Saving edgeHub to store]\n)"                                                                                          
<6> 2022-08-29 09:12:56.355 +00:00 [INF] - Executing command: "Stop module edgeHub"                                                                                                                                                                          
<6> 2022-08-29 09:12:56.357 +00:00 [INF] - Executing command: "Start module edgeHub"                                                                                                                                                                         
<6> 2022-08-29 09:12:56.803 +00:00 [INF] - Executing command: "Saving edgeHub to store"                                                                                                                                                                      
<6> 2022-08-29 09:12:56.803 +00:00 [INF] - Plan execution ended for deployment 223                                                                                                                                                                           
<6> 2022-08-29 09:13:01.863 +00:00 [INF] - Plan execution started for deployment 223                                                                                                                                                                         
<6> 2022-08-29 09:13:01.864 +00:00 [INF] - Executing command: "Command Group: (\n  [Stop module gwa-iot-edge-adapter]\n  [Start module gwa-iot-edge-adapter]\n  [Saving gwa-iot-edge-adapter to store]\n)"                                                   
<6> 2022-08-29 09:13:01.864 +00:00 [INF] - Executing command: "Stop module gwa-iot-edge-adapter"                                                                                                                                                             
<6> 2022-08-29 09:13:01.865 +00:00 [INF] - Executing command: "Start module gwa-iot-edge-adapter"                                                                                                                                                            
<6> 2022-08-29 09:13:02.448 +00:00 [INF] - Executing command: "Saving gwa-iot-edge-adapter to store"                                                                                                                                                         
<6> 2022-08-29 09:13:02.448 +00:00 [INF] - Plan execution ended for deployment 223                                                                                                                                                                           
<6> 2022-08-29 09:13:02.652 +00:00 [INF] - Updated reported properties                                                                                                                                                                                       
<6> 2022-08-29 09:13:07.790 +00:00 [INF] - Updated reported properties                                                                                                                                                                                       
<6> 2022-08-29 09:22:59.476 +00:00 [INF] - HealthRestartPlanner is clearing restart stats for module 'edgeHub' as it has been running healthy for 00:10:00.                                                                                                  
<6> 2022-08-29 09:22:59.476 +00:00 [INF] - Plan execution started for deployment 223                                                                                                                                                                         
<6> 2022-08-29 09:22:59.476 +00:00 [INF] - Executing command: "Saving edgeHub to store"                                                                                                                                                                      
<6> 2022-08-29 09:22:59.477 +00:00 [INF] - Plan execution ended for deployment 223                                                                                                                                                                           
<6> 2022-08-29 09:23:04.512 +00:00 [INF] - HealthRestartPlanner is clearing restart stats for module 'gwa-iot-edge-adapter' as it has been running healthy for 00:10:00.
<6> 2022-08-29 09:23:04.513 +00:00 [INF] - Plan execution started for deployment 223                
<6> 2022-08-29 09:23:04.513 +00:00 [INF] - Executing command: "Saving gwa-iot-edge-adapter to store"
<6> 2022-08-29 09:23:04.513 +00:00 [INF] - Plan execution ended for deployment 223
<6> 2022-08-29 09:23:04.661 +00:00 [INF] - Updated reported properties
<6> 2022-08-29 09:23:09.788 +00:00 [INF] - Updated reported properties

edge-hub logs
<7> 2022-08-29 09:12:01.012 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Hub.Core.Storage.MessageStore] - Getting next batch for endpoint iothub starting from 0 with batch size 100.
<7> 2022-08-29 09:12:01.012 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Hub.Core.Storage.MessageStore] - Obtained next batch for endpoint iothub with batch size 0. Next start offset = 0.
<7> 2022-08-29 09:12:02.524 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Hub.Core.Twin.StoringTwinManager] - Syncing stored reported properties to cloud in gwa-prototyping-edge-device-1/$edgeHub
<7> 2022-08-29 09:12:02.525 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Edged.WorkloadClient] - Making a Http call to unix:///var/run/iotedge/workload.sock to Decrypt
<7> 2022-08-29 09:12:02.525 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Uds.HttpUdsMessageHandler] - Connecting socket /var/run/iotedge/workload.sock
<7> 2022-08-29 09:12:02.525 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Uds.HttpUdsMessageHandler] - Connected socket /var/run/iotedge/workload.sock
<7> 2022-08-29 09:12:02.526 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Uds.HttpUdsMessageHandler] - Sending request http://workload.sock/modules/%24edgeHub/genid/637691910154316564/decrypt?api-version=2020-07-07
<7> 2022-08-29 09:12:02.545 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Uds.HttpUdsMessageHandler] - Response received OK
<7> 2022-08-29 09:12:02.545 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Edged.WorkloadClient] - Received a valid Http response from unix:///var/run/iotedge/workload.sock for Decrypt
<6> 2022-08-29 09:12:02.545 +00:00 [INF] [Microsoft.Azure.Devices.Edge.Hub.Core.Twin.StoringTwinManager] - Done syncing reported properties for gwa-prototyping-edge-device-1/$edgeHub
<7> 2022-08-29 09:12:02.545 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Hub.Core.Twin.StoringTwinManager] - Syncing stored reported properties to cloud in gwa-prototyping-edge-device-1/gwa-iot-edge-adapter
<7> 2022-08-29 09:12:02.546 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Edged.WorkloadClient] - Making a Http call to unix:///var/run/iotedge/workload.sock to Decrypt
<7> 2022-08-29 09:12:02.546 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Uds.HttpUdsMessageHandler] - Connecting socket /var/run/iotedge/workload.sock
<7> 2022-08-29 09:12:02.547 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Uds.HttpUdsMessageHandler] - Connected socket /var/run/iotedge/workload.sock
<7> 2022-08-29 09:12:02.547 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Uds.HttpUdsMessageHandler] - Sending request http://workload.sock/modules/%24edgeHub/genid/637691910154316564/decrypt?api-version=2020-07-07
<7> 2022-08-29 09:12:02.567 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Uds.HttpUdsMessageHandler] - Response received OK
<7> 2022-08-29 09:12:02.567 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Edged.WorkloadClient] - Received a valid Http response from unix:///var/run/iotedge/workload.sock for Decrypt
<7> 2022-08-29 09:12:02.567 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Hub.Core.Twin.StoringTwinManager] - Found stored reported properties for gwa-prototyping-edge-device-1/gwa-iot-edge-adapter to sync to cloud
<7> 2022-08-29 09:12:02.568 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Edged.WorkloadClient] - Making a Http call to unix:///var/run/iotedge/workload.sock to Decrypt
<7> 2022-08-29 09:12:02.568 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Uds.HttpUdsMessageHandler] - Connecting socket /var/run/iotedge/workload.sock
<7> 2022-08-29 09:12:02.568 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Uds.HttpUdsMessageHandler] - Connected socket /var/run/iotedge/workload.sock
<7> 2022-08-29 09:12:02.568 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Uds.HttpUdsMessageHandler] - Sending request http://workload.sock/modules/%24edgeHub/genid/637691910154316564/decrypt?api-version=2020-07-07
<7> 2022-08-29 09:12:02.584 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Uds.HttpUdsMessageHandler] - Response received OK
<7> 2022-08-29 09:12:02.584 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Edged.WorkloadClient] - Received a valid Http response from unix:///var/run/iotedge/workload.sock for Decrypt
<7> 2022-08-29 09:12:02.584 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Hub.Core.Twin.StoringTwinManager] - Updating reported properties for gwa-prototyping-edge-device-1/gwa-iot-edge-adapter
<7> 2022-08-29 09:12:02.584 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Hub.Core.ConnectionManager] - Obtained cloud connection for device gwa-prototyping-edge-device-1/gwa-iot-edge-adapter
<7> 2022-08-29 09:12:02.709 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Hub.CloudProxy.DeviceConnectivityManager] - IotHub call succeeded
<7> 2022-08-29 09:12:02.709 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Hub.CloudProxy.ConnectivityAwareClient] - Operation UpdateReportedPropertiesAsync succeeded for gwa-prototyping-edge-device-1/gwa-iot-edge-adapter
<7> 2022-08-29 09:12:02.710 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Hub.CloudProxy.CloudProxy] - Updating reported properties for gwa-prototyping-edge-device-1/gwa-iot-edge-adapter
<6> 2022-08-29 09:12:02.710 +00:00 [INF] [Microsoft.Azure.Devices.Edge.Hub.Core.Twin.StoringTwinManager] - Updated reported properties for gwa-prototyping-edge-device-1/gwa-iot-edge-adapter
<7> 2022-08-29 09:12:02.710 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Hub.Core.Twin.StoringTwinManager] - Updated reported properties for gwa-prototyping-edge-device-1/gwa-iot-edge-adapter
<7> 2022-08-29 09:12:02.710 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Edged.WorkloadClient] - Making a Http call to unix:///var/run/iotedge/workload.sock to Decrypt
<7> 2022-08-29 09:12:02.710 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Uds.HttpUdsMessageHandler] - Connecting socket /var/run/iotedge/workload.sock
<7> 2022-08-29 09:12:02.711 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Uds.HttpUdsMessageHandler] - Connected socket /var/run/iotedge/workload.sock
<7> 2022-08-29 09:12:02.711 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Uds.HttpUdsMessageHandler] - Sending request http://workload.sock/modules/%24edgeHub/genid/637691910154316564/decrypt?api-version=2020-07-07
<7> 2022-08-29 09:12:02.730 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Uds.HttpUdsMessageHandler] - Response received OK
<7> 2022-08-29 09:12:02.731 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Edged.WorkloadClient] - Received a valid Http response from unix:///var/run/iotedge/workload.sock for Decrypt
<7> 2022-08-29 09:12:02.731 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Edged.WorkloadClient] - Making a Http call to unix:///var/run/iotedge/workload.sock to Encrypt
<7> 2022-08-29 09:12:02.731 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Uds.HttpUdsMessageHandler] - Connecting socket /var/run/iotedge/workload.sock
<7> 2022-08-29 09:12:02.731 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Uds.HttpUdsMessageHandler] - Connected socket /var/run/iotedge/workload.sock
<7> 2022-08-29 09:12:02.731 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Uds.HttpUdsMessageHandler] - Sending request http://workload.sock/modules/%24edgeHub/genid/637691910154316564/encrypt?api-version=2020-07-07
<7> 2022-08-29 09:12:02.752 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Uds.HttpUdsMessageHandler] - Response received OK
<7> 2022-08-29 09:12:02.752 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Edged.WorkloadClient] - Received a valid Http response from unix:///var/run/iotedge/workload.sock for Encrypt
<7> 2022-08-29 09:12:02.753 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Edged.WorkloadClient] - Making a Http call to unix:///var/run/iotedge/workload.sock to Decrypt
<7> 2022-08-29 09:12:02.753 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Uds.HttpUdsMessageHandler] - Connecting socket /var/run/iotedge/workload.sock
<7> 2022-08-29 09:12:02.753 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Uds.HttpUdsMessageHandler] - Connected socket /var/run/iotedge/workload.sock
<7> 2022-08-29 09:12:02.753 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Uds.HttpUdsMessageHandler] - Sending request http://workload.sock/modules/%24edgeHub/genid/637691910154316564/decrypt?api-version=2020-07-07
<7> 2022-08-29 09:12:02.772 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Uds.HttpUdsMessageHandler] - Response received OK
<7> 2022-08-29 09:12:02.772 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Edged.WorkloadClient] - Received a valid Http response from unix:///var/run/iotedge/workload.sock for Decrypt
<7> 2022-08-29 09:12:11.046 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Hub.Core.Storage.MessageStore] - Getting next batch for endpoint iothub starting from 0 with batch size 100.
<7> 2022-08-29 09:12:11.046 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Hub.Core.Storage.MessageStore] - Obtained next batch for endpoint iothub with batch size 0. Next start offset = 0.


### This is where the oom-killer gets involved ###

2022-08-29 09:12:56  Starting Edge Hub
2022-08-29 09:12:56  Starting Edge Hub
2022-08-29 09:12:56  Changing ownership of storage folder: /tmp/edgeHub to 13623
2022-08-29 09:12:56  Changing ownership of backup folder: /tmp/edgeHub_backup to 13623
2022-08-29 09:12:56.868 +00:00 Edge Hub Main()
<7> 2022-08-29 09:12:56.996 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Edged.WorkloadClient] - Making a Http call to unix:///var/run/iotedge/workload.sock to CreateServerCertificateAsync
<7> 2022-08-29 09:12:57.175 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Uds.HttpUdsMessageHandler] - Connecting socket /var/run/iotedge/workload.sock
<7> 2022-08-29 09:12:57.185 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Uds.HttpUdsMessageHandler] - Connected socket /var/run/iotedge/workload.sock
<7> 2022-08-29 09:12:57.188 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Uds.HttpUdsMessageHandler] - Sending request http://workload.sock/modules/%24edgeHub/genid/637691910154316564/certificate/server?api-version=2020-07-07
<7> 2022-08-29 09:12:57.301 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Uds.HttpUdsMessageHandler] - Response received Created
<7> 2022-08-29 09:12:57.358 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Edged.WorkloadClient] - Received a valid Http response from unix:///var/run/iotedge/workload.sock for CreateServerCertificateAsync
<7> 2022-08-29 09:12:57.378 +00:00 [DBG] [EdgeHub] - Importing RSA private key
<7> 2022-08-29 09:12:57.401 +00:00 [DBG] [EdgeHub] - RSA private key has been imported and assigned to certificate
<7> 2022-08-29 09:12:57.403 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Edged.WorkloadClient] - Making a Http call to unix:///var/run/iotedge/workload.sock to TrustBundleAsync
<7> 2022-08-29 09:12:57.410 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Uds.HttpUdsMessageHandler] - Connecting socket /var/run/iotedge/workload.sock
<7> 2022-08-29 09:12:57.410 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Uds.HttpUdsMessageHandler] - Connected socket /var/run/iotedge/workload.sock
<7> 2022-08-29 09:12:57.410 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Uds.HttpUdsMessageHandler] - Sending request http://workload.sock/trust-bundle?api-version=2020-07-07
<7> 2022-08-29 09:12:57.413 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Uds.HttpUdsMessageHandler] - Response received OK
<7> 2022-08-29 09:12:57.414 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Edged.WorkloadClient] - Received a valid Http response from unix:///var/run/iotedge/workload.sock for TrustBundleAsync
<6> 2022-08-29 09:12:57.421 +00:00 [INF] [EdgeHub] - Installing certificates [CN=aziot-edge CA gwa-prototyping-edge-device-1:08/11/2023 07:44:44] to Root
<6> 2022-08-29 09:12:57.450 +00:00 [INF] [EdgeHub] - Installing certificates [CN=aziot-edge CA gwa-prototyping-edge-device-1:08/11/2023 07:44:44] to Root
<6> 2022-08-29 09:12:57.456 +00:00 [INF] [EdgeHub] - Enabling SSL protocols: Tls12
<6> 2022-08-29 09:12:57.574 +00:00 [INF] [EdgeHub] - Experimental features configuration: {"Enabled":false,"DisableCloudSubscriptions":false,"DisableConnectivityCheck":false}
<6> 2022-08-29 09:12:58.000 +00:00 [INF] [Microsoft.Azure.Devices.Edge.Hub.Service.Modules.RoutingModule] - Created persistent store at /tmp/edgeHub
<6> 2022-08-29 09:12:58.036 +00:00 [INF] [EdgeHub] - Initializing Edge Hub
<6> 2022-08-29 09:12:58.036 +00:00 [INF] [EdgeHub] - 
        █████╗ ███████╗██╗   ██╗██████╗ ███████╗
       ██╔══██╗╚══███╔╝██║   ██║██╔══██╗██╔════╝
       ███████║  ███╔╝ ██║   ██║██████╔╝█████╗
       ██╔══██║ ███╔╝  ██║   ██║██╔══██╗██╔══╝
       ██║  ██║███████╗╚██████╔╝██║  ██║███████╗
       ╚═╝  ╚═╝╚══════╝ ╚═════╝ ╚═╝  ╚═╝╚══════╝

 ██╗ ██████╗ ████████╗    ███████╗██████╗  ██████╗ ███████╗
 ██║██╔═══██╗╚══██╔══╝    ██╔════╝██╔══██╗██╔════╝ ██╔════╝
 ██║██║   ██║   ██║       █████╗  ██║  ██║██║  ███╗█████╗
 ██║██║   ██║   ██║       ██╔══╝  ██║  ██║██║   ██║██╔══╝
 ██║╚██████╔╝   ██║       ███████╗██████╔╝╚██████╔╝███████╗
 ╚═╝ ╚═════╝    ╚═╝       ╚══════╝╚═════╝  ╚═════╝ ╚══════╝

<6> 2022-08-29 09:12:58.037 +00:00 [INF] [EdgeHub] - Version - 1.4.0.59620487 (157650c3b1c5fb58b17f5975e6f1417a5a5f4f89)
<6> 2022-08-29 09:12:58.038 +00:00 [INF] [EdgeHub] - OptimizeForPerformance=True
<6> 2022-08-29 09:12:58.038 +00:00 [INF] [EdgeHub] - MessageAckTimeoutSecs=30
<6> 2022-08-29 09:12:58.038 +00:00 [INF] [EdgeHub] - Loaded server certificate with expiration date of "2022-09-28T09:12:57.0000000+00:00"
<6> 2022-08-29 09:12:58.044 +00:00 [INF] [EdgeHub] - Using Asp Net server for metrics

Additional Information

We can reproduce this behaviour on both 1.3.0 and 1.4.0, but we didn’t notice it on 1.2.0. Whether it triggers or not seems completely random, suggesting it requires some exact timing to hit the issue inside edgeHub. It’s possible that our custom module is doing something incorrect but even then I don’t think it should be possible to crash the edgeHub through the MQTT API and besides when it doesn’t crash it works perfectly. I had to attach some of the logs as files, because github was complaining about the issue being too long.

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 35 (18 by maintainers)

Commits related to this issue

Most upvoted comments

FYI - Edge Hub 1.4.2 is out now with updated dotnetty that should help with this issue: https://mcr.microsoft.com/en-us/product/azureiotedge-hub/tags

It’ll be in 1.4.2 and it looks like you might be able to see that @damonbarry and team are working hard (https://github.com/Azure/iotedge/pull/6689 https://github.com/Azure/iotedge/pull/6700) to push this release out as soon as possible.

Update: still forecasting first week of Oct.

Hey, sorry for the delay, we have a ticket for this open in azure portal support as well and I’ve involved our Principal Engineer there to answer these questions regarding customer impact. It’s Support request ID 2209010050000517 I hope you will be able to access the response there.

We’re looking into releasing a new image version. I’ll provide an update this afternoon

After some further digging I think I’ve found the cause https://github.com/Azure/DotNetty/pull/583 I’ve created an edgeHub image with Netty libraries manually patched to 0.7.5 and we will be testing it over the weekend, but it looks promising.