iotedge: EdgeHub stops sending messages to upstream

Expected Behavior

The EdgeHub send messages without interoupt

Current Behavior

As you can see in the following printscreen, the edgeHub is stop sending telemetry data to the upstream from the two containers, one of them is StreamAnalyticsEdge, the oder is a C# application.

2022-01-13 14_47_03-iot-euw-pilatusaircraft-prod-iot - Microsoft Azure

This is happend on 12.01.2021 beteween16:45 and 17:00 European Time +1

Good to Know

  • I have restarted the edge host 6 Days ago, at 07.01.021
  • see the support bundle --since 2d
  • Stream analytics has a windowing job thats generates every 5min messages
  • The connection to IoT Hub has only open ports for https and AMQP

Context (Environment)

Device Information

  • Host OS: Ubuntu 20.04
  • Architecture amd64
  • Container OS: Linux containers

Runtime Versions

  • aziot-edged: 1.2.3
  • Edge Agent: mcr.microsoft.com/azureiotedge-agent:1.2.4
  • Edge Hub: mcr.microsoft.com/azureiotedge-hub:1.2.4
  • Docker/Moby: 20.10.8+azure

Logs

In the edgeHub_log.txt i see the Exception: <6> 2022-01-12 15:59:56.232 +00:00 [INF] - Retrying cloud proxy operation UpdateReportedPropertiesAsync for iot-host/$edgeHub. System.TimeoutException: The operation timed out. —> System.Threading.Tasks.TaskCanceledException: A task was canceled. at Microsoft.Azure.Devices.Client.Transport.RetryDelegatingHandler.SendTwinPatchAsync(TwinCollection reportedProperties, CancellationToken cancellationToken) at Microsoft.Azure.Devices.Client.InternalClient.UpdateReportedPropertiesAsync(TwinCollection reportedProperties) — End of inner exception stack trace — at Microsoft.Azure.Devices.Client.InternalClient.UpdateReportedPropertiesAsync(TwinCollection reportedProperties) at Microsoft.Azure.Devices.Edge.Hub.CloudProxy.ConnectivityAwareClient.<>c__DisplayClass30_0.<<InvokeFunc>b__0>d.MoveNext() in /home/vsts/work/1/s/edge-hub/core/src/Microsoft.Azure.Devices.Edge.Hub.CloudProxy/ConnectivityAwareClient.cs:line 187 — End of stack trace from previous location where exception was thrown — at Microsoft.Azure.Devices.Edge.Hub.CloudProxy.ConnectivityAwareClient.InvokeFunc[T](Func1 func, String operation, Boolean useForConnectivityCheck) in /home/vsts/work/1/s/edge-hub/core/src/Microsoft.Azure.Devices.Edge.Hub.CloudProxy/ConnectivityAwareClient.cs:line 180 at Microsoft.Azure.Devices.Edge.Hub.CloudProxy.ConnectivityAwareClient.InvokeFunc[T](Func1 func, String operation, Boolean useForConnectivityCheck) in /home/vsts/work/1/s/edge-hub/core/src/Microsoft.Azure.Devices.Edge.Hub.CloudProxy/ConnectivityAwareClient.cs:line 180 at Microsoft.Azure.Devices.Edge.Hub.CloudProxy.CloudProxy.UpdateReportedPropertiesAsync(IMessage reportedPropertiesMessage) in /home/vsts/work/1/s/edge-hub/core/src/Microsoft.Azure.Devices.Edge.Hub.CloudProxy/CloudProxy.cs:line 226 at Microsoft.Azure.Devices.Edge.Hub.CloudProxy.CloudProxy.UpdateReportedPropertiesAsync(IMessage reportedPropertiesMessage) in /home/vsts/work/1/s/edge-hub/core/src/Microsoft.Azure.Devices.Edge.Hub.CloudProxy/CloudProxy.cs:line 226 at Microsoft.Azure.Devices.Edge.Hub.Core.Cloud.RetryingCloudProxy.<>c__DisplayClass24_0.<<ExecuteOperation>b__0>d.MoveNext() in /home/vsts/work/1/s/edge-hub/core/src/Microsoft.Azure.Devices.Edge.Hub.Core/cloud/RetryingCloudProxy.cs:line 65 — End of stack trace from previous location where exception was thrown — at Microsoft.Azure.Devices.Edge.Hub.Core.Cloud.RetryingCloudProxy.ExecuteOperation[T](Func`2 func, String operation) in /home/vsts/work/1/s/edge-hub/core/src/Microsoft.Azure.Devices.Edge.Hub.Core/cloud/RetryingCloudProxy.cs:line 76

bevor that, there are creazy authentication messages to see: <6> 2022-01-12 16:04:09.140 +00:00 [INF] - Error authenticating token for iot-host/asaTelemetryBeaconTHReduceDublicates because the token is expired or could not be parsed <6> 2022-01-12 16:04:09.140 +00:00 [INF] - Credentials for client iot-host/asaTelemetryBeaconTHReduceDublicates are not valid. <4> 2022-01-12 16:04:09.140 +00:00 [WRN] - Reauthenticating client iot-host/asaTelemetryBeaconTHReduceDublicates failed, removing client connection <6> 2022-01-12 16:04:09.140 +00:00 [INF] - Unable to re-authenticate iot-host/asaTelemetryBeaconTHReduceDublicates, dropping client connection. <6> 2022-01-12 16:04:09.140 +00:00 [INF] - Removing device connection for device iot-host/asaTelemetryBeaconTHReduceDublicates with removeCloudConnection flag ‘True’. <4> 2022-01-12 16:04:09.140 +00:00 [WRN] - Closing connection for device: iot-host/asaTelemetryBeaconTHReduceDublicates, Microsoft.Azure.Devices.Edge.Hub.Core.EdgeHubConnectionException: Connection closed for device iot-host/asaTelemetryBeaconTHReduceDublicates., <6> 2022-01-12 16:04:09.141 +00:00 [INF] - Disposing MessagingServiceClient for device Id iot-host/asaTelemetryBeaconTHReduceDublicates because of exception - Microsoft.Azure.Devices.Edge.Hub.Core.EdgeHubConnectionException: Connection closed for device iot-host/asaTelemetryBeaconTHReduceDublicates. <6> 2022-01-12 16:04:09.141 +00:00 [INF] - Closing device proxy for device Id iot-host/asaTelemetryBeaconTHReduceDublicates <6> 2022-01-12 16:04:09.142 +00:00 [INF] - Closing receiver in cloud proxy 0b34a846-a048-4787-aec2-b0e44dfd4f85 for iot-host/asaTelemetryBeaconTHReduceDublicates <6> 2022-01-12 16:04:09.142 +00:00 [INF] - Closed cloud proxy 0b34a846-a048-4787-aec2-b0e44dfd4f85 for iot-host/asaTelemetryBeaconTHReduceDublicates <6> 2022-01-12 16:04:09.142 +00:00 [INF] - Device connection removed for device iot-host/asaTelemetryBeaconTHReduceDublicates <6> 2022-01-12 16:04:09.463 +00:00 [INF] - Updated reported properties for iot-host/$edgeHub <6> 2022-01-12 16:04:09.886 +00:00 [INF] - Request starting HTTP/1.1 GET http://edgehub:9600/metrics
<6> 2022-01-12 16:04:09.906 +00:00 [INF] - Request finished in 20.6671ms 200 text/plain; version=0.0.4; charset=utf-8 <6> 2022-01-12 16:04:10.481 +00:00 [INF] - Client iot-host/asaTelemetryBeaconTHReduceDublicates in device scope authenticated locally.

Additional Information

13.01.2021-support-bundle-4github.zip

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Reactions: 2
  • Comments: 33 (12 by maintainers)

Most upvoted comments

@ancaantochi I tried to reproduce the issue by simulating a very bad internet connection and the issue appeared again. Thu Feb 10 14:53:58 UTC 2022 -> Set the internet connection to a bad condition Some time later: Issue occured. No upstream messages Fri Feb 11 07:12:28 UTC 2022 -> set the internet connection to a very good condition Still no upstream messages. Fri Feb 11 08:28:21 UTC 2022 -> Still no sign of recovery. Restart edgeHub All messages in the queue were sent at once and afterwards, the upstream messages were sent at the normal speed.

The workbook: image Position 1: set internet connection to very good Position 2: restart edgeHub

Here it is the support bundel with log level set to debug: support-bundle-2022-02-11–08-54-34.zip

Unfortunately, I was still not able to reproduce it yet. Please note that support bundle contains iothub name and modules logs that you might not want expose, please remove any sensitive data. If you are able to reproduce and have support-bundle you can also create a support request to upload support-bundle and reference the github issue: https://docs.microsoft.com/en-us/azure/azure-portal/supportability/how-to-create-azure-support-request

@BenjaminZ Hi, do you have logs with log-level of edgeHub module set to debug while the issue occured? Unfortunately, the issue doesn’t appear again at my test stations after I changed my log-level to debug.

@wabansal I’ve just enabled a device to get the log, hopefully we don’t need to wait for a long time.

@micahl the workbook is really a nice feature. Thanks for the great work. The issue is a little hidden, as all modules seem to be in the connected status and there were no error messages after the edgeHub reestablished connections. Normally the system should work correctly again. But the workbook shows the problem: Message Point 1: the edgeHub module encountered a problem. Point 2: according to the log, the edgeHub module has reestablished connection. Point 3: I restarted the edgeHub module manually. As one can see, between point 2 and 3, the queue size increased constantly and there were no upstream messages. After restarting the edgeHub module, the messages in the queue were sent to the upstream at once and the following upstream messages got transferred at the normal speed.

@bqstony quick sidenote… nice to see you using the metrics and workbook visualizations. jfyi that there are some refinements coming soon that are intended to provide better detail in the table of senders / receivers seen in your screenshot. dunno if they’d give any extra insight into the situation you encountered, but if you’re welcome to give them a quick try. there’s a link in the PR description of the changes.