iotedge: IoT edge runtime with DPS and X.509 always reconfigures modules on start

Expected Behavior

The runtime only reconfigures modules on start if the DPS sub status is DeviceDataMigrated or DeviceDataReset.

Current Behavior

The runtime always reconfigures modules because even for already assigned devices the DPS sub status is always InitialAssignment.

Jul 24 10:25:37 device1 iotedged[11901]: 2020-07-24T08:25:37Z [INFO] - Reprovisioning status InitialAssignment will trigger reconfiguration of modules.

Steps to Reproduce

Provide a detailed set of steps to reproduce the bug.

  1. Use the iotedge script to generate certificates for root, an intermediate and a device
  2. Add the root CA to DPS and verify
  3. Add group enrollment for root CA with static configuration for a single IoT Hub and set handling of re-provisiong to never re-provision
  4. Configure device certificates on device and restart (use AMQP)
  5. Deploy dummy module
  6. Restart iotedge
  7. Check logs

Context (Environment)

Output of iotedge check

iotedge check
Configuration checks
--------------------
√ config.yaml is well-formed - OK
√ config.yaml has well-formed connection string - OK
√ container engine is installed and functional - OK
√ config.yaml has correct hostname - OK
√ config.yaml has correct URIs for daemon mgmt endpoint - OK
√ latest security daemon - OK
√ host time is close to real time - OK
√ container time is close to host time - OK
√ DNS server - OK
√ production readiness: identity certificates expiry - OK
√ production readiness: certificates - OK
√ production readiness: container engine - OK
√ production readiness: logs policy - OK
‼ production readiness: Edge Agent's storage directory is persisted on the host filesystem - Warning
‼ production readiness: Edge Hub's storage directory is persisted on the host filesystem - Warning

Connectivity checks
-------------------
√ host can connect to and perform TLS handshake with DPS endpoint - OK
√ host can connect to and perform TLS handshake with IoT Hub AMQP port - OK
√ host can connect to and perform TLS handshake with IoT Hub HTTPS / WebSockets port - OK
√ host can connect to and perform TLS handshake with IoT Hub MQTT port - OK
√ container on the default network can connect to IoT Hub AMQP port - OK
√ container on the default network can connect to IoT Hub HTTPS / WebSockets port - OK
√ container on the default network can connect to IoT Hub MQTT port - OK
√ container on the IoT Edge module network can connect to IoT Hub AMQP port - OK
√ container on the IoT Edge module network can connect to IoT Hub HTTPS / WebSockets port - OK
√ container on the IoT Edge module network can connect to IoT Hub MQTT port - OK

23 check(s) succeeded.
2 check(s) raised warnings. Re-run with --verbose for more details.

Device Information

  • Host OS [e.g. Ubuntu 16.04, Ubuntu 18.04, Windows IoT Core]: Raspbian Stretch
  • Architecture [e.g. amd64, arm32, arm64]: arm32
  • Container OS [e.g. Linux containers, Windows containers]: Linux containers

Runtime Versions

  • iotedged [run iotedge version]: 1.0.9.4
  • Edge Agent [image tag (e.g. 1.0.0)]: 1.0
  • Edge Hub [image tag (e.g. 1.0.0)]: 1.0
  • Docker/Moby [run docker version]: 3.0.12+azure

Logs

iotedged logs

Jul 24 10:25:26 device1 systemd[1]: Started Azure IoT Edge daemon.
Jul 24 10:25:27 device1 iotedged[11901]: 2020-07-24T08:25:27Z [INFO] - Starting Azure IoT Edge Security Daemon
Jul 24 10:25:27 device1 iotedged[11901]: 2020-07-24T08:25:27Z [INFO] - Version - 1.0.9.4
Jul 24 10:25:27 device1 iotedged[11901]: 2020-07-24T08:25:27Z [INFO] - Using config file: /etc/iotedge/config.yaml
Jul 24 10:25:27 device1 iotedged[11901]: 2020-07-24T08:25:27Z [INFO] - Configuring /var/lib/iotedge as the home directory.
Jul 24 10:25:27 device1 iotedged[11901]: 2020-07-24T08:25:27Z [INFO] - Configuring certificates...
Jul 24 10:25:27 device1 iotedged[11901]: 2020-07-24T08:25:27Z [INFO] - Configuring the Device CA certificate using "/home/iotedge/certs/device-ca.cert.pem".
Jul 24 10:25:27 device1 iotedged[11901]: 2020-07-24T08:25:27Z [INFO] - Configuring the Device private key using "/home/iotedge/certs/device-ca.key.pem".
Jul 24 10:25:27 device1 iotedged[11901]: 2020-07-24T08:25:27Z [INFO] - Configuring the trusted CA certificates using "/home/iotedge/certs/root.cert.pem".
Jul 24 10:25:27 device1 iotedged[11901]: 2020-07-24T08:25:27Z [INFO] - Finished configuring provisioning environment variables and certificates.
Jul 24 10:25:27 device1 iotedged[11901]: 2020-07-24T08:25:27Z [INFO] - Initializing hsm...
Jul 24 10:25:27 device1 iotedged[11901]: 2020-07-24T08:25:27Z [INFO] - Finished initializing hsm.
Jul 24 10:25:27 device1 iotedged[11901]: 2020-07-24T08:25:27Z [INFO] - Initializing hsm X509 interface...
Jul 24 10:25:27 device1 iotedged[11901]: 2020-07-24T08:25:27Z [INFO] - Finished initializing hsm X509 interface...
Jul 24 10:25:27 device1 iotedged[11901]: 2020-07-24T08:25:27Z [INFO] - Provisioning edge device...
Jul 24 10:25:27 device1 iotedged[11901]: 2020-07-24T08:25:27Z [INFO] - Starting provisioning edge device via X509 provisioning...
Jul 24 10:25:27 device1 iotedged[11901]: 2020-07-24T08:25:27Z [INFO] - Starting DPS registration with scope_id "xxx", registration_id "dev1"
Jul 24 10:25:27 device1 iotedged[11901]: 2020-07-24T08:25:27Z [DBUG] - [edgelet_http::client] Empty token source for request PUT xxx/registrations/dev1/register
Jul 24 10:25:27 device1 iotedged[11901]: 2020-07-24T08:25:27Z [DBUG] - [edgelet_http::client] Empty token source for request GET xxx/registrations/dev1/operations/4.2ccd682af5298751.2995178a-0088-4db4-ad05-a8b22b643073
Jul 24 10:25:37 device1 iotedged[11901]: 2020-07-24T08:25:37Z [DBUG] - [edgelet_http::client] Empty token source for request GET xxx/registrations/dev1/operations/4.2ccd682af5298751.2995178a-0088-4db4-ad05-a8b22b643073
Jul 24 10:25:37 device1 iotedged[11901]: 2020-07-24T08:25:37Z [INFO] - DPS registration assigned device "dev1" in hub "xxx.azure-devices.net"
Jul 24 10:25:37 device1 iotedged[11901]: 2020-07-24T08:25:37Z [INFO] - Successful DPS provisioning.
Jul 24 10:25:37 device1 iotedged[11901]: 2020-07-24T08:25:37Z [INFO] - Finished provisioning edge device.
Jul 24 10:25:37 device1 iotedged[11901]: 2020-07-24T08:25:37Z [INFO] - Initializing the module runtime...
Jul 24 10:25:37 device1 iotedged[11901]: 2020-07-24T08:25:37Z [INFO] - Initializing module runtime...
Jul 24 10:25:37 device1 iotedged[11901]: 2020-07-24T08:25:37Z [INFO] - Using runtime network id azure-iot-edge
Jul 24 10:25:37 device1 iotedged[11901]: 2020-07-24T08:25:37Z [INFO] - Successfully initialized module runtime
Jul 24 10:25:37 device1 iotedged[11901]: 2020-07-24T08:25:37Z [INFO] - Finished initializing the module runtime.
Jul 24 10:25:37 device1 iotedged[11901]: 2020-07-24T08:25:37Z [INFO] - Reprovisioning status InitialAssignment will trigger reconfiguration of modules.
Jul 24 10:25:37 device1 iotedged[11901]: 2020-07-24T08:25:37Z [DBUG] - [edgelet_docker::runtime] Listing modules...
Jul 24 10:25:37 device1 iotedged[11901]: 2020-07-24T08:25:37Z [DBUG] - [edgelet_docker::runtime] Successfully listed modules
Jul 24 10:25:37 device1 iotedged[11901]: 2020-07-24T08:25:37Z [INFO] - Removing module module1...

edge-agent logs

2020-07-24 08:26:04 +00:00 Starting Edge Agent
2020-07-24 08:26:05.464 +00:00 Edge Agent Main()
<6> 2020-07-24 08:26:06.882 +00:00 [INF] - Initializing Edge Agent.
<6> 2020-07-24 08:26:08.154 +00:00 [INF] - Version - 1.0.9.4.32971639 (12d55e582cc7ce95c8abfe11eddfbbc938ed6001)
<6> 2020-07-24 08:26:08.603 +00:00 [INF] - Experimental features configuration: {"Enabled":false,"DisableCloudSubscriptions":false,"EnableUploadLogs":false,"EnableGetLogs":false,"EnableMetrics":false}
<6> 2020-07-24 08:26:09.531 +00:00 [INF] - Installing certificates [CN=xxx, O=xxx, C=DE:05/30/2030 08:37:39] to Root
<6> 2020-07-24 08:26:12.379 +00:00 [INF] - Created persistent store at /tmp/edgeAgent
<6> 2020-07-24 08:26:12.922 +00:00 [INF] - Started operation refresh twin config
<6> 2020-07-24 08:26:13.093 +00:00 [INF] - Edge agent attempting to connect to IoT Hub via Amqp_Tcp_Only...
<6> 2020-07-24 08:26:13.362 +00:00 [INF] - Registering request handler RestartModule
<6> 2020-07-24 08:26:18.324 +00:00 [INF] - Edge agent connected to IoT Hub via Amqp_Tcp_Only.
<6> 2020-07-24 08:26:19.079 +00:00 [INF] - Initialized new module client with subscriptions enabled
<4> 2020-07-24 08:26:19.253 +00:00 [WRN] - Empty edge agent config was received. Attempting to read config from backup file (backup.json) instead
<6> 2020-07-24 08:26:19.293 +00:00 [INF] - Edge agent config backup file does not exist - backup.json
<6> 2020-07-24 08:26:19.573 +00:00 [INF] - Obtained Edge agent twin from IoTHub with desired properties version 17 and reported properties version 84.
<6> 2020-07-24 08:26:26.703 +00:00 [INF] - Plan execution started for deployment 17
<6> 2020-07-24 08:26:34.627 +00:00 [INF] - Executing command: "Command Group: (\n  [Create module module1]\n  [Start module module1]\n)"
<6> 2020-07-24 08:26:34.628 +00:00 [INF] - Executing command: "Create module module1"
<6> 2020-07-24 08:26:36.960 +00:00 [INF] - Executing command: "Start module module1"
<6> 2020-07-24 08:26:38.964 +00:00 [INF] - Executing command: "Command Group: (\n  [Create module edgeHub]\n  [Start module edgeHub]\n)"
<6> 2020-07-24 08:26:38.964 +00:00 [INF] - Executing command: "Create module edgeHub"
<6> 2020-07-24 08:26:39.836 +00:00 [INF] - Executing command: "Start module edgeHub"
<6> 2020-07-24 08:26:42.436 +00:00 [INF] - Plan execution ended for deployment 17

Additional Information

I created a group enrollment for my root CA. The enrollment assignment is static configuration for a single selected IoT Hub. I tried setting the “handle on re-provisioning” to “re-provision and migrate data” and also “never re-provision” both with the same result. The allocation policy of DPS is “static configuration”.

According to the DPS API there are only three enum values for substatus (deviceDataMigrated, deviceDataReset and initialAssignment). If I take a look at the edgelet source code those three status are the only ones mapped for x509 provisioning.

If I use the same device certificate with this sample I also only get the InitialAssignment status from DPS even if the IoT Hub didn’t change and the device is already registered to the IoT Hub. If I take a look at the SDK source code there are also only those three enum values.

Because of this the code here will always trigger a reconfiguration because the status is never DeviceDataNotUpdated. If I understood the reasoning behind this correct a status of InitialAssignment shouldn’t reconfigure the modules because there was no IoT Hub configured before. If the InitialAssignment should always trigger a reconfiguration to refresh the module keys why would I need to remove/add and redownload all modules again? I would even accept the remove/add if it didn’t need to redownload all modules because that defeats the whole offline mode (no internet connection available on the device).

I know this might be an issue of DPS but this is a blocker for production use cases because of the offline scenario of edge devices.

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Reactions: 3
  • Comments: 33 (10 by maintainers)

Most upvoted comments

Let’s keep this issue open until it’s resolved. It’s been reported in a couple forums and the current behavior is definitely problematic.

Internal investigation is pointing to an issue with a value returned from DPS. We can link our internal bug to the bug we’re filing on DPS. Once the issue gets resolved we can close this GitHub issue.

@nyanzebra: I’m a bit confused. Does your PR fix this bug? Do you know when this will get released? Can we expect a patch for 1.1 or is it going to be in 1.2?

Hi Everyone, this closing of issue is likely a mistake on my part for accidentally linking the wrong thing for my PR that is for an entirely different feature. This is just a DPS issue, sorry for any confusion from my part! Reopening now 😃

This issue occurs for us as well at least since iotedged 1.0.9 on Debian 10 armhf. Pretty annoying since it adds to the start up time and kills the device when booted offline. From what I understand iotedge should be capable of running without continuous internet connection.

@nyanzebra: I’m a bit confused. Does your PR fix this bug? Do you know when this will get released? Can we expect a patch for 1.1 or is it going to be in 1.2?

I am also wondering when it will get released?

That is correct, the fix is in 1.0.10.2 version of iotedged via https://github.com/Azure/iotedge/commit/23215094cdd11fd5170d42798ebac509b132a8fd.

Note: you also need a new DPS enrollment for this to work since the fix involved changes to the DPS service as well to return a new substatus.

FYI - @ksaye, @florianbader - thanks for your patience while we took care of this one. Please test and let us know if you run into any issues.

@florianbader, you may have noticed that 1.0.10 was released a couple days ago. It does not have the fix to this issue but it should be available soon in a servicing release to 1.0.10. We released 1.0.10 without the fix as it contains some security fixes that couldn’t wait on us completing the fix for the issue discussed in this thread.