iotedge: IoTedge release 1.0.10 custom certificates edgeAgent Failes
Expected Behavior
We are upgrading the IoT edge runtime on some of our devices from 1.0.9.5
to 1.0.10
and everything should work like before where the device can onboard through DPS (with certificates) and running with production certificates
Current Behavior
Currently edgeAgent is not starting up because of an Hsm failure. We use custom production certificates which are valid according to iotedge check
. DPS seems to work as expected. Downgrading to version 1.0.9.5
or using the self generated test certificates by Iot Edge solves the problem.
Steps to Reproduce
Provide a detailed set of steps to reproduce the bug.
- Install lasted version of IoT edge on Raspbian (sudo install -y iotedge)
- Install production certificates
- Edit config.yaml to use the production certificates
- Restart iotedge
Context (Environment)
Output of iotedge check
Click here
Configuration checks
--------------------
√ config.yaml is well-formed - OK
√ config.yaml has well-formed connection string - OK
√ container engine is installed and functional - OK
√ config.yaml has correct hostname - OK
√ config.yaml has correct URIs for daemon mgmt endpoint - OK
√ latest security daemon - OK
√ host time is close to real time - OK
√ container time is close to host time - OK
√ DNS server - OK
√ production readiness: identity certificates expiry - OK
√ production readiness: certificates - OK
√ production readiness: container engine - OK
√ production readiness: logs policy - OK
‼ production readiness: Edge Agent's storage directory is persisted on the host filesystem - Warning
The edgeAgent module is not configured to persist its /tmp/edgeAgent directory on the host filesystem.
Data might be lost if the module is deleted or updated.
Please see https://aka.ms/iotedge-storage-host for best practices.
× production readiness: Edge Hub's storage directory is persisted on the host filesystem - Error
Could not check current state of edgeHub container
Connectivity checks
-------------------
√ host can connect to and perform TLS handshake with DPS endpoint - OK
√ host can connect to and perform TLS handshake with IoT Hub AMQP port - OK
√ host can connect to and perform TLS handshake with IoT Hub HTTPS / WebSockets port - OK
√ host can connect to and perform TLS handshake with IoT Hub MQTT port - OK
√ container on the default network can connect to IoT Hub AMQP port - OK
√ container on the default network can connect to IoT Hub HTTPS / WebSockets port - OK
√ container on the default network can connect to IoT Hub MQTT port - OK
√ container on the IoT Edge module network can connect to IoT Hub AMQP port - OK
√ container on the IoT Edge module network can connect to IoT Hub HTTPS / WebSockets port - OK
√ container on the IoT Edge module network can connect to IoT Hub MQTT port - OK
23 check(s) succeeded.
1 check(s) raised warnings. Re-run with --verbose for more details.
1 check(s) raised errors. Re-run with --verbose for more details.
Device Information
- Host OS: Raspbian stretch
- Architecture: arm64
- Container OS: Linux containers
Runtime Versions
- iotedged: 1.0.10-1
- Edge Agent: 1.0.10
- Edge Hub: never deployed
- Docker/Moby : moby-engine (3.0.7)
Logs
sudo iotedge list
NAME STATUS DESCRIPTION CONFIG
edgeAgent failed Failed (1) 43 seconds ago mcr.microsoft.com/azureiotedge-agent:1.0
iotedged logs
-- Logs begin at Thu 2020-10-22 11:01:35 BST. --
Oct 22 11:07:29 dg-p061701-200917-009 iotedged[766]: 2020-10-22T10:07:29Z [ERR!] (/project/edgelet/hsm-sys/azure-iot-hs m-c/src/certificate_info.c:parse_cer<3>2020-10-22T10:07:29Z [ERR!] - Internal server error: Could not get trust bundle
Oct 22 11:07:29 dg-p061701-200917-009 iotedged[766]: caused by: An error occurred getting the certificate
Oct 22 11:07:29 dg-p061701-200917-009 iotedged[766]: caused by: HSM failure
Oct 22 11:07:29 dg-p061701-200917-009 iotedged[766]: caused by: HSM API returned an invalid null response
Oct 22 11:07:29 dg-p061701-200917-009 iotedged[766]: 2020-10-22T10:07:29Z [INFO] - [work] - - - [2020-10-22 10:07:29.63 7351901 UTC] "GET /trust-bundle?api-version=2019-01-30 HTTP/1.1" 500 Internal Server Error 178 "-" "-" auth_id(-)
Oct 22 11:07:32 dg-p061701-200917-009 iotedged[766]: 2020-10-22T10:07:32Z [ERR!] - Internal server error: Could not get trust bundle
Oct 22 11:07:32 dg-p061701-200917-009 iotedged[766]: caused by: An error occurred getting the certificate
Oct 22 11:07:32 dg-p061701-200917-009 iotedged[766]: caused by: HSM failure
Oct 22 11:07:32 dg-p061701-200917-009 iotedged[766]: caused by: HSM API returned an invalid null response
Oct 22 11:07:32 dg-p061701-200917-009 iotedged[766]: 2020-10-22T10:07:32Z [INFO] - [work] - - - [2020-10-22 10:07:32.67 9744492 UTC] "GET /trust-bundle?api-version=2019-01-30 HTTP/1.1" 500 Internal Server Error 178 "-" "-" auth_id(-)
-- Logs begin at Thu 2020-10-22 11:01:35 BST. --
Oct 22 11:11:12 dg-p061701-200917-009 iotedged[7464]: 2020-10-22T10:11:12Z [INFO] - Configuring the Device private key using "/etc/iotedge/certs/Device_CA.key.pem".
Oct 22 11:11:12 dg-p061701-200917-009 iotedged[7464]: 2020-10-22T10:11:12Z [INFO] - Configuring the trusted CA certificates using "/etc/iotedge/certs/Root_CA.pem".
Oct 22 11:11:12 dg-p061701-200917-009 iotedged[7464]: 2020-10-22T10:11:12Z [INFO] - Finished configuring provisioning environment variables and certificates.
Oct 22 11:11:12 dg-p061701-200917-009 iotedged[7464]: 2020-10-22T10:11:12Z [INFO] - Initializing hsm...
Oct 22 11:11:12 dg-p061701-200917-009 iotedged[7464]: 2020-10-22T10:11:12Z [INFO] - Finished initializing hsm.
Oct 22 11:11:12 dg-p061701-200917-009 iotedged[7464]: 2020-10-22T10:11:12Z [INFO] - Initializing hsm X509 interface...
Oct 22 11:11:12 dg-p061701-200917-009 iotedged[7464]: 2020-10-22T10:11:12Z [INFO] - Finished initializing hsm X509 interface...
Oct 22 11:11:12 dg-p061701-200917-009 iotedged[7464]: 2020-10-22T10:11:12Z [INFO] - Provisioning edge device...
Oct 22 11:11:12 dg-p061701-200917-009 iotedged[7464]: 2020-10-22T10:11:12Z [INFO] - Starting provisioning edge device via X509 provisioning...
Oct 22 11:11:12 dg-p061701-200917-009 iotedged[7464]: 2020-10-22T10:11:12Z [INFO] - Starting DPS registration with scope_id "0ne0014273D", registration_id "DG-P061701-200917-009"
Oct 22 11:11:23 dg-p061701-200917-009 iotedged[7464]: 2020-10-22T10:11:23Z [INFO] - DPS registration assigned device "DG-P061701-200917-009" in hub "vd-prd-iothub.azure-devices.net"
Oct 22 11:11:23 dg-p061701-200917-009 iotedged[7464]: 2020-10-22T10:11:23Z [INFO] - Successful DPS provisioning.
Oct 22 11:11:23 dg-p061701-200917-009 iotedged[7464]: 2020-10-22T10:11:23Z [INFO] - Finished provisioning edge device.
Oct 22 11:11:23 dg-p061701-200917-009 iotedged[7464]: 2020-10-22T10:11:23Z [INFO] - Initializing the module runtime...
Oct 22 11:11:23 dg-p061701-200917-009 iotedged[7464]: 2020-10-22T10:11:23Z [INFO] - Initializing module runtime...
Oct 22 11:11:23 dg-p061701-200917-009 iotedged[7464]: 2020-10-22T10:11:23Z [INFO] - Using runtime network id azure-iot-edge
Oct 22 11:11:23 dg-p061701-200917-009 iotedged[7464]: 2020-10-22T10:11:23Z [INFO] - Successfully initialized module runtime
Oct 22 11:11:23 dg-p061701-200917-009 iotedged[7464]: 2020-10-22T10:11:23Z [INFO] - Finished initializing the module runtime.
Oct 22 11:11:23 dg-p061701-200917-009 iotedged[7464]: 2020-10-22T10:11:23Z [INFO] - Stopping all modules...
Oct 22 11:11:23 dg-p061701-200917-009 iotedged[7464]: 2020-10-22T10:11:23Z [INFO] - Stopping module edgeAgent...
Oct 22 11:11:23 dg-p061701-200917-009 iotedged[7464]: 2020-10-22T10:11:23Z [WARN] - Could not stop module edgeAgent
Oct 22 11:11:23 dg-p061701-200917-009 iotedged[7464]: 2020-10-22T10:11:23Z [WARN] - caused by: Target of operation already in this state
Oct 22 11:11:23 dg-p061701-200917-009 iotedged[7464]: 2020-10-22T10:11:23Z [INFO] - Finished stopping modules.
Oct 22 11:11:23 dg-p061701-200917-009 iotedged[7464]: 2020-10-22T10:11:23Z [INFO] - Reprovisioning status InitialAssignment will trigger reconfiguration of modules.
Oct 22 11:11:23 dg-p061701-200917-009 iotedged[7464]: 2020-10-22T10:11:23Z [INFO] - Removing module edgeAgent...
Oct 22 11:11:23 dg-p061701-200917-009 iotedged[7464]: 2020-10-22T10:11:23Z [INFO] - Successfully removed module edgeAgent
Oct 22 11:11:23 dg-p061701-200917-009 iotedged[7464]: 2020-10-22T10:11:23Z [INFO] - Detecting if configuration file has changed...
Oct 22 11:11:23 dg-p061701-200917-009 iotedged[7464]: 2020-10-22T10:11:23Z [INFO] - No change to configuration file detected.
Oct 22 11:11:23 dg-p061701-200917-009 iotedged[7464]: 2020-10-22T10:11:23Z [INFO] - Edge issuer CA expiration date: 2025-10-21T09:50:47Z
Oct 22 11:11:23 dg-p061701-200917-009 iotedged[7464]: 2020-10-22T10:11:23Z [INFO] - Obtaining workload CA succeeded.
Oct 22 11:11:23 dg-p061701-200917-009 iotedged[7464]: 2020-10-22T10:11:23Z [INFO] - Starting management API...
Oct 22 11:11:23 dg-p061701-200917-009 iotedged[7464]: 2020-10-22T10:11:23Z [INFO] - Starting workload API...
Oct 22 11:11:23 dg-p061701-200917-009 iotedged[7464]: 2020-10-22T10:11:23Z [INFO] - Starting watchdog with 60 second frequency...
Oct 22 11:11:23 dg-p061701-200917-009 iotedged[7464]: 2020-10-22T10:11:23Z [INFO] - Listening on fd://iotedge.mgmt.socket/ with 1 thread for management API.
Oct 22 11:11:23 dg-p061701-200917-009 iotedged[7464]: 2020-10-22T10:11:23Z [INFO] - Listening on fd://iotedge.socket/ with 1 thread for workload API.
Oct 22 11:11:23 dg-p061701-200917-009 iotedged[7464]: 2020-10-22T10:11:23Z [INFO] - Checking edge runtime status
Oct 22 11:11:23 dg-p061701-200917-009 iotedged[7464]: 2020-10-22T10:11:23Z [INFO] - Creating and starting edge runtime module edgeAgent
Oct 22 11:11:23 dg-p061701-200917-009 iotedged[7464]: 2020-10-22T10:11:23Z [INFO] - Updating identity for module $edgeAgent
Oct 22 11:11:23 dg-p061701-200917-009 iotedged[7464]: 2020-10-22T10:11:23Z [INFO] - Pulling image mcr.microsoft.com/azureiotedge-agent:1.0...
Oct 22 11:11:24 dg-p061701-200917-009 iotedged[7464]: 2020-10-22T10:11:24Z [INFO] - Successfully pulled image mcr.microsoft.com/azureiotedge-agent:1.0
Oct 22 11:11:24 dg-p061701-200917-009 iotedged[7464]: 2020-10-22T10:11:24Z [INFO] - Creating module edgeAgent...
Oct 22 11:11:24 dg-p061701-200917-009 iotedged[7464]: 2020-10-22T10:11:24Z [INFO] - Successfully created module edgeAgent
Oct 22 11:11:24 dg-p061701-200917-009 iotedged[7464]: 2020-10-22T10:11:24Z [INFO] - Starting module edgeAgent...
Oct 22 11:11:26 dg-p061701-200917-009 iotedged[7464]: 2020-10-22T10:11:26Z [INFO] - Successfully started module edgeAgent
Oct 22 11:11:28 dg-p061701-200917-009 iotedged[7464]: 2020-10-22T10:11:28Z [ERR!] - Internal server error: Could not get trust bundle
Oct 22 11:11:28 dg-p061701-200917-009 iotedged[7464]: caused by: An error occurred getting the certificate
Oct 22 11:11:28 dg-p061701-200917-009 iotedged[7464]: caused by: HSM failure
Oct 22 11:11:28 dg-p061701-200917-009 iotedged[7464]: caused by: HSM API returned an invalid null response
Oct 22 11:11:28 dg-p061701-200917-009 iotedged[7464]: 2020-10-22T10:11:28Z [INFO] - [work] - - - [2020-10-22 10:11:28.393531343 UTC] "GET /trust-bundle?api-version=2019-01-30 HTTP/1.1" 500 Internal Server Error 178 "-" "-" auth_id(-)
Oct 22 11:11:28 dg-p061701-200917-009 iotedged[7464]: 2020-10-22T10:11:28Z [ERR!] - Internal server error: Could not get trust bundle
Oct 22 11:11:28 dg-p061701-200917-009 iotedged[7464]: caused by: An error occurred getting the certificate
Oct 22 11:11:28 dg-p061701-200917-009 iotedged[7464]: caused by: HSM failure
Oct 22 11:11:28 dg-p061701-200917-009 iotedged[7464]: caused by: HSM API returned an invalid null response
Oct 22 11:11:28 dg-p061701-200917-009 iotedged[7464]: 2020-10-22T10:11:28Z [INFO] - [work] - - - [2020-10-22 10:11:28.590992263 UTC] "GET /trust-bundle?api-version=2019-01-30 HTTP/1.1" 500 Internal Server Error 178 "-" "-" auth_id(-)
Oct 22 11:11:31 dg-p061701-200917-009 iotedged[7464]: 2020-10-22T10:11:31Z [ERR!] - Internal server error: Could not get trust bundle
Oct 22 11:11:31 dg-p061701-200917-009 iotedged[7464]: caused by: An error occurred getting the certificate
Oct 22 11:11:31 dg-p061701-200917-009 iotedged[7464]: caused by: HSM failure
Oct 22 11:11:31 dg-p061701-200917-009 iotedged[7464]: caused by: HSM API returned an invalid null response
Oct 22 11:11:31 dg-p061701-200917-009 iotedged[7464]: 2020-10-22T10:11:31Z [INFO] - [work] - - - [2020-10-22 10:11:31.674695998 UTC] "GET /trust-bundle?api-version=2019-01-30 HTTP/1.1" 500 Internal Server Error 178 "-" "-" auth_id(-)
Oct 22 11:11:34 dg-p061701-200917-009 iotedged[7464]: 2020-10-22T10:11:34Z [ERR!] - Internal server error: Could not get trust bundle
Oct 22 11:11:34 dg-p061701-200917-009 iotedged[7464]: caused by: An error occurred getting the certificate
Oct 22 11:11:34 dg-p061701-200917-009 iotedged[7464]: caused by: HSM failure
Oct 22 11:11:34 dg-p061701-200917-009 iotedged[7464]: caused by: HSM API returned an invalid null response
Oct 22 11:11:34 dg-p061701-200917-009 iotedged[7464]: 2020-10-22T10:11:34Z [INFO] - [work] - - - [2020-10-22 10:11:34.704802018 UTC] "GET /trust-bundle?api-version=2019-01-30 HTTP/1.1" 500 Internal Server Error 178 "-" "-" auth_id(-)
Additional Information
In the end we found some more details on the error
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 47 (23 by maintainers)
I have found some time to test this. Indeed when using the test certificates and including my own root certificate, it is indeed all working fine but when using my own device device certificate it fails.
I again also run with my own certificates and received some more details on the error. Again very strange because on version 9 the same certificates are working an on 10 they are not.
Detials
Should be “your trusted cert expires post 2037”
@and-rewsmith, just wanted to say thank you for the information here. Learning this this week rather than a week or two later makes a huge difference on the project I’m working on.
@kpm-at-hfi yep
@kpm-at-hfi Unfortunately the support ticket contains confidential information from the reporting customer, so we aren’t allowed to share details.
Yes indeed the device CA cert. In my lasted test I completely left out the DPS setup and just used the connection string.
I will check to maybe open a support ticket to share further information.
Yes we first tried it on 1.0.10 and then we had the issue with EdgeAgent throwing the error. Moving back to 1.0.9.5 solved it.
Well the dots in the paths are just hiding where the certificate is for the post here. Normally the absolute path is in the file for every certificate.
We selected to keep the config.yaml file, have check with the new version and doesn’t seem to have changed. So just to be clear, we installed the device with the latest version of IoT Edge as we have always done and since 1.0.10 things don’t start properly even if we deliver all the same settings as before (config, certificates, etc.). Without making any changes to settings and just downgrading to 1.0.9.5, keeping the original config file, and everything worked again as it should.
Config.yaml