iotedge: Error while refreshing the service identity

Expected Behavior

I shouldn’t have connection issues when edgeHub tries to refresh the service identity for my Gateway (LBS).

Current Behavior

I run into an issue when the service identity for my Gateway is refreshed (see logs below): a “System.TimeoutException: Operation timed out” error.

Steps to Reproduce

I do not have specific steps that reproduce this issue.

Context (Environment)

Output of iotedge check

Configuration checks (aziot-identity-service)
---------------------------------------------
√ keyd configuration is well-formed - OK
√ certd configuration is well-formed - OK
√ tpmd configuration is well-formed - OK
√ identityd configuration is well-formed - OK
√ daemon configurations up-to-date with config.toml - OK
√ identityd config toml file specifies a valid hostname - OK
× aziot-identity-service package is up-to-date - Error
    could not query https://aka.ms/latest-aziot-identity-service for latest available version
‼ host time is close to reference time - Warning
    Could not query NTP server
√ preloaded certificates are valid - OK
× keyd is running - Error
    Could not connect to keyd on unix:///run/aziot/keyd.sock
× certd is running - Error
    Could not connect to certd on unix:///run/aziot/certd.sock
× identityd is running - Error
    Could not connect to identityd on unix:///run/aziot/identityd.sock

Connectivity checks (aziot-identity-service)
--------------------------------------------
‼ host can connect to and perform TLS handshake with iothub AMQP port - Warning
    Could not retrieve iothub_hostname from provisioning file.
    Please specify the backing IoT Hub name using --iothub-hostname switch if you have that information.
    Since no hostname is provided, all hub connectivity tests will be skipped.
‼ host can connect to and perform TLS handshake with iothub HTTPS / WebSockets port - Warning
    Could not retrieve iothub_hostname from provisioning file.
    Please specify the backing IoT Hub name using --iothub-hostname switch if you have that information.
    Since no hostname is provided, all hub connectivity tests will be skipped.
‼ host can connect to and perform TLS handshake with iothub MQTT port - Warning
    Could not retrieve iothub_hostname from provisioning file.
    Please specify the backing IoT Hub name using --iothub-hostname switch if you have that information.
    Since no hostname is provided, all hub connectivity tests will be skipped.
√ host can connect to and perform TLS handshake with DPS endpoint - OK

Configuration checks
--------------------
√ aziot-edged configuration is well-formed - OK
√ configuration up-to-date with config.toml - OK
√ container engine is installed and functional - OK
× configuration has correct URIs for daemon mgmt endpoint - Error
    Unable to find image 'mcr.microsoft.com/azureiotedge-diagnostics:1.4.9' locally
    docker: Error response from daemon: Get "https://mcr.microsoft.com/v2/": context deadline exceeded.
    See 'docker run --help'.
× aziot-edge package is up-to-date - Error
    Error while fetching latest versions of edge components: could not send HTTP request
× container time is close to host time - Error
    Could not query local time inside container
‼ DNS server - Warning
    Container engine is not configured with DNS server setting, which may impact connectivity to IoT Hub.
    Please see https://aka.ms/iotedge-prod-checklist-dns for best practices.
    You can ignore this warning if you are setting DNS server per module in the Edge deployment.
‼ production readiness: logs policy - Warning
    Container engine is not configured to rotate module logs which may cause it run out of disk space.
    Please see https://aka.ms/iotedge-prod-checklist-logs for best practices.
    You can ignore this warning if you are setting log policy per module in the Edge deployment.
‼ production readiness: Edge Agent's storage directory is persisted on the host filesystem - Warning
    The edgeAgent module is not configured to persist its /tmp/edgeAgent directory on the host filesystem.
    Data might be lost if the module is deleted or updated.
    Please see https://aka.ms/iotedge-storage-host for best practices.
‼ production readiness: Edge Hub's storage directory is persisted on the host filesystem - Warning
    The edgeHub module is not configured to persist its /tmp/edgeHub directory on the host filesystem.
    Data might be lost if the module is deleted or updated.
    Please see https://aka.ms/iotedge-storage-host for best practices.
√ proxy settings are consistent in aziot-edged, aziot-identityd, moby daemon and config.toml - OK

Connectivity checks
-------------------
12 check(s) succeeded.
8 check(s) raised warnings. Re-run with --verbose for more details.
7 check(s) raised errors. Re-run with --verbose for more details.
11 check(s) were skipped due to errors from other checks. Re-run with --verbose for more details.

Device Information

  • Host OS: Ubuntu 20.04
  • Architecture: amd64
  • Container OS: Linux containers

Runtime Versions

  • aziot-edged [run iotedge version]: 1.4.9
  • Edge Agent [image tag (e.g. 1.0.0)]: 1.4.2
  • Edge Hub [image tag (e.g. 1.0.0)]: 1.4.2
  • Docker/Moby [run docker version]:

Client:
 Version:           20.10.23+azure-2
 API version:       1.41
 Go version:        go1.19.6
 Git commit:        715524332ff91d0f9ec5ab2ec95f051456ed1dba
 Built:             Wed Jan 18 20:42:16 UTC 2023
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server:
 Engine:
  Version:          20.10.22+azure-1
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.18.9
  Git commit:       42c8b314993e5eb3cc2776da0bbe41d5eb4b707b
  Built:            Thu Dec 15 22:17:04 2022
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.18+azure-1
  GitCommit:        2456e983eb9e37e47538f59ea18f2043c9a73640
 runc:
  Version:          1.1.4
  GitCommit:        5fd4c4d144137e991c4acebb2146ab1483a97925
 docker-init:
  Version:          0.19.0
  GitCommit:

Logs

Logs for the last 30min:

Additional Information

I never saw this issue with these runtime versions:

  • aziot-edged [run iotedge version]: 1.4.9
  • Edge Agent [image tag (e.g. 1.0.0)]: 1.1
  • Edge Hub [image tag (e.g. 1.0.0)]: 1.1

About this issue

  • State: closed
  • Created a year ago
  • Comments: 22 (11 by maintainers)

Most upvoted comments

Here is a check with verbose option:

root@carsi4iiotedge1:~# iotedge check --verbose

Configuration checks (aziot-identity-service)
---------------------------------------------
√ keyd configuration is well-formed - OK
√ certd configuration is well-formed - OK
√ tpmd configuration is well-formed - OK
√ identityd configuration is well-formed - OK
√ daemon configurations up-to-date with config.toml - OK
√ identityd config toml file specifies a valid hostname - OK
× aziot-identity-service package is up-to-date - Error
    could not query https://aka.ms/latest-aziot-identity-service for latest available version
        caused by: could not query https://aka.ms/latest-aziot-identity-service for latest available version
        caused by: error trying to connect: tcp connect error: Connection timed out (os error 110)
        caused by: tcp connect error: Connection timed out (os error 110)
        caused by: Connection timed out (os error 110)
‼ host time is close to reference time - Warning
    Could not query NTP server
        caused by: Could not query NTP server
        caused by: could not receive NTP server response: Resource temporarily unavailable (os error 11)
        caused by: Resource temporarily unavailable (os error 11)
√ preloaded certificates are valid - OK
√ keyd is running - OK
√ certd is running - OK
√ identityd is running - OK
√ read all preloaded certificates from the Certificates Service - OK
√ read all preloaded key pairs from the Keys Service - OK
√ check all EST server URLs utilize HTTPS - OK
√ ensure all preloaded certificates match preloaded private keys with the same ID - OK

Connectivity checks (aziot-identity-service)
--------------------------------------------
‼ host can connect to and perform TLS handshake with iothub AMQP port - Warning
    Could not retrieve iothub_hostname from provisioning file.
    Please specify the backing IoT Hub name using --iothub-hostname switch if you have that information.
    Since no hostname is provided, all hub connectivity tests will be skipped.
        caused by: Could not retrieve iothub_hostname from provisioning file.
                   Please specify the backing IoT Hub name using --iothub-hostname switch if you have that information.
                   Since no hostname is provided, all hub connectivity tests will be skipped.
‼ host can connect to and perform TLS handshake with iothub HTTPS / WebSockets port - Warning
    Could not retrieve iothub_hostname from provisioning file.
    Please specify the backing IoT Hub name using --iothub-hostname switch if you have that information.
    Since no hostname is provided, all hub connectivity tests will be skipped.
        caused by: Could not retrieve iothub_hostname from provisioning file.
                   Please specify the backing IoT Hub name using --iothub-hostname switch if you have that information.
                   Since no hostname is provided, all hub connectivity tests will be skipped.
‼ host can connect to and perform TLS handshake with iothub MQTT port - Warning
    Could not retrieve iothub_hostname from provisioning file.
    Please specify the backing IoT Hub name using --iothub-hostname switch if you have that information.
    Since no hostname is provided, all hub connectivity tests will be skipped.
        caused by: Could not retrieve iothub_hostname from provisioning file.
                   Please specify the backing IoT Hub name using --iothub-hostname switch if you have that information.
                   Since no hostname is provided, all hub connectivity tests will be skipped.
√ host can connect to and perform TLS handshake with DPS endpoint - OK

Configuration checks
--------------------
√ aziot-edged configuration is well-formed - OK
‼ configuration up-to-date with config.toml - Warning
    /etc/aziot/config.toml was modified after edged's config
    You must run 'iotedge config apply' to update edged's config with the latest config.toml
        caused by: /etc/aziot/config.toml was modified after edged's config
                   You must run 'iotedge config apply' to update edged's config with the latest config.toml
√ container engine is installed and functional - OK
× configuration has correct URIs for daemon mgmt endpoint - Error
    Unable to find image 'mcr.microsoft.com/azureiotedge-diagnostics:1.4.9' locally
    docker: Error response from daemon: Get "https://mcr.microsoft.com/v2/": context deadline exceeded (Client.Timeout exceeded while awaiting headers).
    See 'docker run --help'.
        caused by: Unable to find image 'mcr.microsoft.com/azureiotedge-diagnostics:1.4.9' locally
                   docker: Error response from daemon: Get "https://mcr.microsoft.com/v2/": context deadline exceeded (Client.Timeout exceeded while awaiting headers).
                   See 'docker run --help'.
        caused by: docker returned exit status: 125, stderr = Unable to find image 'mcr.microsoft.com/azureiotedge-diagnostics:1.4.9' locally
                   docker: Error response from daemon: Get "https://mcr.microsoft.com/v2/": context deadline exceeded (Client.Timeout exceeded while awaiting headers).
                   See 'docker run --help'.
× aziot-edge package is up-to-date - Error
    Error while fetching latest versions of edge components: could not send HTTP request
        caused by: Error while fetching latest versions of edge components: could not send HTTP request
        caused by: error trying to connect: tcp connect error: Connection timed out (os error 110)
        caused by: tcp connect error: Connection timed out (os error 110)
        caused by: Connection timed out (os error 110)
× container time is close to host time - Error
    Could not query local time inside container
        caused by: Could not query local time inside container
        caused by: docker returned exit status: 125, stderr = Unable to find image 'mcr.microsoft.com/azureiotedge-diagnostics:1.4.9' locally
                   docker: Error response from daemon: Get "https://mcr.microsoft.com/v2/": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers).
                   See 'docker run --help'.
‼ DNS server - Warning
    Container engine is not configured with DNS server setting, which may impact connectivity to IoT Hub.
    Please see https://aka.ms/iotedge-prod-checklist-dns for best practices.
    You can ignore this warning if you are setting DNS server per module in the Edge deployment.
        caused by: Container engine is not configured with DNS server setting, which may impact connectivity to IoT Hub.
                   Please see https://aka.ms/iotedge-prod-checklist-dns for best practices.
                   You can ignore this warning if you are setting DNS server per module in the Edge deployment.
‼ production readiness: logs policy - Warning
    Container engine is not configured to rotate module logs which may cause it run out of disk space.
    Please see https://aka.ms/iotedge-prod-checklist-logs for best practices.
    You can ignore this warning if you are setting log policy per module in the Edge deployment.
        caused by: Container engine is not configured to rotate module logs which may cause it run out of disk space.
                   Please see https://aka.ms/iotedge-prod-checklist-logs for best practices.
                   You can ignore this warning if you are setting log policy per module in the Edge deployment.
‼ production readiness: Edge Agent's storage directory is persisted on the host filesystem - Warning
    The edgeAgent module is not configured to persist its /tmp/edgeAgent directory on the host filesystem.
    Data might be lost if the module is deleted or updated.
    Please see https://aka.ms/iotedge-storage-host for best practices.
        caused by: The edgeAgent module is not configured to persist its /tmp/edgeAgent directory on the host filesystem.
                   Data might be lost if the module is deleted or updated.
                   Please see https://aka.ms/iotedge-storage-host for best practices.
‼ production readiness: Edge Hub's storage directory is persisted on the host filesystem - Warning
    The edgeHub module is not configured to persist its /tmp/edgeHub directory on the host filesystem.
    Data might be lost if the module is deleted or updated.
    Please see https://aka.ms/iotedge-storage-host for best practices.
        caused by: The edgeHub module is not configured to persist its /tmp/edgeHub directory on the host filesystem.
                   Data might be lost if the module is deleted or updated.
                   Please see https://aka.ms/iotedge-storage-host for best practices.
‼ Agent image is valid and can be pulled from upstream - Warning
    skipping because of previous failures
√ proxy settings are consistent in aziot-edged, aziot-identityd, moby daemon and config.toml - OK

Connectivity checks
-------------------
‼ container on the default network can connect to upstream AMQP port - Warning
    skipping because of previous failures
‼ container on the default network can connect to upstream HTTPS / WebSockets port - Warning
    skipping because of previous failures
‼ container on the default network can connect to upstream MQTT port - Warning
    skipping because of previous failures
‼ container on the IoT Edge module network can connect to upstream AMQP port - Warning
    skipping because of previous failures
‼ container on the IoT Edge module network can connect to upstream HTTPS / WebSockets port - Warning
    skipping because of previous failures
‼ container on the IoT Edge module network can connect to upstream MQTT port - Warning
    skipping because of previous failures
18 check(s) succeeded.
9 check(s) raised warnings.
4 check(s) raised errors.
7 check(s) were skipped due to errors from other checks.
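
Several of the failures above are plain TCP timeouts (os error 110) towards aka.ms and mcr.microsoft.com. To see which outbound endpoints are blocked from the host, independently of IoT Edge, a small probe along these lines might help. This is only a sketch: the host names are taken from the check output above, port 443 is assumed because both URLs are HTTPS, and the 5-second timeout and the function names are arbitrary choices of mine, not anything provided by iotedge.

```python
import socket

# Hosts that timed out in the check output above.
# Port 443 is an assumption (both URLs in the output are https://).
ENDPOINTS = [("aka.ms", 443), ("mcr.microsoft.com", 443)]


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connect timeouts, refused connections and DNS lookup failures.
        return False


def report(endpoints=ENDPOINTS) -> None:
    """Print one line per endpoint saying whether the TCP connect succeeded."""
    for host, port in endpoints:
        status = "reachable" if can_connect(host, port) else "NOT reachable"
        print(f"{host}:{port} {status}")
```

Calling `report()` on the gateway host prints one line per endpoint. Note that this only tests the TCP connect, not the TLS handshake or any proxy in between, so a "reachable" result does not rule out a proxy or certificate problem.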

<Sensor Dev EUI> and <Gateway ID> are placeholders that anonymise the real Dev EUI and Gateway IDs (the real values contain no < or >).

I have tried this modification and will let you know later in the day if there is any news 😃