iotedge: edgeAgent su: can't set groups: Permission denied

Expected Behavior

edgeAgent restarts

Current Behavior

edgeAgent fails to start with the following log entries:

2023-08-08 17:10:33 Starting Edge Agent
2023-08-08 17:10:33 Creating UID 13622 as edgeagentuser
2023-08-08 17:10:33 Creating storage folder: /tmp/edgeAgent
2023-08-08 17:10:33 Creating backup folder: /tmp/edgeAgent_backup
2023-08-08 17:10:33 Changing ownership of management socket: /var/run/iotedge/mgmt.sock
2023-08-08 17:10:33 Completed necessary setup. Starting Edge Agent.
su: can't set groups: Permission denied

Steps to Reproduce

  1. Upgrade aziot
  2. Restart edgeAgent

Output of iotedge check

Configuration checks (aziot-identity-service)
---------------------------------------------
√ keyd configuration is well-formed - OK
√ certd configuration is well-formed - OK
√ tpmd configuration is well-formed - OK
√ identityd configuration is well-formed - OK
√ daemon configurations up-to-date with config.toml - OK
√ identityd config toml file specifies a valid hostname - OK
√ aziot-identity-service package is up-to-date - OK
‼ host time is close to reference time - Warning
    Could not query NTP server
√ preloaded certificates are valid - OK
√ keyd is running - OK
√ certd is running - OK
√ identityd is running - OK
√ read all preloaded certificates from the Certificates Service - OK
√ read all preloaded key pairs from the Keys Service - OK
√ check all EST server URLs utilize HTTPS - OK
√ ensure all preloaded certificates match preloaded private keys with the same ID - OK

Connectivity checks (aziot-identity-service)
--------------------------------------------
‼ host can connect to and perform TLS handshake with iothub AMQP port - Warning
    Could not retrieve iothub_hostname from provisioning file.
    Please specify the backing IoT Hub name using --iothub-hostname switch if you have that information.
    Since no hostname is provided, all hub connectivity tests will be skipped.
‼ host can connect to and perform TLS handshake with iothub HTTPS / WebSockets port - Warning
    Could not retrieve iothub_hostname from provisioning file.
    Please specify the backing IoT Hub name using --iothub-hostname switch if you have that information.
    Since no hostname is provided, all hub connectivity tests will be skipped.
‼ host can connect to and perform TLS handshake with iothub MQTT port - Warning
    Could not retrieve iothub_hostname from provisioning file.
    Please specify the backing IoT Hub name using --iothub-hostname switch if you have that information.
    Since no hostname is provided, all hub connectivity tests will be skipped.
√ host can connect to and perform TLS handshake with DPS endpoint - OK

Configuration checks
--------------------
√ aziot-edged configuration is well-formed - OK
√ configuration up-to-date with config.toml - OK
√ container engine is installed and functional - OK
× configuration has correct URIs for daemon mgmt endpoint - Error
    Unable to find image 'mcr.microsoft.com/azureiotedge-diagnostics:1.4.16' locally
    1.4.16: Pulling from azureiotedge-diagnostics
    7264a8db6415: Already exists
    3e50ca6a03ad: Already exists
    a17595d9604f: Already exists
    6f33f8ba42f6: Pulling fs layer
    46749a54c878: Pulling fs layer
    6f33f8ba42f6: Download complete
    6f33f8ba42f6: Pull complete
    46749a54c878: Verifying Checksum
    46749a54c878: Download complete
    46749a54c878: Pull complete
    Digest: sha256:874026606a4d5f9ca988fad4e279a5b48e62f354aaaf1f3c6f7f0e68c5df2fab
    Status: Downloaded newer image for mcr.microsoft.com/azureiotedge-diagnostics:1.4.16
    One or more errors occurred. (Permission denied)
√ aziot-edge package is up-to-date - OK
√ container time is close to host time - OK
√ DNS server - OK
√ production readiness: logs policy - OK
‼ production readiness: Edge Agent's storage directory is persisted on the host filesystem - Warning
    The edgeAgent module is not configured to persist its /tmp/edgeAgent directory on the host filesystem.
    Data might be lost if the module is deleted or updated.
    Please see https://aka.ms/iotedge-storage-host for best practices.
× production readiness: Edge Hub's storage directory is persisted on the host filesystem - Error
    Could not check current state of edgeHub container
√ proxy settings are consistent in aziot-edged, aziot-identityd, moby daemon and config.toml - OK

Connectivity checks
-------------------
24 check(s) succeeded.
5 check(s) raised warnings. Re-run with --verbose for more details.
2 check(s) raised errors. Re-run with --verbose for more details.
7 check(s) were skipped due to errors from other checks. Re-run with --verbose for more details.
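
When the same check has to be compared across many devices, the failing entries can be pulled out of saved `iotedge check` output with a few lines of Python. This is only a sketch: it assumes the output was captured to text in the format shown above, where failing checks start with `×` and end with ` - Error`. The function name `failed_checks` is made up for illustration and is not part of any iotedge tooling.

```python
from typing import List

FAIL_MARK = "\u00d7"      # the "×" marker used for failed checks in the output above
ERROR_SUFFIX = " - Error"


def failed_checks(output: str) -> List[str]:
    """Return the names of checks marked as failed in saved `iotedge check` output."""
    failures = []
    for line in output.splitlines():
        stripped = line.strip()
        if not stripped.startswith(FAIL_MARK):
            continue  # skip passing checks, warnings, headers, and detail lines
        name = stripped[len(FAIL_MARK):].strip()
        if name.endswith(ERROR_SUFFIX):
            name = name[: -len(ERROR_SUFFIX)]
        failures.append(name)
    return failures


if __name__ == "__main__":
    # A few lines copied from the report above.
    sample = "\n".join([
        "\u221a container engine is installed and functional - OK",
        "\u00d7 configuration has correct URIs for daemon mgmt endpoint - Error",
        "    One or more errors occurred. (Permission denied)",
        "\u00d7 production readiness: Edge Hub's storage directory is persisted on the host filesystem - Error",
    ])
    for check in failed_checks(sample):
        print(check)
```

Running this against output collected from a working device and from an affected one makes it easier to see which failures are specific to the affected device.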

Additional Information

We use UID 1000 for edgeAgent and edgeHub. The issue does not occur on all devices, only on a couple of them.

About this issue

  • State: closed
  • Created a year ago
  • Comments: 22 (13 by maintainers)

Most upvoted comments

@PedroBuhigas - Sorry for the delay in responding. When you say you manually removed the group and reinstalled, do you mean you ran: sudo groupdel iotedge? If not, can you try that after the purge and then reinstall?

To answer your question about the necessary permissions, you’ll find them here: https://github.com/Azure/iotedge/blob/b4e7b13342c6896464fbdf706f75ac0e963cf889/edgelet/contrib/debian/postinst

However, I think the userdel error message you see on purge may be unrelated to the edgeAgent permissions error: I see the same message in my repro attempts, but I don’t run into the edgeAgent permissions error. Something else on your host seems to be preventing the exec su command from succeeding in the container. I thought it might be AppArmor, but your AppArmor logs are empty.

At this point, I’m not sure what the root cause is, because all of the logs and checks look fine and I’m unable to repro the issue. Since this is happening on just one out of several hundred devices, is it feasible for you to re-flash or re-image the OS and then reinstall iotedge on the problematic device?
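
One more host-level check that may help narrow this down, offered as a guess rather than a confirmed cause: su has to call setgroups(2), which on Linux fails with "Permission denied" when the process lacks the CAP_SETGID capability. The Python sketch below decodes the `CapEff` mask from `/proc/<pid>/status` and reports whether CAP_SETGID and CAP_SETUID are present. The helper names are illustrative, and the capability bit numbers (6 and 7) are taken from the Linux capability definitions. Seccomp or an LSM policy could still block the call even when the capability is present, so a positive result does not rule those out.

```python
CAP_SETGID = 6  # bit index of CAP_SETGID in the Linux capability mask
CAP_SETUID = 7  # bit index of CAP_SETUID


def effective_caps(status_text: str) -> int:
    """Parse the effective capability mask from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split(":", 1)[1].strip(), 16)
    raise ValueError("no CapEff line found")


def has_cap(mask: int, cap: int) -> bool:
    """Return True if capability bit `cap` is set in `mask`."""
    return bool((mask >> cap) & 1)


if __name__ == "__main__":
    try:
        with open("/proc/self/status") as f:
            mask = effective_caps(f.read())
    except OSError:
        print("/proc/self/status is not available on this system")
    else:
        print("CAP_SETGID present:", has_cap(mask, CAP_SETGID))
        print("CAP_SETUID present:", has_cap(mask, CAP_SETUID))
```

To be meaningful, this has to be run in the same context as the failing su call (that is, inside the edgeAgent container or an equivalent one started with the same create options), and the result compared with a device that works.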