WALinuxAgent: walinuxagent fails to start when host is using EC keys [BUG]

We’re seeing the following error messages every 10 seconds on new Ubuntu hosts. Needless to say, none of the VM extensions are running, so we’re unable to create new hosts.

2019/06/11 04:48:55.591593 INFO Daemon WireServer endpoint is not found. Rerun dhcp handler
2019/06/11 04:48:55.592240 INFO Daemon Test for route to 168.63.129.16
2019/06/11 04:48:55.594100 INFO Daemon Route to 168.63.129.16 exists
2019/06/11 04:48:55.595355 INFO Daemon Wire server endpoint:168.63.129.16
2019/06/11 04:48:55.609510 INFO Daemon Fabric preferred wire protocol version:2015-04-05
2019/06/11 04:48:55.610492 INFO Daemon Wire protocol version:2012-11-30
2019/06/11 04:48:55.612152 INFO Daemon Server preferred version:2015-04-05
2019/06/11 04:48:55.949508 ERROR Daemon Command: [/usr/bin/openssl rsa -in /var/lib/waagent/1.prv -pubout 2>/dev/null], return code: [1], result: []
2019/06/11 04:48:55.962315 ERROR Daemon Command: [/usr/bin/openssl rsa -in /var/lib/waagent/2.prv -pubout 2>/dev/null], return code: [1], result: []
2019/06/11 04:48:56.040907 ERROR Daemon Command: [/usr/bin/openssl rsa -in /var/lib/waagent/1.prv -pubout 2>/dev/null], return code: [1], result: []
2019/06/11 04:48:56.053502 ERROR Daemon Command: [/usr/bin/openssl rsa -in /var/lib/waagent/2.prv -pubout 2>/dev/null], return code: [1], result: []
2019/06/11 04:48:56.132301 ERROR Daemon Command: [/usr/bin/openssl rsa -in /var/lib/waagent/1.prv -pubout 2>/dev/null], return code: [1], result: []
2019/06/11 04:48:56.140927 ERROR Daemon Command: [/usr/bin/openssl rsa -in /var/lib/waagent/2.prv -pubout 2>/dev/null], return code: [1], result: []
2019/06/11 04:48:56.169281 ERROR Daemon Exception processing goal state, giving up: ['']
2019/06/11 04:48:56.169792 INFO Daemon WireServer is not responding. Reset endpoint
2019/06/11 04:48:56.171328 INFO Daemon Protocol endpoint not found: WireProtocol, [ProtocolError] Exceeded max retry updating goal state
2019/06/11 04:48:56.186186 INFO Daemon Protocol endpoint not found: MetadataProtocol, [ProtocolError] 404 - GET: http://169.254.169.254/Microsoft.Compute/identity?api-version=2015-05-01-preview
2019/06/11 04:48:56.194682 INFO Daemon Retry detect protocols: retry=7

The change is that the new hosts have ECDSA certs in their cert store, configured via an ARM template.

The use of (non-walinuxagent related) elliptic curve certs appears relevant because the listed command fails:

root@vms0000000:~# /usr/bin/openssl rsa -in /var/lib/waagent/1.prv -pubout

140613926876824:error:0607907F:digital envelope routines:EVP_PKEY_get1_RSA:expecting an rsa key:p_lib.c:279:

but this command succeeds:

root@vms0000000:~# /usr/bin/openssl ec -in /var/lib/waagent/1.prv -pubout
read EC key
writing EC key
-----BEGIN PUBLIC KEY-----
MF..... (valid public key) ...==
-----END PUBLIC KEY-----
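
The two commands above differ only in the algorithm-specific subcommand (`rsa` vs. `ec`). As a minimal sketch, assuming the host's OpenSSL provides the algorithm-agnostic `pkey` subcommand, the public key could be extracted without knowing the key type in advance. The helper names `build_pubkey_command` and `extract_public_key` are hypothetical and are not part of WALinuxAgent.

```python
import subprocess


def build_pubkey_command(prv_path, openssl="/usr/bin/openssl"):
    """Build an argv list that prints the public key of an RSA or EC private key.

    `openssl pkey` inspects the key itself, so no algorithm needs to be named.
    """
    return [openssl, "pkey", "-in", prv_path, "-pubout"]


def extract_public_key(prv_path, runner=subprocess.run):
    """Return the PEM-encoded public key, or None if openssl reports an error.

    `runner` is injectable (hypothetical parameter) so the function can be
    exercised without openssl installed.
    """
    result = runner(
        build_pubkey_command(prv_path),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if result.returncode != 0:
        return None
    return result.stdout
```

On a host with the key files from this report, `extract_public_key("/var/lib/waagent/1.prv")` would be expected to behave like the successful `openssl ec` call above, though this has not been verified against the agent itself.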

The elliptic curve certs are referenced in the VM Scale Set portion of the ARM templates like so:

    {
      "apiVersion": "2018-10-01",
      "type": "Microsoft.Compute/virtualMachineScaleSets",
      "name": "[variables('nt0ScaleSetName')]",
      "location": "[variables('computeLocation')]",
      "identity": {
        "type": "SystemAssigned"
      },
      "properties": {
        "upgradePolicy": {
          "mode": "Automatic"
        },
        "virtualMachineProfile": {
...
          "osProfile": {
...
            "secrets": [
              {
                "sourceVault": {
                  "id": "[variables('keyvaultResourceId')]"
                },
                "vaultCertificates": [
                  {
                    "certificateUrl": "[parameters('cert1Url')]"
                  },
                  {
                    "certificateUrl": "[parameters('cert2Url')]"
                  }
                ]
              }
            ],
...
          },

Where cert1 and cert2 were previously RSA certs.

Distro and WALinuxAgent details:

  • Ubuntu 16.04 (Azure images, plus some cloud-init and extensions in ARM templates)
  • WALinuxAgent version:
root@vms0000000:~# waagent -version
WALinuxAgent-2.2.32.2 running on ubuntu 16.04
Python: 3.5.2
Goal state agent: 2.2.32.2

Note that this scenario should be completely supported, because using such ARM templates with Azure KeyVault references is a standard way of deploying certs to VMs, and KeyVault supports ECC certificates (they still have no support in the UI, but they work via direct API access). Info here: https://docs.microsoft.com/en-us/azure/key-vault/about-keys-secrets-and-certificates

About this issue

  • State: open
  • Created 5 years ago
  • Comments: 17 (17 by maintainers)

Most upvoted comments

Support for ECDSA certs has not been added to the Agent yet. They can be deployed using the Keyvault extension: https://learn.microsoft.com/en-us/azure/virtual-machines/extensions/key-vault-linux

@narrieta: Please consider wrapping the openssl rsa call with some error handling, so that the error is logged and waagent continues with the rest of its work. That way, if an error occurs while creating pfx files for any reason (including an EC private key), it doesn’t abort the remainder of what waagent does (e.g. running extensions).

Right now, the behavior is to abort the rest of waagent if the openssl call fails, and this is difficult to troubleshoot. It basically means that waagent (and by extension any Azure VM) is unusable if you reference an EC cert in your ARM template.
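
The log-and-continue behavior requested above could look roughly like the following. This is a minimal sketch, not the agent's actual code: the function name `public_keys_for`, the logger name, and the injectable `run` parameter are all hypothetical. It reuses the `openssl rsa` command from the logs but captures stderr instead of discarding it, so that a message such as "expecting an rsa key" would reach the log.

```python
import logging
import subprocess

logger = logging.getLogger("waagent-sketch")  # hypothetical logger name


def public_keys_for(prv_paths, run=subprocess.run):
    """Extract a public key per private key file, logging failures and continuing.

    Returns (keys, failed): a dict of path -> PEM text, and a list of paths
    for which the command failed. No exception escapes for a failed key.
    """
    keys, failed = {}, []
    for path in prv_paths:
        cmd = ["/usr/bin/openssl", "rsa", "-in", path, "-pubout"]
        try:
            result = run(
                cmd,
                stdout=subprocess.PIPE,
                stderr=subprocess.PIPE,  # keep stderr rather than 2>/dev/null
                universal_newlines=True,
            )
        except OSError as exc:
            # openssl missing or not executable
            logger.error("Could not run %s: %s", cmd, exc)
            failed.append(path)
            continue
        if result.returncode != 0:
            logger.error(
                "Command %s failed with return code %d: %s",
                cmd, result.returncode, result.stderr.strip(),
            )
            failed.append(path)
            continue
        keys[path] = result.stdout
    return keys, failed
```

A caller could then skip the certificates listed in `failed` and still go on to process the goal state and run extensions.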

If the remainder of waagent were run (after logging an error), it would be reasonable to make the openssl pkey call to generate the pfx files on a one-off basis, e.g. from a bash script run in an extension. In that case, waagent would still fill the role of downloading the .crt and .prv files from KeyVault.