kubernetes: Failed to mount Azure Disk as a PV when ADE is enabled

/kind bug

What happened: Kubernetes fails to mount an Azure Disk as a PV when Azure Disk Encryption (ADE) is enabled on the node host.

What you expected to happen: The disk is mounted successfully.

How to reproduce it (as minimally and precisely as possible): I confirmed the issue on AKS (Kubernetes 1.9.9) and on OpenShift Enterprise 3.9 on Azure. Here is how to reproduce it with AKS.

Create a single-node AKS cluster.

az group create -n aks-ade -l westeurope
az aks create -g aks-ade -n aks-ade-cluster  --node-count  1
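
To run the kubectl commands below against the new cluster, fetch its credentials first (a convenience step; the group and cluster names match the commands above):

$ az aks get-credentials -g aks-ade -n aks-ade-cluster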

Then enable ADE on the node VM that AKS created.

AZGROUP=<resource_group_of_created_VM>
az ad app create --display-name aks-ade-vault-app --identifier-uris https://aks-ade-vault-app --password Password
az ad sp create --id <app_id>
az keyvault create -n aks-ade-vault -g $AZGROUP --enabled-for-disk-encryption True
az keyvault set-policy --name aks-ade-vault --spn <app_id> --key-permissions wrapKey  --secret-permissions set
az keyvault key create --vault-name aks-ade-vault --name aks-ade-node1-key --protection software
az vm encryption enable -g $AZGROUP -n <created_VM_name> --aad-client-id <app_id> --aad-client-secret Password --disk-encryption-keyvault aks-ade-vault  --key-encryption-key aks-ade-node1-key  --volume-type DATA
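
For reference, the placeholders above can be looked up, and the encryption status checked, with commands along these lines (a sketch; the display name and cluster name match the commands above):

# node resource group created by AKS (the resource group of the created VM)
az aks show -g aks-ade -n aks-ade-cluster --query nodeResourceGroup -o tsv
# app ID of the AAD application created above
az ad app list --display-name aks-ade-vault-app --query "[0].appId" -o tsv
# name of the node VM inside that resource group
az vm list -g $AZGROUP --query "[0].name" -o tsv
# confirm the data disk reports as encrypted once ADE finishes
az vm encryption show -g $AZGROUP -n <created_VM_name>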

AKS creates the default storage class automatically.

$ kubectl get sc default -o yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"storage.k8s.io/v1beta1","kind":"StorageClass","metadata":{"annotations":{"storageclass.beta.kubernetes.io/is-default-class":"true"},"labels":{"kubernetes.io/cluster-service":"true"},"name":"default","namespace":""},"parameters":{"kind":"Managed","storageaccounttype":"Standard_LRS"},"provisioner":"kubernetes.io/azure-disk"}
    storageclass.beta.kubernetes.io/is-default-class: "true"
  creationTimestamp: 2018-07-20T06:36:42Z
  labels:
    kubernetes.io/cluster-service: "true"
  name: default
  resourceVersion: "290"
  selfLink: /apis/storage.k8s.io/v1/storageclasses/default
  uid: ...
parameters:
  kind: Managed
  storageaccounttype: Standard_LRS
provisioner: kubernetes.io/azure-disk
reclaimPolicy: Delete

Then create a PVC and pod.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: azure-managed-disk
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
kind: Pod
apiVersion: v1
metadata:
  name: mypod
spec:
  containers:
    - name: myfrontend
      image: nginx
      volumeMounts:
      - mountPath: "/data1"
        name: volume
  volumes:
    - name: volume
      persistentVolumeClaim:
        claimName: azure-managed-disk
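
Apply the manifests and watch the pod (assuming both manifests are saved together as pvc-pod.yaml; the file name is only an example):

$ kubectl apply -f pvc-pod.yaml
$ kubectl get pvc azure-managed-disk
$ kubectl get pod mypod -o wide
$ kubectl describe pod mypod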

Then I can see the failed mount events.

$ kubectl get event
LAST SEEN   FIRST SEEN   COUNT     NAME                     KIND      SUBOBJECT   TYPE      REASON        SOURCE                              MESSAGE
16m         3h           86        mypod.15430209f8eb86ef   Pod                   Warning   FailedMount   kubelet, aks-nodepool1-nnnnnnn-0   Unable to mount volumes for pod "mypod_default(uuid)": timeout expired waiting for volumes to attach/mount for pod "default"/"mypod". list of unattached/unmounted volumes=[volume]
1m          3h           116       mypod.1543021e8b8c38f3   Pod                   Warning   FailedMount   kubelet, aks-nodepool1-nnnnnnn-0   (combined from similar events): MountVolume.MountDevice failed for volume "pvc-(uuid)" : azureDisk - mountDevice:FormatAndMount failed with mount failed: exit status 32
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/plugins/kubernetes.io/azure-disk/mounts/m3045819910 --scope -- mount -o defaults /dev/disk/by-id/scsi-(uuid) /var/lib/kubelet/plugins/kubernetes.io/azure-disk/mounts/m3045819910
Output: Running scope as unit run-r42ea8854f2e04ee7a534ff50f3f858dc.scope.
mount: /dev/sdc is already mounted or /var/lib/kubelet/plugins/kubernetes.io/azure-disk/mounts/m3045819910 busy
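
To see what is holding the device on the node, it can be inspected directly (device name taken from the error output above):

$ lsblk -f
$ mount | grep /dev/sdc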

Anything else we need to know?: I confirmed the issue only on AKS and OpenShift, but I suspect it could happen on any Kubernetes cluster on Azure with the Azure cloud provider enabled. I checked the closed issue #57070. I suspect the issue is caused by Kubernetes not taking into account that /dev/sdc is already mounted as the BEK volume by ADE.

Kubernetes excludes the disks Azure uses for the resource (temporary) disk and the OS root by looking them up under /dev/disk/azure. However, the BEK volume does not show up under /dev/disk/azure, and I expect this is the reason.

https://github.com/kubernetes/kubernetes/blob/v1.9.9/pkg/volume/azure_dd/azure_common_linux.go#L31-L47
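
This is easy to verify on the node: the root and resource disks get symlinks under /dev/disk/azure from the Azure udev rules, while the BEK device does not appear there, so the exclusion logic never skips it (device name taken from the error output above):

$ ls -lR /dev/disk/azure
# no entry points at /dev/sdc, the device ADE mounted for the BEK volume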

Environment:

  • Kubernetes version (use kubectl version):
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.0", GitCommit:"91e7b4fd31fcd3d5f436da26c980becec37ceefe", GitTreeState:"clean", BuildDate:"2018-06-27T20:17:28Z", GoVersion:"go1.10.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.9", GitCommit:"57729ea3d9a1b75f3fc7bbbadc597ba707d47c8a", GitTreeState:"clean", BuildDate:"2018-06-29T01:07:01Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration: Azure (AKS)

  • OS (e.g. from /etc/os-release):

$ cat /etc/os-release
NAME="Ubuntu"
VERSION="16.04.4 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.4 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
  • Kernel (e.g. uname -a):
$ uname -a
Linux cc-5e6a90d6-1543821320-96hfn 4.4.0-130-generic #156-Ubuntu SMP Thu Jun 14 08:53:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools: No

  • Others:

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 22 (12 by maintainers)

Most upvoted comments

@andyrd2405 Thanks, that would be a better solution since it does not require changing k8s code. I just verified that adding one more udev rule (see below; it may require a reboot) to /etc/udev/rules.d/66-azure-storage.rules fixes this issue:

ATTRS{device_id}=="?00000000-0000-*", ENV{fabric_name}="root", GOTO="azure_names"
ATTRS{device_id}=="?00000000-0001-*", ENV{fabric_name}="resource", GOTO="azure_names"
ATTRS{device_id}=="?00000001-0001-*", ENV{fabric_name}="BEK", GOTO="azure_names"

$ sudo tree /dev/disk/azure
├── BEK -> ../../sdc
├── resource -> ../../sdb
├── resource-part1 -> ../../sdb1
├── root -> ../../sda
├── root-part1 -> ../../sda1
└── scsi1
    └── lun0 -> ../../../sdd

I will contact the Azure team and try to make this rule a default in the Linux distros.
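
As a possible alternative to a reboot (my assumption, not verified in this thread), udev can be asked to re-run the rules after the BEK line is added next to the existing root/resource entries (udev GOTO only jumps forward, so the line must come before the azure_names label):

$ sudo udevadm control --reload-rules
$ sudo udevadm trigger --subsystem-match=block
$ ls -l /dev/disk/azure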