kubernetes: Failed to mount Azure Disk as a PV when ADE is enabled
/kind bug
What happened: Kubernetes failed to mount an Azure Disk as a PV when Azure Disk Encryption (ADE) is enabled on the node host.
What you expected to happen: The disk is mounted successfully.
How to reproduce it (as minimally and precisely as possible): I confirmed the issue on AKS (Kubernetes 1.9.9) and on OpenShift Enterprise 3.9 on Azure. Here is how to reproduce it with AKS.
Create a single-node AKS cluster.
az group create -n aks-ade -l westeurope
az aks create -g aks-ade -n aks-ade-cluster --node-count 1
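The node VM ends up in the node resource group that AKS manages. If needed, that group (used as AZGROUP below) and the VM name can be looked up roughly like this, assuming the cluster names above:
# Print the auto-generated resource group that holds the node VMs
az aks show -g aks-ade -n aks-ade-cluster --query nodeResourceGroup -o tsv
# List the node VMs in that group to get <created_VM_name>
az vm list -g <resource_group_of_created_VM> -o table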
Then enable ADE on the VM created by AKS.
AZGROUP=<resource_group_of_created_VM>
az ad app create --display-name aks-ade-vault-app --identifier-uris https://aks-ade-vault-app --password Password
az ad sp create --id <app_id>
az keyvault create -n aks-ade-vault -g $AZGROUP --enabled-for-disk-encryption True
az keyvault set-policy --name aks-ade-vault --spn <app_id> --key-permissions wrapKey --secret-permissions set
az keyvault key create --vault-name aks-ade-vault --name aks-ade-node1-key --protection software
az vm encryption enable -g $AZGROUP -n <created_VM_name> --aad-client-id <app_id> --aad-client-secret Password --disk-encryption-keyvault aks-ade-vault --key-encryption-key aks-ade-node1-key --volume-type DATA
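Before moving on, the encryption state of the node VM can be verified with something like the following; the data volume should eventually report as encrypted:
# Show ADE status for the node VM
az vm encryption show -g $AZGROUP -n <created_VM_name> -o table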
AKS creates the default storage class automatically.
$ kubectl get sc default -o yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"storage.k8s.io/v1beta1","kind":"StorageClass","metadata":{"annotations":{"storageclass.beta.kubernetes.io/is-default-class":"true"},"labels":{"kubernetes.io/cluster-service":"true"},"name":"default","namespace":""},"parameters":{"kind":"Managed","storageaccounttype":"Standard_LRS"},"provisioner":"kubernetes.io/azure-disk"}
    storageclass.beta.kubernetes.io/is-default-class: "true"
  creationTimestamp: 2018-07-20T06:36:42Z
  labels:
    kubernetes.io/cluster-service: "true"
  name: default
  resourceVersion: "290"
  selfLink: /apis/storage.k8s.io/v1/storageclasses/default
  uid: ...
parameters:
  kind: Managed
  storageaccounttype: Standard_LRS
provisioner: kubernetes.io/azure-disk
reclaimPolicy: Delete
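Because this class is marked as the default, the PVC below does not set a storageClassName explicitly; the default marker can be confirmed with:
# The default class is flagged with "(default)" next to its name
kubectl get storageclass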
Then create a PVC and pod.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: azure-managed-disk
  annotations:
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
kind: Pod
apiVersion: v1
metadata:
  name: mypod
spec:
  containers:
  - name: myfrontend
    image: nginx
    volumeMounts:
    - mountPath: "/data1"
      name: volume
  volumes:
  - name: volume
    persistentVolumeClaim:
      claimName: azure-managed-disk
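For reference, the two manifests above can be saved to a single file (the name pvc-pod.yaml below is arbitrary) and applied like this:
# Create the PVC and the pod
kubectl apply -f pvc-pod.yaml
# The PVC binds through the default storage class, but the pod stays in ContainerCreating
kubectl get pvc,pv
kubectl get pod mypod
kubectl describe pod mypod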
Then I could see the following FailedMount events.
$ kubectl get event
LAST SEEN FIRST SEEN COUNT NAME KIND SUBOBJECT TYPE REASON SOURCE MESSAGE
16m 3h 86 mypod.15430209f8eb86ef Pod Warning FailedMount kubelet, aks-nodepool1-nnnnnnn-0 Unable to mount volumes for pod "mypod_default(uuid)": timeout expired waiting for volumes to attach/mount for pod "default"/"mypod". list of unattached/unmounted volumes=[volume]
1m 3h 116 mypod.1543021e8b8c38f3 Pod Warning FailedMount kubelet, aks-nodepool1-nnnnnnn-0 (combined from similar events): MountVolume.MountDevice failed for volume "pvc-(uuid)" : azureDisk - mountDevice:FormatAndMount failed with mount failed: exit status 32
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/plugins/kubernetes.io/azure-disk/mounts/m3045819910 --scope -- mount -o defaults /dev/disk/by-id/scsi-(uuid) /var/lib/kubelet/plugins/kubernetes.io/azure-disk/mounts/m3045819910
Output: Running scope as unit run-r42ea8854f2e04ee7a534ff50f3f858dc.scope.
mount: /dev/sdc is already mounted or /var/lib/kubelet/plugins/kubernetes.io/azure-disk/mounts/m3045819910 busy
Anything else we need to know?:
I confirmed the issue only on AKS and OpenShift, but I suspect it could happen on any Kubernetes cluster on Azure with the Azure cloud provider enabled.
I checked the closed issue #57070. I suspect this issue is caused because Kubernetes does not consider that /dev/sdc has already been mounted as the BEK volume by ADE.
Kubernetes excludes the disks Azure uses as the resource disk and the OS root by looking them up under /dev/disk/azure. However, the BEK volume does not show up under /dev/disk/azure, and I expect this is the reason.
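As a rough way to see this on the node (assuming SSH access to the node VM):
# Symlinks that kubernetes uses to exclude the OS and resource disks
ls -l /dev/disk/azure
# Per the analysis above, the BEK volume appears only as a plain /dev/sdX that is already mounted
lsblk -o NAME,SIZE,FSTYPE,LABEL,MOUNTPOINT
mount | grep sdc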
Environment:
- Kubernetes version (use kubectl version):
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.0", GitCommit:"91e7b4fd31fcd3d5f436da26c980becec37ceefe", GitTreeState:"clean", BuildDate:"2018-06-27T20:17:28Z", GoVersion:"go1.10.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.9", GitCommit:"57729ea3d9a1b75f3fc7bbbadc597ba707d47c8a", GitTreeState:"clean", BuildDate:"2018-06-29T01:07:01Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
- Cloud provider or hardware configuration: Azure (AKS)
- OS (e.g. from /etc/os-release):
$ cat /etc/os-release
NAME="Ubuntu"
VERSION="16.04.4 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.4 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
- Kernel (e.g. uname -a):
$ uname -a
Linux cc-5e6a90d6-1543821320-96hfn 4.4.0-130-generic #156-Ubuntu SMP Thu Jun 14 08:53:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
- Install tools: No
- Others:
About this issue
- Original URL
- State: closed
- Created 6 years ago
- Comments: 22 (12 by maintainers)
Commits related to this issue
- Add a UDEV rule in azure disk encryption on Linux Recently we are trying to do azure disk encryption on Ubuntu & RedHat and found that a UDEV rule is missing, the new UDEV rule is like following in ... — committed to andyzhangx/WALinuxAgent by andyzhangx 6 years ago
- Add a UDEV rule in azure disk encryption on Linux (#1287) Recently we are trying to do azure disk encryption on Ubuntu & RedHat and found that a UDEV rule is missing, the new UDEV rule is like follow... — committed to Azure/WALinuxAgent by andyzhangx 6 years ago
@andyrd2405 Thanks, that would be a better solution without changing the k8s code. I just verified that adding one more udev rule (see below; may require a reboot) in
/etc/udev/rules.d/66-azure-storage.rules
fixes this issue:
ATTRS{device_id}=="?00000000-0000-*", ENV{fabric_name}="root", GOTO="azure_names"
ATTRS{device_id}=="?00000000-0001-*", ENV{fabric_name}="resource", GOTO="azure_names"
ATTRS{device_id}=="?00000001-0001-*", ENV{fabric_name}="BEK", GOTO="azure_names"
I will contact the Azure team and try to make this rule the default in the Linux distros.
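For anyone applying this workaround by hand, a rough sketch of the steps on the node (rule text as above; already-attached devices may still need a reboot):
# Edit /etc/udev/rules.d/66-azure-storage.rules and add the BEK rule next to the
# existing root/resource rules, then reload udev and re-trigger device events
sudo udevadm control --reload-rules
sudo udevadm trigger
# The BEK volume should now get a symlink under /dev/disk/azure and be excluded by kubernetes
ls -l /dev/disk/azure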