kubernetes: Creation of large aws-ebs volume with xfs file system fails sporadically
What happened: Creating larger aws-ebs volumes (~2.6TB) with an xfs file system seems to fail sporadically. As a consequence, mounting the volume into the pod fails:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedMount 11m (x249 over 7h35m) kubelet, ip-10-42-66-162.eu-central-1.compute.internal (combined from similar events): MountVolume.MountDevice failed for volume "pvc-14ceff2e-401b-48eb-9d0d-ea3a2da8f9a5" : failed to mount the volume as "xfs", it already contains unknown data, probably partitions. Mount error: mount failed: exit status 32
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/eu-central-1a/vol-01769a536e60bc232 --scope -- mount -t xfs -o defaults /dev/xvdbj /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/eu-central-1a/vol-01769a536e60bc232
Output: Running scope as unit: run-r2c28bced23c548059e73186ac213f30a.scope
mount: /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/eu-central-1a/vol-01769a536e60bc232: wrong fs type, bad option, bad superblock on /dev/xvdbj, missing codepage or helper program, or other error.
Warning FailedMount 2m9s (x169 over 7h35m) kubelet, ip-10-42-66-162.eu-central-1.compute.internal Unable to mount volumes for pod "41c4e262-72f2-426d-a4c7-48eb8295bf56-574b5bdfbb-pcmsd_default(729ff4c5-932e-498e-9a12-4b9c9054f469)": timeout expired waiting for volumes to attach or mount for pod "default"/"41c4e262-72f2-426d-a4c7-48eb8295bf56-574b5bdfbb-pcmsd". list of unmounted volumes=[storage-volume]. list of unattached volumes=[config-volume storage-volume backint-volume hana-ssl-secret]
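For anyone hitting the same events, the failing device can be tied back to the Kubernetes objects before inspecting it on the node. A minimal sketch, assuming the in-tree kubernetes.io/aws-ebs provisioner and the object names from the events above:
# PV name taken from the FailedMount event; prints the backing EBS volume ID.
kubectl get pv pvc-14ceff2e-401b-48eb-9d0d-ea3a2da8f9a5 \
  -o jsonpath='{.spec.awsElasticBlockStore.volumeID}{"\n"}'
# Events for the affected pod (name and namespace from the message above).
kubectl describe pod 41c4e262-72f2-426d-a4c7-48eb8295bf56-574b5bdfbb-pcmsd -n default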
Output of parted for volume with successful file system creation:
(parted) select /dev/xvdbl
Using /dev/xvdbl
(parted) p
Model: Xen Virtual Block Device (xvd)
Disk /dev/xvdbl: 2792GB
Sector size (logical/physical): 512B/512B
Partition Table: loop
Disk Flags:
Number Start End Size File system Flags
1 0.00B 2792GB 2792GB xfs
Output of parted for volume with invalid file system:
(parted) select /dev/xvdbt
Using /dev/xvdbt
(parted) p
Error: /dev/xvdbt: unrecognised disk label
Model: Xen Virtual Block Device (xvd)
Disk /dev/xvdbt: 2792GB
Sector size (logical/physical): 512B/512B
Partition Table: unknown
Disk Flags:
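Besides parted, the missing filesystem signature can be confirmed directly on the node. A sketch of additional read-only checks, assuming the device names shown above:
# Run on the node the volume is attached to; device names differ per attachment.
blkid /dev/xvdbl          # healthy volume: reports TYPE="xfs"
blkid /dev/xvdbt          # affected volume: typically prints nothing
file -s /dev/xvdbt        # reports raw "data" instead of an XFS superblock
xfs_repair -n /dev/xvdbt  # no-modify check; fails when no XFS filesystem exists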
What you expected to happen: Creation of the volume should result in a volume with a file system that can be mounted. Volume creation should fail if file system creation fails.
How to reproduce it (as minimally and precisely as possible):
- Create storage class
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  labels:
    name: sample
  name: sample
  selfLink: /apis/storage.k8s.io/v1/storageclasses/sample
allowVolumeExpansion: true
parameters:
  encrypted: "true"
  fsType: xfs
  type: gp2
provisioner: kubernetes.io/aws-ebs
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
- Create pvc
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/aws-ebs
    volume.kubernetes.io/selected-node: ip-10-XXX-XXX-XXX.eu-central-1.compute.internal
  finalizers:
  - kubernetes.io/pvc-protection
  labels:
    app.kubernetes.io/component: Sample
  name: sample
  namespace: default
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 2600Gi
  storageClassName: sample
  volumeMode: Filesystem
  volumeName: pvc-5558dce7-9070-4b43-b2d9-3b59d4680a22
- Create pod
spec:
  containers:
  - name: sample
    volumeMounts:
    - mountPath: /sample/mounts
      name: storage-volume
  volumes:
  - name: storage-volume
    persistentVolumeClaim:
      claimName: sample
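To run the reproduction, the manifests above can be applied and the claim watched until the pod is scheduled; with volumeBindingMode: WaitForFirstConsumer the PV is only provisioned at that point. A minimal sketch; the file and pod names are placeholders:
# Placeholder file names; each file holds one of the manifests above.
kubectl apply -f storageclass.yaml -f pvc.yaml -f pod.yaml
# Watch the claim get bound once the pod is scheduled, then check for
# FailedMount events like the ones quoted at the top of this issue.
kubectl get pvc sample -w
kubectl describe pod <pod-name>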
Anything else we need to know?:
Environment:
- Kubernetes version (use kubectl version): Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.6", GitCommit:"7015f71e75f670eb9e7ebd4b5749639d42e20079", GitTreeState:"clean", BuildDate:"2019-11-13T11:11:50Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}
- Cloud provider or hardware configuration: self-managed cluster on AWS
- OS (e.g. cat /etc/os-release):
NAME="SLES"
VERSION="15-SP1"
VERSION_ID="15.1"
PRETTY_NAME="SUSE Linux Enterprise Server 15 SP1"
ID="sles"
ID_LIKE="suse"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:suse:sles:15:sp1"
- Kernel (e.g. uname -a): 4.12.14-197.18-default #1 SMP Tue Sep 17 14:26:49 UTC 2019 (d75059b) x86_64 x86_64 x86_64 GNU/Linux
- Install tools:
- Network plugin and version (if this is a network-related bug):
- Others:
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Comments: 31 (9 by maintainers)
I’m leaving this info here for future users. I’ve hit this problem multiple times within us-east-1 specifically. This occurs on encrypted EBS volumes only.
From the AWS support EBS team:
EBS had deployed a fix for the latest generation Nitro instances, so that unwritten blocks on an encrypted EBS volume will no longer return random data. We are actively working to deploy the fix on Xen instances later this year.
Please let us know if you are still running into this issue on Nitro instances.
The fix on Nitro was deployed on July 6th.
From another commenter:
We see this issue randomly when provisioning >1TB gp2 volumes. Our current workaround has been to delete the PVC and let Kubernetes recreate the PV; this has worked for our use case so far. I think operating system choice also plays a part in this issue: Amazon Linux doesn't seem to have it, while Flatcar and Ubuntu both report data being present on the volume.
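For reference, the workaround mentioned above (deleting the claim so a fresh volume is provisioned) roughly looks like the sketch below; object and file names are placeholders, and with reclaimPolicy: Delete the broken volume is removed along with the PV:
# Remove the consuming pod first, then the claim; the dynamically provisioned
# PV and the underlying EBS volume are deleted because of reclaimPolicy: Delete.
kubectl delete pod <pod-name>
kubectl delete pvc sample
# Recreating the claim and pod triggers provisioning of a new volume.
kubectl apply -f pvc.yaml -f pod.yaml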