aws-efs-csi-driver: Volume mount fail after some time

/kind bug

What happened? I have pods set up to use an EFS volume via PV/PVC, and this works as expected most of the time. After a few days, however, pods created for new releases or cronjobs start failing to mount the volume. Here is the event log of a failing pod as an example:

Events:
  Type     Reason       Age                   From                                              Message
  ----     ------       ----                  ----                                              -------
  Warning  FailedMount  7m48s (x86 over 21h)  kubelet, ip-10-4-0-98.eu-west-1.compute.internal  MountVolume.SetUp failed for volume "myproject-web-pv" : rpc error: code = Internal desc = Could not mount "fs-1234abcd:/" at "/var/lib/kubelet/pods/5e30c96b-0f9c-11ea-934f-02efb215d860/volumes/kubernetes.io~csi/myproject-web-pv/mount": mount failed: exit status 1
Mounting command: mount
Mounting arguments: -t efs -o noac,tls fs-1234abcd:/ /var/lib/kubelet/pods/5e30c96b-0f9c-11ea-934f-02efb215d860/volumes/kubernetes.io~csi/myproject-web-pv/mount
Output: Could not start amazon-efs-mount-watchdog, unrecognized init system "aws-efs-csi-dri"
Failed to locate an available port in the range [20049, 20449], try specifying a different port range in /etc/amazon/efs/efs-utils.conf
  Warning  FailedMount  97s (x584 over 22h)  kubelet, ip-10-4-0-98.eu-west-1.compute.internal  Unable to mount volumes for pod "myproject-web-deployment-8c899c749-95mz6_myproject(5e30c96b-0f9c-11ea-934f-02efb215d860)": timeout expired waiting for volumes to attach or mount for pod "myproject"/"myproject-web-deployment-8c899c749-95mz6". list of unmounted volumes=[tmp-files]. list of unattached volumes=[tmp-files default-token-xb47z]

This error happens for every pod on the same node, so the quickest workaround I have found so far is to drain and remove the faulty node, so that all pods are rescheduled on another (or new) node where the EFS mount works correctly.
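
The second line of the mount output reports that no port in the range [20049, 20449] was available. A minimal Python sketch for checking that on the affected node follows. It assumes the TLS tunnel started by the mount helper listens on 127.0.0.1 (an assumption, not confirmed in this report) and that the script runs in the same network namespace as the driver. The function name free_ports is made up for illustration.

```python
import socket


def free_ports(lower=20049, upper=20449, host="127.0.0.1"):
    """Return the ports in [lower, upper] that can currently be bound on host.

    The default bounds are the range printed in the error message above;
    the loopback host is an assumption about where the tunnel listens.
    """
    free = []
    for port in range(lower, upper + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                # Port is taken (or binding is not permitted): count it as busy.
                continue
            free.append(port)
    return free


if __name__ == "__main__":
    available = free_ports()
    total = 20449 - 20049 + 1
    print(f"{len(available)} of {total} ports free in [20049, 20449]")
```

If this reports zero or very few free ports on a failing node, the range is likely held by leftover processes from earlier mounts, which would be consistent with the failure only appearing after a few days.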

Environment

  • Kubernetes version (use kubectl version): Server Version: version.Info{Major:"1", Minor:"14+", GitVersion:"v1.14.8-eks-b7174d", GitCommit:"b7174db5ee0e30c94a0b9899c20ac980c0850fc8", GitTreeState:"clean", BuildDate:"2019-10-18T17:56:01Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}
  • Driver version: image v0.2.0 + stable channel manifests

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 15 (10 by maintainers)

Most upvoted comments

Please try amazon/aws-efs-csi-driver:latest, which contains the latest build.