kubernetes: CSI VolumeAttachment slows pod startup time as # concurrent attaches increases
What happened:
A reporter testing their CSI driver at scale found that pod startup time is impacted by the number of concurrent CSI volume attachments in progress across the cluster.
According to the reporter, pod startup time jumps from the order of seconds (when only a few volume attachments are happening concurrently) to 1-2 minutes once there are >1300 concurrent volume attachments for pods using those attachments.
Similarly, the reporter indicates that volume detach operations jump from the order of seconds to 3-4 minutes once there are >1300 concurrent volume attachments.
The reporter mentioned that the slowness, once encountered, does not go away until the number of concurrent volume attaches drops below ~500.
What you expected to happen:
Ideally, as the number of CSI volume attachments started in parallel increases, the time it takes to start pods should remain roughly constant. We should identify the bottlenecks and remove as many as possible.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
/sig storage
/sig scalability
/priority important-soon
cc @msau42
Environment:
- Kubernetes version (use `kubectl version`):
- Cloud provider or hardware configuration:
- OS (e.g. `cat /etc/os-release`):
- Kernel (e.g. `uname -a`):
- Install tools:
- Network plugin and version (if this is a network-related bug):
- Others:
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Comments: 27 (15 by maintainers)
Commits related to this issue
- feat(attach): avoid creation of volumeattachment object k8s is very slow in attaching the volumes when dealing with the large number of volumes. (k8s issue https://github.com/kubernetes/kubernetes/i... — committed to pawanpraka1/zfs-localpv by pawanpraka1 4 years ago
- feat(attach): avoid creation of volumeattachment object k8s is very slow in attaching the volumes when dealing with the large number of volume attachment object. (k8s issue https://github.com/kubern... — committed to pawanpraka1/zfs-localpv by pawanpraka1 4 years ago
- feat(attach): avoid creation of volumeattachment object k8s is very slow in attaching the volumes when dealing with the large number of volume attachment object. (k8s issue https://github.com/kubern... — committed to openebs/zfs-localpv by pawanpraka1 4 years ago
@cduchesne did some great debugging on this. Here is what he found:
The CSI attacher does a GET of the `VolumeAttachment` object FOR EVERY ATTACHED VOLUME on the specified node as part of `VolumesAreAttached(...)` to verify that each volume is still attached: https://github.com/kubernetes/kubernetes/blob/037751e7ad2cd18db5b4e2a20ba894314c522b15/pkg/volume/csi/csi_attacher.go#L199
For clusters with many nodes/attached volumes, this results in so many calls to fetch `VolumeAttachment` objects from the Kubernetes API server that the kube-controller-manager starts to get throttled (as reported).
If you encounter this, as a short-term workaround, try one of the following:
- Set the `disable-attach-detach-reconcile-sync` flag on the kube-controller-manager to `true`. Caveat: the `external-attacher` doesn't periodically update the `VolumeAttachment` object (https://github.com/kubernetes/kubernetes/issues/79743 tracks fixing that).
- Increase the `attach-detach-reconcile-sync-period` value to a longer period (longer than 1 minute).

Longer term, the plan is to:
- Make it so the `external-attacher` can update the `VolumeAttachment` objects periodically in an efficient manner.
- Implement `BulkVerifyVolumes` for CSI or, at least, fix the existing `VolumesAreAttached(...)` CSI implementation to "list" instead of "get" for every `VolumeAttachment` object.

We are also facing this issue. It is taking almost 3 hours to create the VolumeAttachment objects. Is there any way to solve this?
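For a sense of scale, the throttling described above can be sketched with back-of-the-envelope arithmetic. The 20 QPS figure below is an assumed default client-side rate limit for the controller-manager (`--kube-api-qps`), not a number from this thread:

```shell
# Assumptions (not stated in the thread): a client-side rate limit of 20 QPS,
# and one VolumeAttachment GET per attached volume per reconcile sync.
qps=20
attachments=1300   # scale at which the reporter saw the slowdown
echo "GET-per-volume: $((attachments / qps))s of API budget per reconcile pass"
echo "single LIST: 1 request per pass, independent of volume count"
```

Under these assumptions, a single reconcile pass consumes roughly a minute of API budget at 1300 attachments, which is at least consistent with the 1-2 minute startup times reported above, whereas a list-based check stays flat as the attachment count grows.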
@davidz627, how will LIST_VOLUMES_PUBLISHED_NODES help here? Could you please help me understand the whole workflow?
I tried the recent 2.1.1 external-attacher and the problem still exists. I also increased the reconcile sync period to 10 minutes. Is there any solution now?