karpenter-provider-aws: waitForFirstConsumer PVCs fail to bind ebs gp3 (gp2) PV for newly created node
Version
Karpenter: v0.5.3
Kubernetes: v1.20.7-eks-d88609
Expected Behavior
The PV is bound when a StatefulSet causes Karpenter to scale up a new node.
Actual Behavior
When you create a StatefulSet with volumeClaimTemplates for EBS storage and it causes Karpenter to scale up a new node, the scheduler doesn't annotate the PVC with volume.kubernetes.io/selected-node, and the EBS controller doesn't bind the PV to the node. If you annotate the PVC manually, or recreate the pod that uses the PVC, the PV binds. This happens only for a newly created node that was launched for the pending pod of the current StatefulSet.
$ k get pvc -n asg-test
NAME                              STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS    AGE
default-gp3-encrypted-webapp2-0   Pending                                      gp3-encrypted   9m21s
default-gp3-encrypted-webapp3-0   Pending                                      gp3-encrypted   9m44s
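To spot affected claims quickly, the listing above can be filtered down to the Pending ones. The following is a minimal sketch, assuming input in the `kubectl get pvc --no-headers` layout, where the first column is the PVC name and the second is its status. The function name `pending_pvcs` is hypothetical and not part of the original report.

```shell
#!/bin/sh
# Sketch: print the names of PVCs whose STATUS column is "Pending".
# Reads `kubectl get pvc --no-headers` style output on stdin.
# Assumption: column 1 is the PVC name and column 2 is the status.
pending_pvcs() {
  awk '$2 == "Pending" { print $1 }'
}

# Example (requires cluster access, so it is left commented out):
#   kubectl get pvc -n asg-test --no-headers | pending_pvcs
```

Each name it prints can then be inspected with `kubectl describe pvc`, as in the output that follows, to check whether the claim carries any annotations.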
$ k describe pvc -n asg-test default-gp3-encrypted-webapp3-0
Name:          default-gp3-encrypted-webapp3-0
Namespace:     asg-test
StorageClass:  gp3-encrypted
Status:        Pending
Volume:
Labels:        app=webapp3
Annotations:   <none>
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
Used By:       webapp3-0
Events:
  Type    Reason                Age                 From                         Message
  ----    ------                ----                ----                         -------
  Normal  WaitForFirstConsumer  36s (x42 over 10m)  persistentvolume-controller  waiting for first consumer to be created before binding
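The manual workaround mentioned above (annotating the PVC with volume.kubernetes.io/selected-node) might look like the sketch below. It only prints the `kubectl annotate` command instead of running it, so the command can be reviewed first. The helper name `selected_node_cmd` and the node name are assumptions; the report does not give the name of the newly created node.

```shell
#!/bin/sh
# Sketch: build (but do not run) the kubectl command that applies the
# manual workaround from this report.
# Assumption: the caller already knows the name of the node that
# Karpenter launched for the pending pod.
selected_node_cmd() {
  ns="$1"    # namespace of the PVC
  pvc="$2"   # PVC name
  node="$3"  # node name (placeholder, not given in the report)
  printf 'kubectl annotate pvc -n %s %s volume.kubernetes.io/selected-node=%s\n' \
    "$ns" "$pvc" "$node"
}

# Example (prints the command only):
#   selected_node_cmd asg-test default-gp3-encrypted-webapp3-0 "$NODE_NAME"
```

Printing rather than executing keeps the sketch safe to run anywhere. Per the resolution noted at the end of this issue, upgrading Karpenter is the proper fix and the annotation is only a stopgap.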
Steps to Reproduce the Problem
Create a StatefulSet with volumeClaimTemplates for EBS storage.
Resource Specs and Logs
Provisioner
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: dedicated-for-clickhouse-m5-large
spec:
  kubeletConfiguration: {}
  labels:
    node.alpha.altinity.com/dedicated: clickhouse
  limits: {}
  provider:
    apiVersion: extensions.karpenter.sh/v1alpha1
    instanceProfile: KarpenterNodeInstanceProfile-asg-test
    kind: AWS
    securityGroupSelector:
      kubernetes.io/cluster/asg-test: '*'
    subnetSelector:
      kubernetes.io/cluster/asg-test: '*'
  requirements:
  - key: node.kubernetes.io/instance-type
    operator: In
    values:
    - m5.large
  - key: topology.kubernetes.io/zone
    operator: In
    values:
    - us-west-1a
    - us-west-1c
  - key: karpenter.sh/capacity-type
    operator: In
    values:
    - on-demand
  - key: kubernetes.io/arch
    operator: In
    values:
    - amd64
  taints:
  - effect: NoSchedule
    key: dedicated
    value: clickhouse
  ttlSecondsAfterEmpty: 60
StorageClass
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
  name: gp3-encrypted
parameters:
  encrypted: "true"
  fsType: ext4
  type: gp3
provisioner: ebs.csi.aws.com
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
StatefulSet
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: webapp3
  namespace: asg-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: webapp3
  serviceName: webapp3
  template:
    metadata:
      labels:
        app: webapp3
      name: webapp3
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values:
                - us-west-1a
      containers:
      - command:
        - /bin/bash
        - -c
        - "while true; do sleep 6000; done;"
        image: httpd:2.4
        name: webapp3
        volumeMounts:
        - mountPath: /var/lib/webapp3
          name: default-gp3-encrypted
      nodeSelector:
        node.kubernetes.io/instance-type: m5.large
      securityContext:
        fsGroup: 101
        runAsGroup: 101
        runAsUser: 101
      tolerations:
      - effect: NoSchedule
        key: dedicated
        operator: Equal
        value: clickhouse
  volumeClaimTemplates:
  - apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      labels:
        app: webapp3
      name: default-gp3-encrypted
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 100Gi
      storageClassName: gp3-encrypted
      volumeMode: Filesystem
About this issue
- State: closed
- Created 2 years ago
- Comments: 17 (7 by maintainers)
Documented here: https://karpenter.sh/preview/tasks/scheduling/#persistent-volume-topology
Aha! This was released in v0.5.4: https://github.com/aws/karpenter/releases/tag/v0.5.4.