karpenter-provider-aws: WaitForFirstConsumer PVCs fail to bind EBS gp3 (gp2) PVs on newly created nodes

Version

Karpenter: v0.5.3

Kubernetes: v1.20.7-eks-d88609

Expected Behavior

The PV is bound when a StatefulSet causes Karpenter to scale up a new node.

Actual Behavior

When you create a StatefulSet with volumeClaimTemplates for EBS storage that causes Karpenter to scale up a new node, the scheduler does not annotate the PVC with volume.kubernetes.io/selected-node, and the EBS controller does not bind a PV for that node. If you annotate the PVC manually, or recreate the pod that uses the PVC, the PV is bound. This happens only on a newly created node that was launched for the pending pod of that StatefulSet.

$ k get pvc -n asg-test
NAME                              STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS    AGE
default-gp3-encrypted-webapp2-0   Pending                                      gp3-encrypted   9m21s
default-gp3-encrypted-webapp3-0   Pending                                      gp3-encrypted   9m44s
$ k describe pvc -n asg-test  default-gp3-encrypted-webapp3-0
Name:          default-gp3-encrypted-webapp3-0
Namespace:     asg-test
StorageClass:  gp3-encrypted
Status:        Pending
Volume:
Labels:        app=webapp3
Annotations:   <none>
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
Used By:       webapp3-0
Events:
  Type    Reason                Age                 From                         Message
  ----    ------                ----                ----                         -------
  Normal  WaitForFirstConsumer  36s (x42 over 10m)  persistentvolume-controller  waiting for first consumer to be created before binding
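The describe output above shows the symptom: the claim stays Pending with "Annotations: <none>", so the scheduler never recorded a selected node. To list every claim in this state across a namespace, here is a minimal sketch, assuming you export the claims with kubectl get pvc -n asg-test -o json. The script name stuck_pvcs.py and the function stuck_pvcs are hypothetical helpers, not part of Karpenter or the EBS CSI driver. It only flags candidates: a claim whose pod has not been scheduled yet is legitimately in the same state.

```python
import json
import sys

# Annotation the scheduler sets on a WaitForFirstConsumer PVC once it has picked a node.
SELECTED_NODE = "volume.kubernetes.io/selected-node"


def stuck_pvcs(pvc_list):
    """Return (namespace, name) pairs for Pending PVCs lacking the selected-node annotation.

    pvc_list is the parsed output of `kubectl get pvc -o json` (a v1 List with "items").
    Hypothetical helper for illustration only.
    """
    stuck = []
    for pvc in pvc_list.get("items", []):
        meta = pvc.get("metadata", {})
        # "annotations" is omitted entirely when a PVC has none, as in the output above.
        annotations = meta.get("annotations") or {}
        phase = pvc.get("status", {}).get("phase")
        if phase == "Pending" and SELECTED_NODE not in annotations:
            stuck.append((meta.get("namespace", ""), meta.get("name", "")))
    return stuck


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: kubectl get pvc -n asg-test -o json > pvcs.json && python3 stuck_pvcs.py pvcs.json
    with open(sys.argv[1]) as f:
        for namespace, name in stuck_pvcs(json.load(f)):
            print(f"{namespace}/{name}: Pending, no {SELECTED_NODE} annotation")
```

For a flagged claim, the manual workaround described above amounts to setting the annotation yourself, for example kubectl annotate pvc -n asg-test default-gp3-encrypted-webapp3-0 volume.kubernetes.io/selected-node=NODE_NAME, where NODE_NAME stands for the node Karpenter launched for the pod.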

Steps to Reproduce the Problem

Create a StatefulSet with volumeClaimTemplates for EBS storage.

Resource Specs and Logs

Provisioner

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: dedicated-for-clickhouse-m5-large
spec:
  kubeletConfiguration: {}
  labels:
    node.alpha.altinity.com/dedicated: clickhouse
  limits: {}
  provider:
    apiVersion: extensions.karpenter.sh/v1alpha1
    instanceProfile: KarpenterNodeInstanceProfile-asg-test
    kind: AWS
    securityGroupSelector:
      kubernetes.io/cluster/asg-test: '*'
    subnetSelector:
      kubernetes.io/cluster/asg-test: '*'
  requirements:
  - key: node.kubernetes.io/instance-type
    operator: In
    values:
    - m5.large
  - key: topology.kubernetes.io/zone
    operator: In
    values:
    - us-west-1a
    - us-west-1c
  - key: karpenter.sh/capacity-type
    operator: In
    values:
    - on-demand
  - key: kubernetes.io/arch
    operator: In
    values:
    - amd64
  taints:
  - effect: NoSchedule
    key: dedicated
    value: clickhouse
  ttlSecondsAfterEmpty: 60

StorageClass

allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
  name: gp3-encrypted
parameters:
  encrypted: "true"
  fsType: ext4
  type: gp3
provisioner: ebs.csi.aws.com
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer

StatefulSet

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: webapp3
  namespace: asg-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: webapp3
  serviceName: webapp3
  template:
    metadata:
      labels:
        app: webapp3
      name: webapp3
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values:
                - us-west-1a
      containers:
      - command:
        - /bin/bash
        - -c
        - "while true; do sleep 6000;  done;"
        image: httpd:2.4
        name: webapp3
        volumeMounts:
        - mountPath: /var/lib/webapp3
          name: default-gp3-encrypted
      nodeSelector:
        node.kubernetes.io/instance-type: m5.large
      securityContext:
        fsGroup: 101
        runAsGroup: 101
        runAsUser: 101
      tolerations:
      - effect: NoSchedule
        key: dedicated
        operator: Equal
        value: clickhouse
  volumeClaimTemplates:
  - apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      labels:
        app: webapp3
      name: default-gp3-encrypted
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 100Gi
      storageClassName: gp3-encrypted
      volumeMode: Filesystem

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 17 (7 by maintainers)

Most upvoted comments

Are you using the in-tree provider, or the https://github.com/kubernetes-sigs/aws-ebs-csi-driver? I will attempt to reproduce on Tuesday.

aws-ebs-csi-driver