karpenter-provider-aws: "inconsistent state error adding volume, StorageClass.storage.k8s.io "nfs" not found, please file an issue"
Version
Karpenter: v0.12.0
Kubernetes: v1.22.9-eks-a64ea69
Expected Behavior
Karpenter works as expected (no issues with nfs volumes)
Actual Behavior
Seeing the following in my karpenter logs:
controller 2022-06-27T06:30:24.907Z    ERROR    controller.node-state    inconsistent state error adding volume, StorageClass.storage.k8s.io "nfs" not found, please file an issue    {"commit": "588d4c8", "node": "ip-XXX-XX-XX-XXX.us-west-2.compute.internal"}
Steps to Reproduce the Problem
Resource Specs and Logs
Provisioner:
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: workers
spec:
  # https://github.com/aws/karpenter/issues/1252#issuecomment-1166894316
  labels:
    vpc.amazonaws.com/has-trunk-attached: "false"
  taints:
  - key: purpose
    effect: NoSchedule
    value: workers
  - key: purpose
    value: workers
    effect: NoExecute
  requirements:
    - key: "purpose"
      operator: In
      values: ["inflate-workers", "workers"]
    - key: "karpenter.sh/capacity-type"
      operator: In
      values: ["spot", "on-demand"]
    - key: "node.kubernetes.io/instance-type"
      operator: In
      values: ["t3.medium", "t3.large", "t3.xlarge", "m5n.xlarge", "m6a.large", "c6a.large"]
    - key: "kubernetes.io/arch"
      operator: In
      values: ["amd64"]
  provider:
    instanceProfile: eks-random-strings-redacted
    securityGroupSelector:
      Name: v2-cert-03-eks-node
    subnetSelector:
      Name: v2-cert-03-private-us-west-2*
  ttlSecondsAfterEmpty: 30
Sample Deployment Template
apiVersion: apps/v1
kind: Deployment
metadata:
  name: some-deployment
  labels:
    app: some-deployment
    app-kubecost: dev
spec:
  revisionHistoryLimit: 1
  replicas:
  selector:
    matchLabels:
      app: some-deployment
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 0
  template:
    metadata:
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
      labels:
        app: some-deployment
        app-kubecost: dev
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: purpose
                    operator: In
                    values:
                      - workers
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: app
                      operator: In
                      values:
                        - some-deployment
                topologyKey: kubernetes.io/hostname
      tolerations:
        - key: purpose
          operator: Equal
          value: workers
          effect: NoSchedule
        - key: purpose
          operator: Equal
          value: workers
          effect: NoExecute
      imagePullSecrets:
        - name: uat-workers-dockercred
      containers:
        - name: worker
          image: "REDACTED.dkr.ecr.us-west-2.amazonaws.com/some_repo:some_immutable_tag"
          imagePullPolicy: IfNotPresent
          env:
            - name: VERBOSE
              value: "3"
          resources:
            requests:
              cpu: "2"
              memory: "2000Mi"
            limits:
              cpu: "2"
              memory: "2000Mi"
          volumeMounts:
            - name: files
              mountPath: /code/.configs.yaml
              subPath: configs.yaml
            - mountPath: "/protected_path/uat"
              name: nfs
      volumes:
        - name: files
          configMap:
            name: config-files
        - name: nfs
          persistentVolumeClaim:
            claimName: nfs
Sample PVC template:
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs
spec:
  volumeName: nfs-{{ .Release.Namespace }}
  storageClassName: nfs
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
Sample PV template:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-{{ .Release.Namespace }}
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: nfs
  mountOptions:
    - nfsvers=4.1
    - rsize=1048576
    - wsize=1048576
    - hard
    - timeo=600
    - retrans=2
    - noresvport
  nfs:
    server: {{ .Values.nfs.server }}
    path: {{ .Values.nfs.path }}
  claimRef:
    name: nfs
    namespace: {{ .Release.Namespace }}
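The PV/PVC pair above is statically bound, so no StorageClass object named nfs actually exists in the cluster, which is what triggers the "StorageClass.storage.k8s.io \"nfs\" not found" lookup error. As a stopgap, a minimal placeholder class along the lines of the sketch below (not part of the original setup; kubernetes.io/no-provisioner simply marks it as non-dynamic) would satisfy that lookup:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs
# No dynamic provisioning; the PVs referencing this class are created by hand.
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: Immediate
reclaimPolicy: Retain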
About this issue
- State: closed
- Created 2 years ago
- Comments: 22 (10 by maintainers)
Commits related to this issue
- fix: correctly handle static volumes For static volumes, we pull the CSI driver name off of the PV after it's bound instead of from the SC named on the PVC. The SC may not even exist in these cases ... — committed to tzneal/karpenter by tzneal 2 years ago
- fix: correctly handle static volumes (#2033) For static volumes, we pull the CSI driver name off of the PV after it's bound instead of from the SC named on the PVC. The SC may not even exist in th... — committed to aws/karpenter-provider-aws by tzneal 2 years ago
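The fix described in those commits reads the CSI driver name from the bound PV itself rather than from the StorageClass named on the PVC, which may not exist for statically provisioned volumes. On a static CSI volume the driver name lives under spec.csi.driver, roughly as in this hypothetical EFS example (the name and volumeHandle are illustrative, not taken from this issue):
apiVersion: v1
kind: PersistentVolume
metadata:
  name: efs-static-example
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""              # no StorageClass object needs to exist
  csi:
    driver: efs.csi.aws.com         # driver name read straight off the bound PV
    volumeHandle: fs-0123456789abcdef0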
Yes, this should work for you. 4c35c0fe3cc13f55f7edba361cb2f5e662ac9867 is the current latest commit in main.
@armenr Can you file a separate issue? It’s difficult to track multiple items in a single issue.