karpenter-provider-aws: "inconsistent state error adding volume, StorageClass.storage.k8s.io "nfs" not found, please file an issue"
Version
Karpenter: v0.12.0
Kubernetes: v1.22.9-eks-a64ea69
Expected Behavior
Karpenter provisions nodes for pods that mount statically provisioned NFS volumes without logging errors.
Actual Behavior
Seeing the following in my karpenter logs:
```
controller 2022-06-27T06:30:24.907Z ERROR controller.node-state inconsistent state error adding volume, StorageClass.storage.k8s.io "nfs" not found, please file an issue {"commit": "588d4c8", "node": "ip-XXX-XX-XX-XXX.us-west-2.compute.internal"}
```
Steps to Reproduce the Problem
Resource Specs and Logs
Provisioner:
```yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: workers
spec:
  # https://github.com/aws/karpenter/issues/1252#issuecomment-1166894316
  labels:
    vpc.amazonaws.com/has-trunk-attached: "false"
  taints:
    - key: purpose
      effect: NoSchedule
      value: workers
    - key: purpose
      value: workers
      effect: NoExecute
  requirements:
    - key: "purpose"
      operator: In
      values: ["inflate-workers", "workers"]
    - key: "karpenter.sh/capacity-type"
      operator: In
      values: ["spot", "on-demand"]
    - key: "node.kubernetes.io/instance-type"
      operator: In
      values: ["t3.medium", "t3.large", "t3.xlarge", "m5n.xlarge", "m6a.large", "c6a.large"]
    - key: "kubernetes.io/arch"
      operator: In
      values: ["amd64"]
  provider:
    instanceProfile: eks-random-strings-redacted
    securityGroupSelector:
      Name: v2-cert-03-eks-node
    subnetSelector:
      Name: v2-cert-03-private-us-west-2*
  ttlSecondsAfterEmpty: 30
```
Sample Deployment Template
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: some-deployment
  labels:
    app: some-deployment
    app-kubecost: dev
spec:
  revisionHistoryLimit: 1
  replicas:
  selector:
    matchLabels:
      app: some-deployment
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 0
  template:
    metadata:
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
      labels:
        app: some-deployment
        app-kubecost: dev
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: purpose
                    operator: In
                    values:
                      - workers
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: app
                      operator: In
                      values:
                        - some-deployment
                topologyKey: kubernetes.io/hostname
      tolerations:
        - key: purpose
          operator: Equal
          value: workers
          effect: NoSchedule
        - key: purpose
          operator: Equal
          value: workers
          effect: NoExecute
      imagePullSecrets:
        - name: uat-workers-dockercred
      containers:
        - name: worker
          image: "REDACTED.dkr.ecr.us-west-2.amazonaws.com/some_repo:some_immutable_tag"
          imagePullPolicy: IfNotPresent
          env:
            - name: VERBOSE
              value: "3"
          resources:
            requests:
              cpu: "2"
              memory: "2000Mi"
            limits:
              cpu: "2"
              memory: "2000Mi"
          volumeMounts:
            - name: files
              mountPath: /code/.configs.yaml
              subPath: configs.yaml
            - mountPath: "/protected_path/uat"
              name: nfs
      volumes:
        - name: files
          configMap:
            name: config-files
        - name: nfs
          persistentVolumeClaim:
            claimName: nfs
```
Sample PVC template:
```yaml
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs
spec:
  volumeName: nfs-{{ .Release.Namespace }}
  storageClassName: nfs
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
```
Sample PV template:
```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-{{ .Release.Namespace }}
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: nfs
  mountOptions:
    - nfsvers=4.1
    - rsize=1048576
    - wsize=1048576
    - hard
    - timeo=600
    - retrans=2
    - noresvport
  nfs:
    server: {{ .Values.nfs.server }}
    path: {{ .Values.nfs.path }}
  claimRef:
    name: nfs
    namespace: {{ .Release.Namespace }}
```
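The PVC and PV above are statically bound: the PVC pins `volumeName`, the PV carries a matching `claimRef`, and `storageClassName: nfs` on both sides is only a label that has to agree for binding; no StorageClass object named `nfs` ever needs to exist. A minimal sketch of that matching rule, using made-up local types rather than the real `k8s.io/api/core/v1` ones:

```go
package main

import "fmt"

// Simplified stand-ins for the objects above; the real types live in
// k8s.io/api/core/v1. These are illustrative only.
type PVC struct {
	Name, Namespace, StorageClassName, VolumeName string
}

type PV struct {
	Name, StorageClassName string
	ClaimRefName           string
	ClaimRefNamespace      string
}

// staticallyBound reports whether a PVC and PV pre-bind to each other the
// way the templates above do: the PVC names the PV, the PV names the claim,
// and the storageClassName strings agree. Note no StorageClass object is
// consulted anywhere, which is why the class "nfs" need not exist.
func staticallyBound(c PVC, v PV) bool {
	return c.VolumeName == v.Name &&
		v.ClaimRefName == c.Name &&
		v.ClaimRefNamespace == c.Namespace &&
		c.StorageClassName == v.StorageClassName
}

func main() {
	c := PVC{Name: "nfs", Namespace: "uat", StorageClassName: "nfs", VolumeName: "nfs-uat"}
	v := PV{Name: "nfs-uat", StorageClassName: "nfs", ClaimRefName: "nfs", ClaimRefNamespace: "uat"}
	fmt.Println(staticallyBound(c, v)) // true
}
```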
About this issue
- State: closed
- Created 2 years ago
- Comments: 22 (10 by maintainers)
Commits related to this issue
- fix: correctly handle static volumes For static volumes, we pull the CSI driver name off of the PV after it's bound instead of from the SC named on the PVC. The SC may not even exist in these cases ... — committed to tzneal/karpenter by tzneal 2 years ago
- fix: correctly handle static volumes (#2033) For static volumes, we pull the CSI driver name off of the PV after it's bound instead of from the SC named on the PVC. The SC may not even exist in th... — committed to aws/karpenter-provider-aws by tzneal 2 years ago
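The fix those commits describe can be sketched as follows. This is an illustrative reduction with hypothetical local types, not Karpenter's actual code: before the fix, the driver for a volume was resolved by looking up the StorageClass named on the PVC, which fails for static volumes whose class name is just a binding label; after, a bound claim takes the driver straight from its PV, and only an unbound claim needs the StorageClass to exist.

```go
package main

import (
	"errors"
	"fmt"
)

// Simplified stand-ins; real code uses k8s.io/api/core/v1 and storage/v1.
type PV struct {
	CSIDriver string // non-empty for CSI volumes, empty for in-tree (e.g. nfs:)
}

type PVC struct {
	StorageClassName string
	BoundPV          *PV // nil while unbound
}

// classes plays the role of a StorageClass lister mapping class -> driver.
// The "nfs" class is deliberately absent, as in the reported cluster.
var classes = map[string]string{
	"gp2": "ebs.csi.aws.com",
}

// driverFor resolves which storage driver a claim uses. Per the fix, a
// bound claim is resolved from its PV (an empty driver just means an
// in-tree volume such as NFS); only an unbound claim falls back to the
// StorageClass, so a missing class can no longer break bound static volumes.
func driverFor(c PVC) (string, error) {
	if c.BoundPV != nil {
		return c.BoundPV.CSIDriver, nil
	}
	d, ok := classes[c.StorageClassName]
	if !ok {
		return "", errors.New("StorageClass " + c.StorageClassName + " not found")
	}
	return d, nil
}

func main() {
	// Bound static NFS claim: resolves even though class "nfs" does not exist.
	fmt.Println(driverFor(PVC{StorageClassName: "nfs", BoundPV: &PV{}}))
	// Unbound claim with a missing class: this is the old error path.
	_, err := driverFor(PVC{StorageClassName: "nfs"})
	fmt.Println(err)
}
```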
Yes, this should work for you. 4c35c0fe3cc13f55f7edba361cb2f5e662ac9867 is the current latest commit in main.
@armenr Can you file a separate issue? It’s difficult to track multiple items in a single issue.