karpenter-provider-aws: Race condition with nvidia-device-plugin?
## Version

- Karpenter: v0.6.4
- Kubernetes: v1.21.5-eks-bc4871b
## Expected Behavior

I have a pod (NVIDIA Riva) which has a resource limit of

```yaml
resources:
  limits:
    nvidia.com/gpu: "1"
```

When I scale the deployment to 1, Karpenter should provision a new node. The node will eventually go Ready, and the nvidia-device-plugin daemonset will start a pod, which will add the `nvidia.com/gpu` resource to the node. The Riva pod will then start correctly on the new node.
## Actual Behavior

Karpenter provisions a new GPU node correctly; however, it also immediately schedules the Riva pod onto this new node. At this point some sort of timer must start counting down, because approximately 60 seconds later the Riva pod transitions to the `OutOfnvidia.com/gpu` state and is stuck. This happens because the node must start up far enough to run the nvidia-device-plugin daemonset (which adds the `nvidia.com/gpu` resource to the node), and that usually takes longer than 60 seconds.

These stuck pods remain and appear to block the `nvidia.com/gpu` resource, because at this point Karpenter attempts to provision another new node, leading to a loop in which new nodes are repeatedly launched over and over again.
```
riva-riva-api-768b77d764-74k4m 0/2 OutOfnvidia.com/gpu 0 5m55s
riva-riva-api-768b77d764-9gr5b 0/2 OutOfnvidia.com/gpu 0 4m42s
riva-riva-api-768b77d764-gcngv 0/2 OutOfnvidia.com/gpu 0 102s
riva-riva-api-768b77d764-qhc4k 0/2 OutOfnvidia.com/gpu 0 2m45s
riva-riva-api-768b77d764-thnp9 0/2 OutOfnvidia.com/gpu 0 3m48s
```
```
Events:
Type     Reason               Age                    From               Message
----     ------               ----                   ----               -------
Warning  FailedScheduling     5m41s (x2 over 5m43s)  default-scheduler  0/11 nodes are available: 1 node(s) had taint {karpenter.sh/not-ready: }, that the pod didn't tolerate, 1 node(s) had taint {ocrOnly: true}, that the pod didn't tolerate, 1 node(s) had taint {tritonOnly: true}, that the pod didn't tolerate, 4 Insufficient nvidia.com/gpu, 4 node(s) didn't match Pod's node affinity/selector.
Warning  OutOfnvidia.com/gpu  4m52s                  kubelet            Node didn't have enough resource: nvidia.com/gpu, requested: 1, used: 0, capacity: 0
```
## Steps to Reproduce the Problem

Install the Riva helm chart and scale the deployment to 1.
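The reproduction above can be sketched as follows (a sketch, not exact commands — it assumes the Riva chart is already installed as release `riva`; the deployment name and `api-dev` namespace are taken from the manifest below):

```shell
# Assumption: the Riva helm chart has already been installed as release "riva"
# in namespace "api-dev". Scale the API deployment to 1 replica:
kubectl scale deployment/riva-riva-api --namespace api-dev --replicas=1

# Watch the pods; roughly 60 seconds after Karpenter binds the pod to the
# fresh node, it transitions to OutOfnvidia.com/gpu:
kubectl get pods --namespace api-dev -l app=riva-api -w
```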
## Resource Specs and Logs

```yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: gpu-riva
spec:
  requirements:
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["spot", "on-demand"]
    - key: "node.kubernetes.io/instance-type"
      operator: In
      values: ["g4dn.xlarge", "g5.xlarge"]
  taints:
    - key: rivaOnly
      value: "true"
      effect: NoExecute
    - key: nvidia.com/gpu
      value: "true"
      effect: NoSchedule
  limits:
    resources:
      nvidia.com/gpu: 10
  provider:
    launchTemplate: karpenter-gpu
    subnetSelector:
      Tier: private
  ttlSecondsAfterEmpty: 300
```
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "2"
    meta.helm.sh/release-name: riva
    meta.helm.sh/release-namespace: api-dev
  creationTimestamp: "2022-02-16T08:48:28Z"
  generation: 10
  labels:
    app: riva-api
    app.kubernetes.io/managed-by: Helm
    chart: riva-api-1.8.0-beta
    heritage: Helm
    release: riva
  name: riva-riva-api
  namespace: api-dev
  resourceVersion: "112778449"
  uid: ba419874-da09-4915-bc1f-5310655ca24c
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: riva-api
      release: riva
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: riva-api
        release: riva
    spec:
      containers:
        - args:
            - --asr_service=true
            - --nlp_service=false
            - --tts_service=false
          command:
            - /opt/riva/bin/start-riva
          env:
            - name: TRTIS_MODEL_STORE
              value: /data/models
            - name: LD_PRELOAD
          image: nvcr.io/nvidia/riva/riva-speech:1.8.0-beta-server
          imagePullPolicy: IfNotPresent
          livenessProbe:
            exec:
              command:
                - /bin/grpc_health_probe
                - -addr=:50051
            failureThreshold: 3
            initialDelaySeconds: 10
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
          name: riva-speech-api
          ports:
            - containerPort: 50051
              name: speech-grpc
              protocol: TCP
            - containerPort: 8000
              name: http
              protocol: TCP
            - containerPort: 8001
              name: grpc
              protocol: TCP
            - containerPort: 8002
              name: metrics
              protocol: TCP
          readinessProbe:
            exec:
              command:
                - /bin/grpc_health_probe
                - -addr=:50051
            failureThreshold: 3
            initialDelaySeconds: 5
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
          resources:
            limits:
              nvidia.com/gpu: "1"
          startupProbe:
            exec:
              command:
                - /bin/grpc_health_probe
                - -addr=:50051
            failureThreshold: 12
            initialDelaySeconds: 10
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
            - mountPath: /data/
              name: workdir
      dnsPolicy: ClusterFirst
      imagePullSecrets:
        - name: imagepullsecret
      initContainers:
        - command:
            - download_and_deploy_ngc_models
            - -d
            - nvidia/riva/rmir_asr_citrinet_1024_asrset3p0_streaming:1.8.0-beta
            - nvidia/riva/rmir_asr_citrinet_1024_asrset3p0_offline:1.8.0-beta
          env:
            - name: NGC_CLI_ORG
              value: nvidia
            - name: NGC_CLI_API_KEY
              valueFrom:
                secretKeyRef:
                  key: apikey
                  name: modelpullsecret
            - name: MODEL_DEPLOY_KEY
              valueFrom:
                secretKeyRef:
                  key: key
                  name: riva-model-deploy-key
          image: nvcr.io/nvidia/riva/riva-speech:1.8.0-beta-servicemaker
          imagePullPolicy: IfNotPresent
          name: riva-model-init
          resources:
            limits:
              nvidia.com/gpu: "1"
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
            - mountPath: /rmir
              name: artifact-volume
            - mountPath: /data/
              name: workdir
      nodeSelector:
        karpenter.sh/provisioner-name: gpu-riva
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      tolerations:
        - effect: NoExecute
          key: rivaOnly
          operator: Exists
      volumes:
        - hostPath:
            path: /data/riva/
            type: DirectoryOrCreate
          name: artifact-volume
        - hostPath:
            path: /data/riva
            type: DirectoryOrCreate
          name: workdir
```
```
2022-03-03T14:13:51.627Z INFO controller.provisioning Batched 1 pods in 1.00054383s {"commit": "82ea63b", "provisioner": "gpu-riva"}
2022-03-03T14:13:51.632Z INFO controller.provisioning Computed packing of 1 node(s) for 1 pod(s) with instance type option(s) [g5.xlarge] {"commit": "82ea63b", "provisioner": "gpu-riva"}
2022-03-03T14:13:53.629Z INFO controller.provisioning Launched instance: i-0530447a932454fa5, hostname: ip-10-32-28-237.eu-west-1.compute.internal, type: g5.xlarge, zone: eu-west-1b, capacityType: spot {"commit": "82ea63b", "provisioner": "gpu-riva"}
2022-03-03T14:13:53.645Z INFO controller.provisioning Bound 1 pod(s) to node ip-10-32-28-237.eu-west-1.compute.internal {"commit": "82ea63b", "provisioner": "gpu-riva"}
2022-03-03T14:13:53.645Z INFO controller.provisioning Waiting for unschedulable pods {"commit": "82ea63b", "provisioner": "gpu-riva"}
2022-03-03T14:13:54.001Z INFO controller.node Triggering termination after 5m0s for empty node {"commit": "82ea63b", "node": "ip-10-32-30-147.eu-west-1.compute.internal"}
2022-03-03T14:13:54.021Z INFO controller.termination Cordoned node {"commit": "82ea63b", "node": "ip-10-32-30-147.eu-west-1.compute.internal"}
2022-03-03T14:13:54.189Z INFO controller.termination Deleted node {"commit": "82ea63b", "node": "ip-10-32-30-147.eu-west-1.compute.internal"}
2022-03-03T14:14:06.944Z INFO controller.node Added TTL to empty node {"commit": "82ea63b", "node": "ip-10-32-20-129.eu-west-1.compute.internal"}
2022-03-03T14:15:02.754Z INFO controller.provisioning Batched 1 pods in 1.000881401s {"commit": "82ea63b", "provisioner": "gpu-riva"}
2022-03-03T14:15:03.000Z INFO controller.node Triggering termination after 5m0s for empty node {"commit": "82ea63b", "node": "ip-10-32-17-79.eu-west-1.compute.internal"}
2022-03-03T14:15:03.025Z INFO controller.termination Cordoned node {"commit": "82ea63b", "node": "ip-10-32-17-79.eu-west-1.compute.internal"}
2022-03-03T14:15:03.069Z INFO controller.provisioning Computed packing of 1 node(s) for 1 pod(s) with instance type option(s) [g4dn.xlarge g5.xlarge] {"commit": "82ea63b", "provisioner": "gpu-riva"}
2022-03-03T14:15:03.236Z INFO controller.termination Deleted node {"commit": "82ea63b", "node": "ip-10-32-17-79.eu-west-1.compute.internal"}
2022-03-03T14:15:05.330Z INFO controller.provisioning Launched instance: i-0c21c14033f402684, hostname: ip-10-32-21-191.eu-west-1.compute.internal, type: g5.xlarge, zone: eu-west-1b, capacityType: spot {"commit": "82ea63b", "provisioner": "gpu-riva"}
2022-03-03T14:15:05.345Z INFO controller.provisioning Bound 1 pod(s) to node ip-10-32-21-191.eu-west-1.compute.internal {"commit": "82ea63b", "provisioner": "gpu-riva"}
2022-03-03T14:15:05.345Z INFO controller.provisioning Waiting for unschedulable pods {"commit": "82ea63b", "provisioner": "gpu-riva"}
2022-03-03T14:15:19.663Z INFO controller.node Added TTL to empty node {"commit": "82ea63b", "node": "ip-10-32-28-237.eu-west-1.compute.internal"}
2022-03-03T14:16:09.355Z INFO controller.provisioning Batched 1 pods in 1.000995209s {"commit": "82ea63b", "provisioner": "gpu-riva"}
2022-03-03T14:16:09.571Z INFO controller.provisioning Computed packing of 1 node(s) for 1 pod(s) with instance type option(s) [g5.xlarge] {"commit": "82ea63b", "provisioner": "gpu-riva"}
2022-03-03T14:16:11.563Z INFO controller.provisioning Launched instance: i-0c4bc55b215429d34, hostname: ip-10-32-24-230.eu-west-1.compute.internal, type: g5.xlarge, zone: eu-west-1b, capacityType: spot {"commit": "82ea63b", "provisioner": "gpu-riva"}
2022-03-03T14:16:11.581Z INFO controller.provisioning Bound 1 pod(s) to node ip-10-32-24-230.eu-west-1.compute.internal {"commit": "82ea63b", "provisioner": "gpu-riva"}
2022-03-03T14:16:11.581Z INFO controller.provisioning Waiting for unschedulable pods {"commit": "82ea63b", "provisioner": "gpu-riva"}
2022-03-03T14:16:18.001Z INFO controller.node Triggering termination after 5m0s for empty node {"commit": "82ea63b", "node": "ip-10-32-21-84.eu-west-1.compute.internal"}
2022-03-03T14:16:18.019Z INFO controller.termination Cordoned node {"commit": "82ea63b", "node": "ip-10-32-21-84.eu-west-1.compute.internal"}
2022-03-03T14:16:18.184Z INFO controller.termination Deleted node {"commit": "82ea63b", "node": "ip-10-32-21-84.eu-west-1.compute.internal"}
2022-03-03T14:16:25.459Z INFO controller.node Added TTL to empty node {"commit": "82ea63b", "node": "ip-10-32-21-191.eu-west-1.compute.internal"}
2022-03-03T14:17:18.598Z INFO controller.provisioning Batched 1 pods in 1.001033588s {"commit": "82ea63b", "provisioner": "gpu-riva"}
2022-03-03T14:17:18.759Z INFO controller.provisioning Computed packing of 1 node(s) for 1 pod(s) with instance type option(s) [g5.xlarge] {"commit": "82ea63b", "provisioner": "gpu-riva"}
2022-03-03T14:17:20.674Z INFO controller.provisioning Launched instance: i-06485eeeab5691aa9, hostname: ip-10-32-22-237.eu-west-1.compute.internal, type: g5.xlarge, zone: eu-west-1b, capacityType: spot {"commit": "82ea63b", "provisioner": "gpu-riva"}
2022-03-03T14:17:20.690Z INFO controller.provisioning Bound 1 pod(s) to node ip-10-32-22-237.eu-west-1.compute.internal {"commit": "82ea63b", "provisioner": "gpu-riva"}
2022-03-03T14:17:20.690Z INFO controller.provisioning Waiting for unschedulable pods {"commit": "82ea63b", "provisioner": "gpu-riva"}
2022-03-03T14:17:24.001Z INFO controller.node Triggering termination after 5m0s for empty node {"commit": "82ea63b", "node": "ip-10-32-19-108.eu-west-1.compute.internal"}
2022-03-03T14:17:24.018Z INFO controller.termination Cordoned node {"commit": "82ea63b", "node": "ip-10-32-19-108.eu-west-1.compute.internal"}
2022-03-03T14:17:24.207Z INFO controller.termination Deleted node {"commit": "82ea63b", "node": "ip-10-32-19-108.eu-west-1.compute.internal"}
2022-03-03T14:17:34.697Z INFO controller.node Added TTL to empty node {"commit": "82ea63b", "node": "ip-10-32-24-230.eu-west-1.compute.internal"}
2022-03-03T14:18:07.001Z INFO controller.node Triggering termination after 5m0s for empty node {"commit": "82ea63b", "node": "ip-10-32-25-189.eu-west-1.compute.internal"}
2022-03-03T14:18:07.018Z INFO controller.termination Cordoned node {"commit": "82ea63b", "node": "ip-10-32-25-189.eu-west-1.compute.internal"}
2022-03-03T14:18:07.247Z INFO controller.termination Deleted node {"commit": "82ea63b", "node": "ip-10-32-25-189.eu-west-1.compute.internal"}
2022-03-03T14:18:27.593Z INFO controller.provisioning Batched 1 pods in 1.000639619s {"commit": "82ea63b", "provisioner": "gpu-riva"}
2022-03-03T14:18:27.756Z INFO controller.provisioning Computed packing of 1 node(s) for 1 pod(s) with instance type option(s) [g5.xlarge] {"commit": "82ea63b", "provisioner": "gpu-riva"}
2022-03-03T14:18:29.778Z INFO controller.provisioning Launched instance: i-07cc1f656002a5b3f, hostname: ip-10-32-17-13.eu-west-1.compute.internal, type: g5.xlarge, zone: eu-west-1b, capacityType: spot {"commit": "82ea63b", "provisioner": "gpu-riva"}
2022-03-03T14:18:29.793Z INFO controller.provisioning Bound 1 pod(s) to node ip-10-32-17-13.eu-west-1.compute.internal {"commit": "82ea63b", "provisioner": "gpu-riva"}
2022-03-03T14:18:29.793Z INFO controller.provisioning Waiting for unschedulable pods {"commit": "82ea63b", "provisioner": "gpu-riva"}
2022-03-03T14:18:43.750Z INFO controller.node Added TTL to empty node {"commit": "82ea63b", "node": "ip-10-32-22-237.eu-west-1.compute.internal"}
2022-03-03T14:19:06.001Z INFO controller.node Triggering termination after 5m0s for empty node {"commit": "82ea63b", "node": "ip-10-32-20-129.eu-west-1.compute.internal"}
2022-03-03T14:19:06.021Z INFO controller.termination Cordoned node {"commit": "82ea63b", "node": "ip-10-32-20-129.eu-west-1.compute.internal"}
2022-03-03T14:19:06.297Z INFO controller.termination Deleted node {"commit": "82ea63b", "node": "ip-10-32-20-129.eu-west-1.compute.internal"}
2022-03-03T14:19:38.911Z INFO controller.provisioning Batched 1 pods in 1.001071464s {"commit": "82ea63b", "provisioner": "gpu-riva"}
2022-03-03T14:19:39.106Z INFO controller.provisioning Computed packing of 1 node(s) for 1 pod(s) with instance type option(s) [g4dn.xlarge g5.xlarge] {"commit": "82ea63b", "provisioner": "gpu-riva"}
2022-03-03T14:19:40.981Z INFO controller.provisioning Launched instance: i-090afc78e782f4a76, hostname: ip-10-32-19-214.eu-west-1.compute.internal, type: g5.xlarge, zone: eu-west-1b, capacityType: spot {"commit": "82ea63b", "provisioner": "gpu-riva"}
2022-03-03T14:19:41.012Z INFO controller.provisioning Bound 1 pod(s) to node ip-10-32-19-214.eu-west-1.compute.internal {"commit": "82ea63b", "provisioner": "gpu-riva"}
2022-03-03T14:19:41.012Z INFO controller.provisioning Waiting for unschedulable pods {"commit": "82ea63b", "provisioner": "gpu-riva"}
2022-03-03T14:19:56.032Z INFO controller.node Added TTL to empty node {"commit": "82ea63b", "node": "ip-10-32-17-13.eu-west-1.compute.internal"}
2022-03-03T14:20:19.000Z INFO controller.node Triggering termination after 5m0s for empty node {"commit": "82ea63b", "node": "ip-10-32-28-237.eu-west-1.compute.internal"}
2022-03-03T14:20:19.030Z INFO controller.termination Cordoned node {"commit": "82ea63b", "node": "ip-10-32-28-237.eu-west-1.compute.internal"}
2022-03-03T14:20:19.250Z INFO controller.termination Deleted node {"commit": "82ea63b", "node": "ip-10-32-28-237.eu-west-1.compute.internal"}
2022-03-03T14:20:51.736Z INFO controller.provisioning Batched 1 pods in 1.000044154s {"commit": "82ea63b", "provisioner": "gpu-riva"}
2022-03-03T14:20:52.130Z INFO controller.provisioning Computed packing of 1 node(s) for 1 pod(s) with instance type option(s) [g5.xlarge] {"commit": "82ea63b", "provisioner": "gpu-riva"}
2022-03-03T14:20:54.102Z INFO controller.provisioning Launched instance: i-00ee81c90d469d369, hostname: ip-10-32-20-177.eu-west-1.compute.internal, type: g5.xlarge, zone: eu-west-1b, capacityType: spot {"commit": "82ea63b", "provisioner": "gpu-riva"}
2022-03-03T14:20:54.117Z INFO controller.provisioning Bound 1 pod(s) to node ip-10-32-20-177.eu-west-1.compute.internal {"commit": "82ea63b", "provisioner": "gpu-riva"}
2022-03-03T14:20:54.118Z INFO controller.provisioning Waiting for unschedulable pods {"commit": "82ea63b", "provisioner": "gpu-riva"}
2022-03-03T14:21:08.840Z INFO controller.node Added TTL to empty node {"commit": "82ea63b", "node": "ip-10-32-19-214.eu-west-1.compute.internal"}
2022-03-03T14:21:25.001Z INFO controller.node Triggering termination after 5m0s for empty node {"commit": "82ea63b", "node": "ip-10-32-21-191.eu-west-1.compute.internal"}
2022-03-03T14:21:25.020Z INFO controller.termination Cordoned node {"commit": "82ea63b", "node": "ip-10-32-21-191.eu-west-1.compute.internal"}
2022-03-03T14:21:25.304Z INFO controller.termination Deleted node {"commit": "82ea63b", "node": "ip-10-32-21-191.eu-west-1.compute.internal"}
2022-03-03T14:21:45.696Z INFO controller.provisioning Batched 1 pods in 1.000961501s {"commit": "82ea63b", "provisioner": "gpu-riva"}
2022-03-03T14:21:45.702Z INFO controller.provisioning Computed packing of 1 node(s) for 1 pod(s) with instance type option(s) [g5.xlarge] {"commit": "82ea63b", "provisioner": "gpu-riva"}
2022-03-03T14:21:47.857Z INFO controller.provisioning Launched instance: i-02e9b47ca243fc22d, hostname: ip-10-32-30-183.eu-west-1.compute.internal, type: g5.xlarge, zone: eu-west-1b, capacityType: spot {"commit": "82ea63b", "provisioner": "gpu-riva"}
2022-03-03T14:21:47.872Z INFO controller.provisioning Bound 1 pod(s) to node ip-10-32-30-183.eu-west-1.compute.internal {"commit": "82ea63b", "provisioner": "gpu-riva"}
2022-03-03T14:21:47.872Z INFO controller.provisioning Waiting for unschedulable pods {"commit": "82ea63b", "provisioner": "gpu-riva"}
2022-03-03T14:22:00.966Z INFO controller.node Added TTL to empty node {"commit": "82ea63b", "node": "ip-10-32-20-177.eu-west-1.compute.internal"}
2022-03-03T14:22:00.984Z INFO controller.node Added TTL to empty node {"commit": "82ea63b", "node": "ip-10-32-20-177.eu-west-1.compute.internal"}
2022-03-03T14:22:34.001Z INFO controller.node Triggering termination after 5m0s for empty node {"commit": "82ea63b", "node": "ip-10-32-24-230.eu-west-1.compute.internal"}
2022-03-03T14:22:34.024Z INFO controller.termination Cordoned node {"commit": "82ea63b", "node": "ip-10-32-24-230.eu-west-1.compute.internal"}
2022-03-03T14:22:34.220Z INFO controller.termination Deleted node {"commit": "82ea63b", "node": "ip-10-32-24-230.eu-west-1.compute.internal"}
2022-03-03T14:22:48.141Z INFO controller.provisioning Batched 1 pods in 1.000828756s {"commit": "82ea63b", "provisioner": "gpu-riva"}
2022-03-03T14:22:48.147Z INFO controller.provisioning Computed packing of 1 node(s) for 1 pod(s) with instance type option(s) [g4dn.xlarge g5.xlarge] {"commit": "82ea63b", "provisioner": "gpu-riva"}
2022-03-03T14:22:50.060Z INFO controller.provisioning Launched instance: i-01f2158e05faacc43, hostname: ip-10-32-27-174.eu-west-1.compute.internal, type: g5.xlarge, zone: eu-west-1b, capacityType: spot {"commit": "82ea63b", "provisioner": "gpu-riva"}
2022-03-03T14:22:50.084Z INFO controller.provisioning Bound 1 pod(s) to node ip-10-32-27-174.eu-west-1.compute.internal {"commit": "82ea63b", "provisioner": "gpu-riva"}
2022-03-03T14:22:50.084Z INFO controller.provisioning Waiting for unschedulable pods {"commit": "82ea63b", "provisioner": "gpu-riva"}
2022-03-03T14:23:03.505Z INFO controller.node Added TTL to empty node {"commit": "82ea63b", "node": "ip-10-32-30-183.eu-west-1.compute.internal"}
2022-03-03T14:23:43.001Z INFO controller.node Triggering termination after 5m0s for empty node {"commit": "82ea63b", "node": "ip-10-32-22-237.eu-west-1.compute.internal"}
2022-03-03T14:23:43.019Z INFO controller.termination Cordoned node {"commit": "82ea63b", "node": "ip-10-32-22-237.eu-west-1.compute.internal"}
2022-03-03T14:23:43.233Z INFO controller.termination Deleted node {"commit": "82ea63b", "node": "ip-10-32-22-237.eu-west-1.compute.internal"}
2022-03-03T14:23:51.036Z INFO controller.provisioning Batched 1 pods in 1.000458171s {"commit": "82ea63b", "provisioner": "gpu-riva"}
2022-03-03T14:23:51.042Z INFO controller.provisioning Computed packing of 1 node(s) for 1 pod(s) with instance type option(s) [g5.xlarge] {"commit": "82ea63b", "provisioner": "gpu-riva"}
2022-03-03T14:23:53.060Z INFO controller.provisioning Launched instance: i-09faa781565355cff, hostname: ip-10-32-21-129.eu-west-1.compute.internal, type: g5.xlarge, zone: eu-west-1b, capacityType: spot {"commit": "82ea63b", "provisioner": "gpu-riva"}
2022-03-03T14:23:53.077Z INFO controller.provisioning Bound 1 pod(s) to node ip-10-32-21-129.eu-west-1.compute.internal {"commit": "82ea63b", "provisioner": "gpu-riva"}
2022-03-03T14:23:53.078Z INFO controller.provisioning Waiting for unschedulable pods {"commit": "82ea63b", "provisioner": "gpu-riva"}
2022-03-03T14:24:07.137Z INFO controller.node Added TTL to empty node {"commit": "82ea63b", "node": "ip-10-32-27-174.eu-west-1.compute.internal"}
2022-03-03T14:24:56.001Z INFO controller.node Triggering termination after 5m0s for empty node {"commit": "82ea63b", "node": "ip-10-32-17-13.eu-west-1.compute.internal"}
2022-03-03T14:24:56.019Z INFO controller.termination Cordoned node {"commit": "82ea63b", "node": "ip-10-32-17-13.eu-west-1.compute.internal"}
2022-03-03T14:24:56.210Z INFO controller.termination Deleted node {"commit": "82ea63b", "node": "ip-10-32-17-13.eu-west-1.compute.internal"}
2022-03-03T14:25:02.152Z INFO controller.provisioning Batched 1 pods in 1.000620896s {"commit": "82ea63b", "provisioner": "gpu-riva"}
2022-03-03T14:25:02.159Z INFO controller.provisioning Computed packing of 1 node(s) for 1 pod(s) with instance type option(s) [g5.xlarge] {"commit": "82ea63b", "provisioner": "gpu-riva"}
```
## About this issue

- Original URL
- State: closed
- Created 2 years ago
- Reactions: 1
- Comments: 30 (22 by maintainers)
I was able to reproduce this by adding

```yaml
nvidia.com/gpu: "1"
```

under `spec.resources.limits`. After that, running

```shell
kubectl get node -l karpenter.sh/provisioner-name -o json -w | jq -r ".status.capacity"
```

shows that `"nvidia.com/gpu"` is being set to `0`. It appears that the kubelet is zeroing out the extended resource, as indicated in the logs.

Here's where that happens: https://github.com/kubernetes/kubernetes/blob/39c76ba2edeadb84a115cc3fbd9204a2177f1c28/pkg/kubelet/kubelet_node_status.go#L178
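One quick way to observe this directly (a sketch; the `NODE_NAME` variable is a placeholder, not from the original report) is to read the extended-resource capacity straight off the node object:

```shell
# Read the nvidia.com/gpu capacity from a single node's status.
# Dots inside the resource name must be escaped in the jsonpath expression.
kubectl get node "$NODE_NAME" \
  -o jsonpath='{.status.capacity.nvidia\.com/gpu}'
# Shows 0 while the kubelet has zeroed the extended resource, and the real
# GPU count again once nvidia-device-plugin re-registers with the kubelet.
```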
We are taking a deeper look at this, and we will update the ticket with next steps.
Sorry, I don't know when this will be released, but it is a priority. There is a PR up at https://github.com/aws/karpenter/pull/1837 which combines eliminating pod binding with eliminating node creation, which solves another issue.
@tzneal This snapshot is working well for us. `nvidia.com/gpu` is being reset correctly, so nodes are being reused. We're no longer seeing the intermittent `OutOfnvidia.com/gpu` when the pod runs for the first time on a GPU node. 🎉

This will sometimes work; the problem is that occasionally the kubelet will zero out the extended resources on startup, setting the number of GPUs to zero. I am actively working on a solution for this.
We are considering the option of not binding pods, and that should help with this and a few other issues.
Any update on this issue?

Yep! I have submitted a PR.