karpenter: Mega Issue: Karpenter doesn't support custom resource requests/limits
Version
Karpenter: v0.10.1
Kubernetes: v1.20.15
Expected Behavior
Karpenter should be able to trigger a scale-up for the pending pod.
Actual Behavior
Karpenter isn't able to trigger a scale-up.
Steps to Reproduce the Problem
We’re using Karpenter on EKS. We have pods that declare a custom resource in their requests/limits (smarter-devices/fuse: 1). Karpenter does not seem to respect this resource: it fails to autoscale, and the pod remains in the Pending state.
Resource Specs and Logs
Provisioner spec
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  limits:
    resources:
      cpu: "100"
  provider:
    launchTemplate: xxxxx
    subnetSelector:
      xxxxx: xxxxx
  requirements:
    - key: karpenter.sh/capacity-type
      operator: In
      values:
        - on-demand
    - key: node.kubernetes.io/instance-type
      operator: In
      values:
        - m5.large
        - m5.2xlarge
        - m5.4xlarge
        - m5.8xlarge
        - m5.12xlarge
    - key: kubernetes.io/arch
      operator: In
      values:
        - amd64
  ttlSecondsAfterEmpty: 30
status:
  resources:
    cpu: "32"
    memory: 128830948Ki
Pod spec
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fuse-test
  labels:
    app: fuse-test
spec:
  replicas: 1
  selector:
    matchLabels:
      name: fuse-test
  template:
    metadata:
      labels:
        name: fuse-test
    spec:
      containers:
        - name: fuse-test
          image: ubuntu:latest
          ports:
            - containerPort: 8080
              name: web
              protocol: TCP
          securityContext:
            capabilities:
              add:
                - SYS_ADMIN
          resources:
            limits:
              cpu: 32
              memory: 4Gi
              smarter-devices/fuse: 1 # Custom resource
            requests:
              cpu: 32
              memory: 2Gi
              smarter-devices/fuse: 1 # Custom resource
          env:
            - name: S3_BUCKET
              value: test-s3
            - name: S3_REGION
              value: eu-west-1
karpenter controller logs:
controller 2022-06-06T15:59:00.499Z ERROR controller no instance type satisfied resources {"cpu":"32","memory":"2Gi","pods":"1","smarter-devices/fuse":"1"} and requirements kubernetes.io/os In [linux], karpenter.sh/capacity-type In [on-demand], kubernetes.io/hostname In [hostname-placeholder-3403], node.kubernetes.io/instance-type In [m5.12xlarge m5.2xlarge m5.4xlarge m5.8xlarge m5.large], karpenter.sh/provisioner-name In [default], topology.kubernetes.io/zone In [eu-west-1a eu-west-1b], kubernetes.io/arch In [amd64];
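The error suggests why provisioning fails: Karpenter simulates scheduling against the resources each candidate instance type advertises, and since no instance type advertises any smarter-devices/fuse capacity, the fit check can never succeed. Here is a minimal sketch of that kind of fit check in Go, using hypothetical types rather than Karpenter's actual API:

```go
package main

import "fmt"

// Resources maps a resource name (e.g. "cpu", "smarter-devices/fuse")
// to a quantity. Integer quantities keep the sketch simple.
type Resources map[string]int64

// fits reports whether an instance type's advertised capacity can
// satisfy every resource a pod requests. An extended resource the
// instance type does not advertise at all can never be satisfied.
func fits(requests, capacity Resources) bool {
	for name, qty := range requests {
		// A missing key yields 0, so unknown device-plugin
		// resources always fail the comparison.
		if capacity[name] < qty {
			return false
		}
	}
	return true
}

func main() {
	requests := Resources{"cpu": 32, "memory": 2 << 30, "smarter-devices/fuse": 1}
	// m5.8xlarge advertises cpu/memory but knows nothing about the
	// device-plugin resource, so the simulated scheduling fails.
	m58xlarge := Resources{"cpu": 32, "memory": 128 << 30}
	fmt.Println(fits(requests, m58xlarge)) // false: smarter-devices/fuse unsatisfied
}
```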
About this issue
- State: open
- Created 2 years ago
- Reactions: 32
- Comments: 16 (4 by maintainers)
Commits related to this issue
- permit custom device requests: “Karpenter is negative towards custom device requests it is unaware of, assuming those cannot be scheduled. fixes #1900 This changes the request handling to be scoped o...” — committed to o11n/karpenter by universam1 2 years ago
We also need this, for Nitro Enclaves.
As discussed on Slack: I’m having the same issue with vGPU.
For us this is a blocking issue with Karpenter. Our use case is fuse and snd devices that are created as custom device resources by smarter-device-manager.

As a simpler workaround, @ellistarn @tzneal, why not just ignore resources that Karpenter is unaware of? Instead of having to create a ConfigMap as a whitelist, Karpenter could filter down to well-known resources and act upon those, while ignoring any resource it has no idea of. It can’t do anything useful about those anyway…
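For illustration, a minimal sketch of that filtering idea, assuming a hypothetical wellKnown allow-list rather than Karpenter's actual resource handling:

```go
package main

import "fmt"

// wellKnown lists the resources the provisioner can reason about when
// picking instance types (a hypothetical allow-list for this sketch).
var wellKnown = map[string]bool{
	"cpu":    true,
	"memory": true,
	"pods":   true,
}

// filterRequests drops extended resources the provisioner is unaware
// of, so bin-packing only considers resources it can actually compare
// against instance-type capacity.
func filterRequests(requests map[string]string) map[string]string {
	filtered := make(map[string]string, len(requests))
	for name, qty := range requests {
		if wellKnown[name] {
			filtered[name] = qty
		}
	}
	return filtered
}

func main() {
	requests := map[string]string{
		"cpu":                  "32",
		"memory":               "2Gi",
		"smarter-devices/fuse": "1", // unknown resource: ignored, not rejected
	}
	fmt.Println(filterRequests(requests)) // map[cpu:32 memory:2Gi]
}
```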
Looking at that error message, it seems Karpenter already has all the information it needs to distinguish “manageable” resources from those it cannot manage.
I’m having the same issue with hugepages
We’re facing the same issue with KubeVirt. Given that this has been ongoing for a while, it might be good to consider both a short-term solution to unblock users and a long-term solution.
I noticed PR https://github.com/kubernetes-sigs/karpenter/pull/603 mentioning the deprecated Karpenter ConfigMap, and a Slack conversation started here. As an alternative, I created a fork using the same approach but sourcing configuration from options (a CLI argument or an environment variable). Would this be an interesting direction to explore? Or is the current state of this issue more “not a priority, maintain your forks until we have a better design / long-term approach for it”?
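A minimal sketch of what sourcing such an allow-list from an environment variable might look like; KARPENTER_IGNORED_RESOURCES is a hypothetical name, not the fork's actual option:

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// ignoredResourcesFromEnv reads a comma-separated list of extended
// resource names that the scheduling simulation should skip, e.g.
//   KARPENTER_IGNORED_RESOURCES=smarter-devices/fuse,smarter-devices/snd
func ignoredResourcesFromEnv() map[string]bool {
	ignored := make(map[string]bool)
	for _, name := range strings.Split(os.Getenv("KARPENTER_IGNORED_RESOURCES"), ",") {
		if name = strings.TrimSpace(name); name != "" {
			ignored[name] = true
		}
	}
	return ignored
}

func main() {
	os.Setenv("KARPENTER_IGNORED_RESOURCES", "smarter-devices/fuse,smarter-devices/snd")
	fmt.Println(ignoredResourcesFromEnv()) // map[smarter-devices/fuse:true smarter-devices/snd:true]
}
```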