karpenter: Mega Issue: Karpenter doesn't support custom resource requests/limits

Version

Karpenter: v0.10.1

Kubernetes: v1.20.15

Expected Behavior

Karpenter should trigger a scale-up and provision a node for the pending pod.

Actual Behavior

Karpenter fails to trigger a scale-up and the pod stays in Pending.

Steps to Reproduce the Problem

We're using Karpenter on EKS. Some of our pods declare a custom (extended) resource in their requests/limits - smarter-devices/fuse: 1. Karpenter does not seem to account for this resource: it fails to provision a node and the pod remains in the Pending state.
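For context, an extended resource like smarter-devices/fuse is advertised by a device plugin (smarter-device-manager in our case) only on nodes that already exist, which is presumably why Karpenter cannot know in advance that a new node would expose it. As a rough illustration (values made up), a node running the plugin reports something like:

# Illustrative node status after smarter-device-manager registers the device;
# the resource only appears once the plugin is running on the node.
status:
  capacity:
    cpu: "32"
    memory: 128830948Ki
    smarter-devices/fuse: "20"   # example count, set by the plugin's config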

Resource Specs and Logs

Provisioner spec

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  limits:
    resources:
      cpu: "100"
  provider:
    launchTemplate: xxxxx
    subnetSelector:
      xxxxx: xxxxx
  requirements:
  - key: karpenter.sh/capacity-type
    operator: In
    values:
    - on-demand
  - key: node.kubernetes.io/instance-type
    operator: In
    values:
    - m5.large
    - m5.2xlarge
    - m5.4xlarge
    - m5.8xlarge
    - m5.12xlarge
  - key: kubernetes.io/arch
    operator: In
    values:
    - amd64
  ttlSecondsAfterEmpty: 30
status:
  resources:
    cpu: "32"
    memory: 128830948Ki

Deployment spec

apiVersion: apps/v1
kind: Deployment
metadata:
  name: fuse-test
  labels:
    app: fuse-test
spec:
  replicas: 1
  selector:
    matchLabels:
      name: fuse-test
  template:
    metadata:
      labels:
        name: fuse-test
    spec:
      containers:
      - name: fuse-test
        image: ubuntu:latest
        ports:
          - containerPort: 8080
            name: web
            protocol: TCP
        securityContext:
          capabilities:
            add:
              - SYS_ADMIN
        resources:
          limits:
            cpu: 32
            memory: 4Gi
            smarter-devices/fuse: 1  # Custom resource
          requests:
            cpu: 32
            memory: 2Gi
            smarter-devices/fuse: 1  # Custom resource
        env:
        - name: S3_BUCKET
          value: test-s3
        - name: S3_REGION
          value: eu-west-1

Karpenter controller logs:

controller 2022-06-06T15:59:00.499Z ERROR controller no instance type satisfied resources {"cpu":"32","memory":"2Gi","pods":"1","smarter-devices/fuse":"1"} and requirements kubernetes.io/os In [linux], karpenter.sh/capacity-type In [on-demand], kubernetes.io/hostname In [hostname-placeholder-3403], node.kubernetes.io/instance-type In [m5.12xlarge m5.2xlarge m5.4xlarge m5.8xlarge m5.large], karpenter.sh/provisioner-name In [default], topology.kubernetes.io/zone In [eu-west-1a eu-west-1b], kubernetes.io/arch In [amd64];

About this issue

  • State: open
  • Created 2 years ago
  • Reactions: 32
  • Comments: 16 (4 by maintainers)

Most upvoted comments

We also need this, for nitro enclaves.

As discussed on Slack:

@Todd Neal and I were recently discussing a mechanism to allow users to define extended resources that karpenter isn’t aware of. Right now, we are aware of the extended resources on specific EC2 instance types, which is how we binpack them. One option would be to enable users to define a configmap of [{instancetype, provisioner, extendedresource}] that karpenter could use for binpacking.
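To make the idea concrete, here is a hypothetical sketch of such a ConfigMap (this schema is not implemented anywhere; every key and name below is illustrative):

apiVersion: v1
kind: ConfigMap
metadata:
  name: karpenter-extended-resources   # hypothetical name
  namespace: karpenter
data:
  # Hypothetical mapping of {instance type, provisioner} pairs to the
  # extended resources they would expose, for use during bin-packing.
  extended-resources.yaml: |
    - instanceType: m5.large
      provisioner: default
      extendedResources:
        smarter-devices/fuse: "20"
    - instanceType: m5.2xlarge
      provisioner: default
      extendedResources:
        smarter-devices/fuse: "20"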

I’m having the same issue with vGPU.

For us this is a blocking issue with Karpenter. Our use case is fuse and snd devices that are exposed as custom device resources by smarter-device-manager.
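For reference, smarter-device-manager is configured through a ConfigMap that lists which host devices to advertise as extended resources; a minimal sketch along the lines of its documented format (device counts are illustrative):

apiVersion: v1
kind: ConfigMap
metadata:
  name: smarter-device-manager
  namespace: default
data:
  conf.yaml: |
    # Advertise /dev/fuse and /dev/snd* as smarter-devices/fuse
    # and smarter-devices/snd on nodes running the plugin.
    - devicematch: ^fuse$
      nummaxdevices: 20
    - devicematch: ^snd$
      nummaxdevices: 20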

As a simpler workaround @ellistarn @tzneal why not just ignore resources that Karpenter is unaware of? Instead of having to create a configMap as a whitelist, Karpenter could just filter down to well-known resources and act upon those, but ignore any resource it has no idea of. It can't do anything good about those anyway…

Taking this error message:

Failed to provision new node, incompatible with provisioner "default", no instance type satisfied resources {....smarter-devices/fuse":"2"} ...

it looks like Karpenter already has all the information it needs to distinguish "manageable" resources from those that are not?

I’m having the same issue with hugepages

We're facing the same issue with KubeVirt. Given that it's been ongoing for a while, it might be worth considering both a short-term fix to unblock users and a long-term solution?
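For context, KubeVirt advertises its own extended resources through a device plugin, so virt-launcher pods hit the same bin-packing gap. An illustrative container spec fragment (resource names are the ones KubeVirt's device plugin registers; the counts are examples):

resources:
  limits:
    devices.kubevirt.io/kvm: "1"   # KVM device exposed by KubeVirt's device plugin
    devices.kubevirt.io/tun: "1"   # TUN device, example value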

I noticed PR https://github.com/kubernetes-sigs/karpenter/pull/603 mentioning the deprecated Karpenter config map and a Slack conversation started here. As an alternative, I created a fork using the same approach but sourcing configuration from options (arg or environment variable). Would this be an interesting direction to explore? Or is the current state of this issue more “not a priority, maintain your forks until we have a better design / long-term approach for it”?