karpenter-provider-aws: topologySpreadConstraint selects AZs from an incorrect region.

Version

Karpenter Version: v0.27.0

Kubernetes Version: v1.24.0

Expected Behavior

We have a deployment with topologySpreadConstraints. The deployment runs in an EKS cluster in the eu-west-1 region. The expected behavior is for pods to spread across all AZs in the eu-west-1 region.

topologySpreadConstraints:
  - labelSelector:
      matchLabels:
        app.kubernetes.io/name: topology-test
    maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule

Actual Behavior

Some of the pods are stuck in the Pending state. In the Karpenter events I noticed that Karpenter tries to schedule the pod into us-east-1a, and because the cluster is deployed in the eu-west-1 region, no us-east-1* AZs are available.

Warning  FailedScheduling  55m (x8 over 55m)  karpenter          (combined from similar events): Failed to schedule pod, 
incompatible with provisioner "default", no instance type satisfied resources 
{"cpu":"3100m","memory":"4319304Ki","pods":"1"} and requirements karpenter.k8s.aws/instance-size In [12xlarge 2xlarge 
4xlarge 8xlarge large and 2 others], topology.kubernetes.io/zone In [us-east-1a], kubernetes.io/os In [linux]

Steps to Reproduce the Problem

I was able to reproduce the issue, but only in the eu-west-1 region; the same deployment works in us-east-1. I have also verified that all nodes in the cluster are in eu-west-1 (screenshot attached, 2023-04-12 12:56:33).
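
One quick way to double-check the node zones (plain kubectl, nothing specific to Karpenter) is to print the zone label as an extra column; every node should report a zone in eu-west-1:

kubectl get nodes -L topology.kubernetes.io/zone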

Here is a full deployment spec that I used to reproduce the issue.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: topology-test
  namespace: devops
spec:
  replicas: 5
  selector:
    matchLabels:
      app.kubernetes.io/name: topology-test
  template:
    metadata:
      labels:
        app.kubernetes.io/name: topology-test
    spec:
      tolerations:
      - key: dedicated
        operator: Equal
        value: tmp
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                - key: dedicated
                  operator: In
                  values:
                  - tmp
      containers:
      - name: app
        command: ["sleep"]
        args: [ "10000" ]
        image: busybox:latest
      topologySpreadConstraints:
      - labelSelector:
          matchLabels:
            app.kubernetes.io/name: topology-test
        maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
      - labelSelector:
          matchLabels:
            app.kubernetes.io/name: topology-test
        maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: DoNotSchedule

AWSNodeTemplate and Provisioner

# Source: karpenter-provisioners/templates/tmp-provisioner.yaml
apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
metadata:
  name: tmp
spec:
  subnetSelector:
    aws-ids: "subnet-0eadc682250f4a7e7,subnet-0e172be0e12ea345f,subnet-0fca6ed4430619a37"
  securityGroupSelector:
    Name: k8s.nodes.eks
---
# Source: karpenter-provisioners/templates/tmp-provisioner.yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: tmp
spec:
  labels:
    dedicated: tmp
  taints:
    - key: dedicated
      effect: NoSchedule
      value: tmp
  providerRef:
    name: tmp
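
Since the subnets are selected by ID, it is also worth confirming which Availability Zone each of those subnets actually lives in. Assuming the AWS CLI is pointed at the same account and region, something like the following (using the subnet IDs from the template above) will show it:

aws ec2 describe-subnets \
  --subnet-ids subnet-0eadc682250f4a7e7 subnet-0e172be0e12ea345f subnet-0fca6ed4430619a37 \
  --query 'Subnets[].[SubnetId,AvailabilityZone]' \
  --output table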

As a result

3/5 pods are running; the two remaining ones are stuck in Pending and complain about the us-east-1 AZ:

topology-test-768fc9fdc7-5mrdt    1/1     Running   0          106s
topology-test-768fc9fdc7-8g4xc    1/1     Running   0          106s
topology-test-768fc9fdc7-hnzhq    0/1     Pending   0          106s
topology-test-768fc9fdc7-rw8n6    1/1     Running   0          106s
topology-test-768fc9fdc7-skcq7    0/1     Pending   0          106s

Here is the warning event, which is the same as the original one:

  Warning  FailedScheduling  95s (x15 over 2m35s)  karpenter          (combined from similar events): Failed to schedule pod,
   incompatible with provisioner "tmp", no instance type 
   satisfied resources {"pods":"1"} and requirements karpenter.sh/provisioner-name In [tmp], kubernetes.io/os In [linux], 
   karpenter.sh/capacity-type In [on-demand], karpenter.k8s.aws/instance-category In [c m r], karpenter.k8s.aws/instance-
   generation Exists >2, dedicated In [tmp], kubernetes.io/arch In [amd64], topology.kubernetes.io/zone In [us-east-1a]

As you can see, once again the pod has picked up a requirement to be scheduled into the us-east-1a zone (topology.kubernetes.io/zone In [us-east-1a]), which is not the expected behavior.

Resource Specs and Logs

Provisioner

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  labels:
    argocd.argoproj.io/instance: karpenter-provisioners-eu-west-1
  name: default
spec:
  consolidation:
    enabled: true
  labels:
    dedicated: k8s.nodes
  providerRef:
    name: default
  requirements:
  - key: karpenter.sh/capacity-type
    operator: In
    values:
    - spot
    - on-demand
  - key: karpenter.k8s.aws/instance-category
    operator: In
    values:
    - c
    - m
    - r
  - key: karpenter.k8s.aws/instance-size
    operator: In
    values:
    - small
    - large
    - xlarge
    - 2xlarge
    - 4xlarge
    - 8xlarge
    - 12xlarge
  - key: kubernetes.io/arch
    operator: In
    values:
    - amd64
  - key: kubernetes.io/os
    operator: In
    values:
    - linux
  ttlSecondsUntilExpired: 2592000
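
Note that this Provisioner does not constrain topology.kubernetes.io/zone at all. As a sketch only (the zone values below are an assumption, not something taken from the cluster), pinning the zones explicitly under spec.requirements would look like this:

  - key: topology.kubernetes.io/zone
    operator: In
    values:
    - eu-west-1a
    - eu-west-1b
    - eu-west-1c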

Karpenter logs

2023-04-11T09:04:02.514Z        ERROR   controller.provisioner  Could not schedule pod, incompatible with provisioner "dmz", did not tolerate dedicated=dmz:NoSchedule; incompatible with provisioner "on-demand", did not tolerate dedicated=on-demand:NoSchedule; incompatible with provisioner "default", no instance type satisfied resources {"cpu":"3100m","memory":"4319304Ki","pods":"1"} and requirements kubernetes.io/os In [linux], karpenter.sh/provisioner-name In [default], karpenter.k8s.aws/instance-size In [12xlarge 2xlarge 4xlarge 8xlarge large and 2 others], dedicated In [k8s.nodes], karpenter.sh/capacity-type In [on-demand spot], karpenter.k8s.aws/instance-category In [c m r], kubernetes.io/arch In [amd64], topology.kubernetes.io/zone In [us-east-1c]   {"commit": "dc3af1a", "pod": "<pod-ns>/<pod-name>"}
2023-04-11T09:04:02.515Z        ERROR   controller.provisioner  Could not schedule pod, incompatible with provisioner "dmz", did not tolerate dedicated=dmz:NoSchedule; incompatible with provisioner "on-demand", did not tolerate dedicated=on-demand:NoSchedule; incompatible with provisioner "default", no instance type satisfied resources {"cpu":"3100m","memory":"4319304Ki","pods":"1"} and requirements karpenter.sh/provisioner-name In [default], karpenter.k8s.aws/instance-category In [c m r], topology.kubernetes.io/zone In [us-east-1a], karpenter.sh/capacity-type In [on-demand spot], karpenter.k8s.aws/instance-size In [12xlarge 2xlarge 4xlarge 8xlarge large and 2 others], kubernetes.io/arch In [amd64], kubernetes.io/os In [linux], dedicated In [k8s.nodes]   {"commit": "dc3af1a", "pod": "<pod-ns>/<pod-name>"}
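
These lines come from the Karpenter controller logs. Assuming a default Helm install (a deployment named karpenter in the karpenter namespace, container controller), they can be pulled with something like:

kubectl logs -n karpenter deploy/karpenter -c controller | grep "Could not schedule"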

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments; they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Reactions: 2
  • Comments: 17 (9 by maintainers)

Most upvoted comments

Did you try removing the name selector and specifying the correct subnets by ID?
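
For reference, a sketch of what that suggestion could look like in the AWSNodeTemplate. The subnet IDs are the ones already used above; the security group IDs are hypothetical placeholders and would need to be replaced with the cluster's real ones (aws-ids is the same special selector key already used for the subnets):

apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
metadata:
  name: tmp
spec:
  subnetSelector:
    # subnet IDs copied from the original template; each should resolve to an eu-west-1 AZ
    aws-ids: "subnet-0eadc682250f4a7e7,subnet-0e172be0e12ea345f,subnet-0fca6ed4430619a37"
  securityGroupSelector:
    # placeholder IDs only; substitute the cluster's actual security group IDs
    aws-ids: "sg-xxxxxxxxxxxxxxxxx,sg-yyyyyyyyyyyyyyyyy"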