karpenter-provider-aws: Endless provisioning loop of new node

Version

Karpenter: v0.8.2

Kubernetes: v1.21.5

Context

We are using the Calico CNI together with Istio in our clusters. istio-validation is an init container injected into all our pods. When a pod is assigned to a node, the first run of istio-validation always fails with the error Init:ContainerStatusUnknown (the init container starts executing before istio-cni-node-xxx is ready), which in turn causes the pod to be rescheduled on another node. I understand the failing init container is our fault, but other circumstances could also cause a pod to fail at the init stage.
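
For reference, the resulting pod status looks roughly like the fragment below. This is an illustrative sketch, not captured from our cluster; the exact fields are set by the kubelet and may differ.

# Illustrative fragment of an affected pod's status; kubectl surfaces this
# terminated init container as Init:ContainerStatusUnknown.
status:
  initContainerStatuses:
  - name: istio-validation
    ready: false
    state:
      terminated:
        exitCode: 137          # illustrative; the kubelet typically reports 137 here
        reason: ContainerStatusUnknown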

Actual Behavior

Karpenter starts provisioning a node with the same capacity to assign this pod to, despite the fact that there are plenty of empty nodes (with available capacity) provisioned only a few minutes earlier and, with ttlSecondsAfterEmpty: 360, not yet reclaimed.

Expected Behavior

Karpenter does not provision a new node and bind the recreated pod to it; instead it either lets the default scheduler move the pod onto other nodes with available capacity, or allows the pod to be restarted/recreated on the same node.

Steps to Reproduce the Problem

Create any Deployment/StatefulSet with an init container that completes with exit code > 0, for example the minimal manifest below.
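
A minimal sketch of such a workload; all names and images here are illustrative, not taken from the original report:

# Hypothetical reproducer: the init container always exits non-zero, so the
# pod fails at the init stage and the Deployment recreates it.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: init-failure-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: init-failure-demo
  template:
    metadata:
      labels:
        app: init-failure-demo
    spec:
      initContainers:
      - name: always-fails
        image: busybox
        command: ["sh", "-c", "exit 1"]   # init container exits with code > 0
      containers:
      - name: app
        image: busybox
        command: ["sh", "-c", "sleep 3600"]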

Resource Specs and Logs

No noticeable messages were found in the logs.

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  annotations:
    helm.sh/hook: post-install,post-upgrade
    helm.sh/resource-policy: keep
  generation: 1
  name: default
spec:
  labels:
    k8s.example.com/cluster: xyz-eks
    k8s.example.com/environment: production
  provider:
    launchTemplate: xyz-eks-karpenter20220401093308003100000003
    subnetSelector:
      Name: us-west-1-prod-vpc-subnet-k8s-us-west-1*
    tags:
      Environment: production
      Name: xyz-eks-default-karpenter-provider
      Team: DevOps
  requirements:
  - key: node.kubernetes.io/instance-type
    operator: NotIn
    values:
    - t3.nano
    - t3.xlarge
    - t3.2xlarge
    - t3.medium
    - t3.micro
    - t3.small
    - t3.large
    - t3a.nano
    - t3a.xlarge
    - t3a.2xlarge
    - t3a.medium
    - t3a.micro
    - t3a.small
    - t3a.large
  - key: karpenter.sh/capacity-type
    operator: In
    values:
    - spot
  - key: topology.kubernetes.io/zone
    operator: In
    values:
    - us-west-1b
    - us-west-1c
  - key: kubernetes.io/arch
    operator: In
    values:
    - amd64
  ttlSecondsAfterEmpty: 360
  ttlSecondsUntilExpired: 1209600

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 20 (9 by maintainers)

Most upvoted comments

Labeled for closure due to inactivity in 10 days.

@infestonn Thanks for the info. We are actively investigating not binding pods to nodes. In this case, we would just launch the node and allow kube-scheduler to bind pods after the node has become ready. It should avoid the issue that you are seeing where the pod is bound before the node is ready and initialization fails.