karpenter-provider-aws: Karpenter nodes stuck in "NotReady" status and fail to join EKS cluster

Version

Karpenter: v0.16.1

Kubernetes: v1.22.9

Expected Behavior

Pods should be scheduled on the nodes spun up by Karpenter.

Actual Behavior

No pods are being scheduled onto the Karpenter nodes. Those nodes are stuck in a NotReady status and show the same error related to a failed lease: Lease: Failed to get lease: leases.coordination.k8s.io "ip-10-50-34-118.ec2.internal" not found
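
A quick way to confirm the missing lease on a stuck node (a sketch, assuming kubectl access to the cluster; node leases are created by the kubelet in the kube-node-lease namespace):

# The lease for the node name from the error above; this should return NotFound
# as long as the kubelet has never come up on that node.
kubectl get lease ip-10-50-34-118.ec2.internal -n kube-node-lease

# Healthy nodes should each show a lease here with a recent renew time.
kubectl get lease -n kube-node-lease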

This is a continuation of issue #2286, which was closed prematurely; the Karpenter nodes still exhibit the same issue reported there.

The EKS cluster nodes run Bottlerocket OS and Cilium is the CNI. The EKS cluster is provisioned with Terraform.

Cilium version: 1.12.1
Bottlerocket AMI ID: ami-0ab0c02538ad82487

Steps to Reproduce the Problem

The Provisioner and AWSNodeTemplate are below:

# kubectl describe provisioner default
Name:         default
Namespace:
Labels:       cartax.io/argo-instance=eks-rabdalla-dev-karpenter
Annotations:  <none>
API Version:  karpenter.sh/v1alpha5
Kind:         Provisioner
Metadata:
  Creation Timestamp:  2022-08-30T22:15:14Z
  Generation:          5
  Managed Fields:
    API Version:  karpenter.sh/v1alpha5
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
        f:labels:
          .:
          f:cartax.io/argo-instance:
      f:spec:
        .:
        f:limits:
          .:
          f:resources:
            .:
            f:cpu:
        f:providerRef:
          .:
          f:name:
        f:requirements:
        f:startupTaints:
        f:ttlSecondsAfterEmpty:
        f:ttlSecondsUntilExpired:
    Manager:         argocd-application-controller
    Operation:       Update
    Time:            2022-08-31T18:35:37Z
  Resource Version:  941471
  UID:               49f8c266-4502-4a7c-8346-070e7b268cd2
Spec:
  Limits:
    Resources:
      Cpu:  1k
  Provider Ref:
    Name:  default
  Requirements:
    Key:       kubernetes.io/arch
    Operator:  In
    Values:
      amd64
    Key:       topology.kubernetes.io/zone
    Operator:  In
    Values:
      us-east-1a
      us-east-1b
      us-east-1c
    Key:       karpenter.sh/capacity-type
    Operator:  In
    Values:
      on-demand
    Key:       node.kubernetes.io/instance-type
    Operator:  In
    Values:
      t3.xlarge
  Startup Taints:
    Effect:                   NoExecute
    Key:                      node.cilium.io/agent-not-ready
    Value:                    true
  Ttl Seconds After Empty:    60
  Ttl Seconds Until Expired:  2592000
Status:
Events:  <none>
---

# kubectl describe awsnodetemplate default
Name:         default
Namespace:
Labels:       cartax.io/argo-instance=eks-rabdalla-dev-karpenter
Annotations:  <none>
API Version:  karpenter.k8s.aws/v1alpha1
Kind:         AWSNodeTemplate
Metadata:
  Creation Timestamp:  2022-08-30T17:52:51Z
  Generation:          7
  Managed Fields:
    API Version:  karpenter.k8s.aws/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
        f:labels:
          .:
          f:cartax.io/argo-instance:
      f:spec:
        .:
        f:amiFamily:
        f:amiSelector:
          .:
          f:aws-ids:
        f:blockDeviceMappings:
        f:instanceProfile:
        f:securityGroupSelector:
          .:
          f:karpenter.sh/discovery:
        f:subnetSelector:
          .:
          f:karpenter.sh/discovery:
        f:userData:
    Manager:         argocd-application-controller
    Operation:       Update
    Time:            2022-08-30T19:21:15Z
  Resource Version:  886319
  UID:               2f1734ae-f2c7-4403-ad18-bfd0c726b736
Spec:
  Ami Family:  Bottlerocket
  Ami Selector:
    Aws - Ids:  ami-0ab0c02538ad82487
  Block Device Mappings:
    Device Name:  /dev/xvda
    Ebs:
      Delete On Termination:  true
      Volume Size:            60Gi
      Volume Type:            gp2
  Instance Profile:           KarpenterNodeInstanceProfile-eks-rabdalla-dev
  Security Group Selector:
    karpenter.sh/discovery:  eks-rabdalla-dev
  Subnet Selector:
    karpenter.sh/discovery:  eks-rabdalla-dev
  User Data:                 [settings.kernel]
"lockdown" = "integrity"
[settings.kernel.sysctl]
"net.ipv4.conf.default.rp_filter" = 0
"net.ipv4.conf.all.rp_filter" = 0
"fs.inotify.max_user_watches" = 1048576

Events:  <none>
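
To double-check that the user data above was actually applied on a launched node, one option (a sketch; run from a host shell on the Bottlerocket instance, e.g. via the SSM/sheltie steps shown later in this thread) is:

# Inspect the rendered Bottlerocket settings; apiclient ships with Bottlerocket.
apiclient get settings.kernel

# Verify the sysctls from the user data took effect on the host.
sysctl net.ipv4.conf.all.rp_filter net.ipv4.conf.default.rp_filter fs.inotify.max_user_watches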

The Terraform IRSA module used is the following:

module "karpenter_irsa" {
  source  = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"
  version = "5.3.1"

  role_name                          = "karpenter-controller-${local.name}"
  attach_karpenter_controller_policy = true

  karpenter_tag_key               = "karpenter.sh/discovery"
  karpenter_controller_cluster_id = var.cluster_id
  karpenter_controller_node_iam_role_arns = [
    module.eks_managed_node_group.iam_role_arn
  ]


  oidc_providers = {
    main = {
      provider_arn               = var.oidc_provider_arn
      namespace_service_accounts = ["karpenter:karpenter"]
    }
  }
  tags = local.tags
}
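
Two things worth double-checking with this setup (a sketch; the names are taken from the templates above): that the Karpenter service account actually carries the IRSA role annotation, and that the instance profile referenced in the AWSNodeTemplate exists and wraps the intended node role.

# Print the IRSA role annotation on the Karpenter service account.
kubectl -n karpenter get sa karpenter \
  -o jsonpath='{.metadata.annotations.eks\.amazonaws\.com/role-arn}{"\n"}'

# Show which role the instance profile from the AWSNodeTemplate wraps.
aws iam get-instance-profile \
  --instance-profile-name KarpenterNodeInstanceProfile-eks-rabdalla-dev \
  --query 'InstanceProfile.Roles[].RoleName'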

The final state of the nodes is as follows:

k get nodes
NAME                            STATUS     ROLES    AGE   VERSION
ip-10-50-110-116.ec2.internal   NotReady   <none>   67s
ip-10-50-111-149.ec2.internal   Ready      <none>   26h   v1.22.9-eks-0857b39
ip-10-50-119-64.ec2.internal    NotReady   <none>   67s
ip-10-50-131-21.ec2.internal    Ready      <none>   26h   v1.22.9-eks-0857b39
ip-10-50-158-184.ec2.internal   Ready      <none>   26h   v1.22.9-eks-0857b39
ip-10-50-185-72.ec2.internal    Ready      <none>   26h   v1.22.9-eks-0857b39
ip-10-50-5-178.ec2.internal     Ready      <none>   26h   v1.22.9-eks-0857b39
ip-10-50-52-0.ec2.internal      Ready      <none>   26h   v1.22.9-eks-0857b39
ip-10-50-59-107.ec2.internal    Ready      <none>   26h   v1.22.9-eks-0857b39
ip-10-50-65-77.ec2.internal     Ready      <none>   26h   v1.22.9-eks-0857b39
ip-10-50-72-199.ec2.internal    NotReady   <none>   67s
ip-10-50-83-35.ec2.internal     Ready      <none>   26h   v1.22.9-eks-0857b39

The state of one of the Karpenter-managed nodes:

k describe node ip-10-50-119-64.ec2.internal
Name:               ip-10-50-119-64.ec2.internal
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    k8s.io/cloud-provider-aws=6f56019d471bf630c9432bfca43855fa
                    karpenter.k8s.aws/instance-category=t
                    karpenter.k8s.aws/instance-cpu=4
                    karpenter.k8s.aws/instance-family=t3
                    karpenter.k8s.aws/instance-generation=3
                    karpenter.k8s.aws/instance-hypervisor=nitro
                    karpenter.k8s.aws/instance-memory=16384
                    karpenter.k8s.aws/instance-pods=58
                    karpenter.k8s.aws/instance-size=xlarge
                    karpenter.sh/capacity-type=on-demand
                    karpenter.sh/provisioner-name=default
                    kubernetes.io/arch=amd64
                    kubernetes.io/os=linux
                    node.kubernetes.io/instance-type=t3.xlarge
                    topology.kubernetes.io/region=us-east-1
                    topology.kubernetes.io/zone=us-east-1b
Annotations:        node.alpha.kubernetes.io/ttl: 0
CreationTimestamp:  Wed, 31 Aug 2022 11:43:57 -0700
Taints:             node.cilium.io/agent-not-ready=true:NoExecute
                    node.kubernetes.io/unreachable:NoExecute
                    node.kubernetes.io/unreachable:NoSchedule
Unschedulable:      false
Lease:              Failed to get lease: leases.coordination.k8s.io "ip-10-50-119-64.ec2.internal" not found
Conditions:
  Type             Status    LastHeartbeatTime                 LastTransitionTime                Reason                   Message
  ----             ------    -----------------                 ------------------                ------                   -------
  Ready            Unknown   Wed, 31 Aug 2022 11:43:57 -0700   Wed, 31 Aug 2022 11:45:01 -0700   NodeStatusNeverUpdated   Kubelet never posted node status.
  MemoryPressure   Unknown   Wed, 31 Aug 2022 11:43:57 -0700   Wed, 31 Aug 2022 11:45:01 -0700   NodeStatusNeverUpdated   Kubelet never posted node status.
  DiskPressure     Unknown   Wed, 31 Aug 2022 11:43:57 -0700   Wed, 31 Aug 2022 11:45:01 -0700   NodeStatusNeverUpdated   Kubelet never posted node status.
  PIDPressure      Unknown   Wed, 31 Aug 2022 11:43:57 -0700   Wed, 31 Aug 2022 11:45:01 -0700   NodeStatusNeverUpdated   Kubelet never posted node status.
Addresses:
System Info:
  Machine ID:
  System UUID:
  Boot ID:
  Kernel Version:
  OS Image:
  Operating System:
  Architecture:
  Container Runtime Version:
  Kubelet Version:
  Kube-Proxy Version:
ProviderID:                   aws:///us-east-1b/i-0dc3c403d33754dd6
Non-terminated Pods:          (0 in total)
  Namespace                   Name    CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----    ------------  ----------  ---------------  -------------  ---
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests  Limits
  --------           --------  ------
  cpu                0 (0%)    0 (0%)
  memory             0 (0%)    0 (0%)
  ephemeral-storage  0 (0%)    0 (0%)
Events:
  Type    Reason          Age    From             Message
  ----    ------          ----   ----             -------
  Normal  RegisteredNode  3m19s  node-controller  Node ip-10-50-119-64.ec2.internal event: Registered Node ip-10-50-119-64.ec2.internal in Controller
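
A couple of checks worth running against a node in this state (a sketch using the node name from the output above): whether any pods, in particular the Cilium agent, ever got scheduled there, and whether the startup taint is still in place.

# Pods on the stuck node; the cilium agent DaemonSet pod normally tolerates
# all taints, so it should appear here even while the node is NotReady.
kubectl -n kube-system get pods -o wide \
  --field-selector spec.nodeName=ip-10-50-119-64.ec2.internal

# Is the node.cilium.io/agent-not-ready startup taint still present?
kubectl get node ip-10-50-119-64.ec2.internal -o jsonpath='{.spec.taints}{"\n"}'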

Resource Specs and Logs

2022-08-31T18:43:54.826Z	INFO	controller.provisioning	Computed 3 new node(s) will fit 7 pod(s)	{"commit": "b157d45"}
2022-08-31T18:43:54.826Z	INFO	controller.provisioning	Launching node with 3 pods requesting {"cpu":"3285m","memory":"532Mi","pods":"8"} from types t3.xlarge	{"commit": "b157d45", "provisioner": "default"}
2022-08-31T18:43:54.827Z	INFO	controller.provisioning	Launching node with 3 pods requesting {"cpu":"3285m","memory":"532Mi","pods":"8"} from types t3.xlarge	{"commit": "b157d45", "provisioner": "default"}
2022-08-31T18:43:54.827Z	INFO	controller.provisioning	Launching node with 1 pods requesting {"cpu":"1285m","memory":"532Mi","pods":"6"} from types t3.xlarge	{"commit": "b157d45", "provisioner": "default"}
2022-08-31T18:43:55.417Z	DEBUG	controller.provisioning.cloudprovider	Created launch template, Karpenter-eks-rabdalla-dev-10366144926913742821	{"commit": "b157d45", "provisioner": "default"}
2022-08-31T18:43:57.231Z	INFO	controller.provisioning.cloudprovider	Launched instance: i-086ba86d09b25e186, hostname: ip-10-50-110-116.ec2.internal, type: t3.xlarge, zone: us-east-1b, capacityType: on-demand	{"commit": "b157d45", "provisioner": "default"}
2022-08-31T18:43:57.260Z	INFO	controller.provisioning.cloudprovider	Launched instance: i-0dc3c403d33754dd6, hostname: ip-10-50-119-64.ec2.internal, type: t3.xlarge, zone: us-east-1b, capacityType: on-demand	{"commit": "b157d45", "provisioner": "default"}
2022-08-31T18:43:57.282Z	INFO	controller.provisioning.cloudprovider	Launched instance: i-066cf340fe575b474, hostname: ip-10-50-72-199.ec2.internal, type: t3.xlarge, zone: us-east-1b, capacityType: on-demand	{"commit": "b157d45", "provisioner": "default"}
2022-08-31T18:45:56.988Z	DEBUG	controller.aws.launchtemplate	Deleted launch template Karpenter-eks-rabdalla-dev-10366144926913742821 (lt-05ced4c5f64da911a)	{"commit": "b157d45"}

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Reactions: 4
  • Comments: 44 (20 by maintainers)

Most upvoted comments

We are also running Cilium on EKS. We didn't have any issues when running the Amazon Linux AMIs for our nodes, and SSM works as well.

I tried to reproduce your problem and it's repeatable for me.

SSM works with Bottlerocket and Cilium

It seems there is a configuration issue in your setup. 😃

What we do for our nodes in Terraform is the following:

We set up the EKS IRSA role like this:

resource "aws_iam_instance_profile" "karpenter" {
  name = "KarpenterNodeInstanceProfile-${local.eks_cluster_name}"
  role = module.eks-ng.eks_managed_node_groups["worker"].iam_role_name
}

module "karpenter_irsa" {
  source  = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"
  version = "5.3.0"

  role_name                          = "karpenter-controller-${local.eks_cluster_name}"
  attach_karpenter_controller_policy = true

  karpenter_controller_cluster_id = module.eks-ng.cluster_id
  karpenter_controller_node_iam_role_arns = [
    module.eks-ng.eks_managed_node_groups["worker"].iam_role_arn
  ]

  oidc_providers = {
    ex = {
      provider_arn               = module.eks-ng.oidc_provider_arn
      namespace_service_accounts = ["kube-system:karpenter"]
    }
  }
}

Important here is karpenter_controller_node_iam_role_arns, where we reference the default node role from the EKS managed node groups.

In the default managed node group we pass the required SSM policy to ensure that SSM works:

  eks_managed_node_group_defaults = {
    ami_type = "AL2_x86_64"
    instance_types = [
      "m5a.large"
    ]

    taints = [
      {
        key    = "node.cilium.io/agent-not-ready"
        value  = "true"
        effect = "NO_EXECUTE"
      }
    ]


    bootstrap_extra_args = "--kubelet-extra-args '--max-pods=100'"
    iam_role_additional_policies = [
      "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
    ]

    block_device_mappings = {
      xvda = {
        device_name = "/dev/xvda"
        ebs = {
          volume_size           = 50
          volume_type           = "gp3"
          iops                  = 1000
          throughput            = 250
          encrypted             = true
          delete_on_termination = true
        }
      }
    }

    post_bootstrap_user_data = <<-EOT
      cd /tmp
      sudo yum install -y https://s3.amazonaws.com/ec2-downloads-windows/SSMAgent/latest/linux_amd64/amazon-ssm-agent.rpm
      sudo systemctl enable amazon-ssm-agent
      sudo systemctl start amazon-ssm-agent
    EOT
  }

Check that your nodes have the AmazonSSMManagedInstanceCore policy attached, then it should work out of the box. As already mentioned, SSM is preinstalled.
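
For example (a sketch; replace the role name with whatever role your Karpenter nodes run under), the attachment can be verified with the AWS CLI:

# Look for AmazonSSMManagedInstanceCore in the attached managed policies.
aws iam list-attached-role-policies \
  --role-name <your-karpenter-node-role> \
  --query 'AttachedPolicies[].PolicyArn'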

Cilium and Bottlerocket troubleshooting

First of all, I can see the same behaviour:

ip-10-20-35-219.eu-central-1.compute.internal   NotReady   <none>   22m     v1.23.7-eks-7709a84

I could connect to the instance with SSM without any problems:

aws ssm start-session --target i-03fede25ff7580879

/bin/bash
          Welcome to Bottlerocket's control container!
[ssm-user@control]$ /bin/bash
[ssm-user@control]$ enter-admin-container
Confirming admin container is enabled...
Waiting for admin container to start......
Entering admin container
[root@admin]# sudo sheltie
bash-5.1# journalctl -u kubelet

Interesting logs from the kubelet:

Sep 11 06:58:55 10.20.35.219 kubelet[1583]: E0911 06:58:55.039163    1583 remote_runtime.go:704] "ExecSync cmd from runtime service failed" err="rpc error: code = Unknown desc = failed to exec in container: failed to create exec \"76a6dcaed6a7de6294437a02e20d6b740d46eadc92d12ce2275a833753051717\": cannot exec in a stopped state: unknown" containerID="beaa01acf2c9874853e99ff42c7608d0098ef9f8b8daf74f2a42c3777f05858e" cmd=[nsenter --target=1 --mount -- /bin/sh -c #!/bin/bash
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: set -o errexit
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: set -o pipefail
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: set -o nounset
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: # When running in AWS ENI mode, it's likely that 'aws-node' has
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: # had a chance to install SNAT iptables rules. These can result
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: # in dropped traffic, so we should attempt to remove them.
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: # We do it using a 'postStart' hook since this may need to run
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: # for nodes which might have already been init'ed but may still
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: # have dangling rules. This is safe because there are no
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: # dependencies on anything that is part of the startup script
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: # itself, and can be safely run multiple times per node (e.g. in
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: # case of a restart).
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: if [[ "$(iptables-save | grep -c AWS-SNAT-CHAIN)" != "0" ]];
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: then
Sep 11 06:58:55 10.20.35.219 kubelet[1583]:     echo 'Deleting iptables rules created by the AWS CNI VPC plugin'
Sep 11 06:58:55 10.20.35.219 kubelet[1583]:     iptables-save | grep -v AWS-SNAT-CHAIN | iptables-restore
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: fi
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: echo 'Done!'
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: ]
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: E0911 06:58:55.039253    1583 kuberuntime_container.go:284] "Failed to execute PostStartHook" err="rpc error: code = Unknown desc = failed to exec in container: failed to create exec \"76a6dcaed6a7de6294437a02e20d6b740d46eadc92d12ce2275a833753051717\": cannot exec in a stopped state: unknown" pod="kube-system/cilium-node-init-9767n" podUID=cd02bb16-c6ea-4978-bd3a-ffafb9566c97 containerName="node-init" containerID="containerd://beaa01acf2c9874853e99ff42c7608d0098ef9f8b8daf74f2a42c3777f05858e"
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: E0911 06:58:55.117006    1583 kuberuntime_manager.go:919] container &Container{Name:node-init,Image:quay.io/cilium/startup-script:d69851597ea019af980891a4628fb36b7880ec26,Command:[],Args:[],WorkingDir:,Ports:[]ContainerPort{},Env:[]EnvVar{EnvVar{Name:STARTUP_SCRIPT,Value:#!/bin/bash
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: set -o errexit
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: set -o pipefail
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: set -o nounset
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: echo "Link information:"
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: ip link
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: echo "Routing table:"
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: ip route
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: echo "Addressing:"
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: ip -4 a
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: ip -6 a
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: mkdir -p "/tmp/cilium-bootstrap.d"
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: date > "/tmp/cilium-bootstrap.d/cilium-bootstrap-time"
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: echo "Node initialization complete"
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: ,ValueFrom:nil,},},Resources:ResourceRequirements{Limits:ResourceList{},Requests:ResourceList{cpu: {{100 -3} {<nil>} 100m DecimalSI},memory: {{104857600 0} {<nil>} 100Mi BinarySI},},},VolumeMounts:[]VolumeMount{VolumeMount{Name:kube-api-access-f89lv,ReadOnly:true,MountPath:/var/run/secrets/kubernetes.io/serviceaccount,SubPath:,MountPropagation:nil,SubPathExpr:,},},LivenessProbe:nil,ReadinessProbe:nil,Lifecycle:&Lifecycle{PostStart:&LifecycleHandler{Exec:&ExecAction{Command:[nsenter --target=1 --mount -- /bin/sh -c #!/bin/bash
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: set -o errexit
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: set -o pipefail
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: set -o nounset
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: # When running in AWS ENI mode, it's likely that 'aws-node' has
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: # had a chance to install SNAT iptables rules. These can result
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: # in dropped traffic, so we should attempt to remove them.
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: # We do it using a 'postStart' hook since this may need to run
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: # for nodes which might have already been init'ed but may still
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: # have dangling rules. This is safe because there are no
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: # dependencies on anything that is part of the startup script
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: # itself, and can be safely run multiple times per node (e.g. in
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: # case of a restart).
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: if [[ "$(iptables-save | grep -c AWS-SNAT-CHAIN)" != "0" ]];
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: then
Sep 11 06:58:55 10.20.35.219 kubelet[1583]:     echo 'Deleting iptables rules created by the AWS CNI VPC plugin'
Sep 11 06:58:55 10.20.35.219 kubelet[1583]:     iptables-save | grep -v AWS-SNAT-CHAIN | iptables-restore
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: fi
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: echo 'Done!'
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: ],},HTTPGet:nil,TCPSocket:nil,},PreStop:nil,},TerminationMessagePath:/dev/termination-log,ImagePullPolicy:IfNotPresent,SecurityContext:&SecurityContext{Capabilities:&Capabilities{Add:[SYS_MODULE NET_ADMIN SYS_ADMIN SYS_CHROOT SYS_PTRACE],Drop:[],},Privileged:*false,SELinuxOptions:&SELinuxOptions{User:,Role:,Type:spc_t,Level:s0,},RunAsUser:nil,RunAsNonRoot:nil,ReadOnlyRootFilesystem:nil,AllowPrivilegeEscalation:nil,RunAsGroup:nil,ProcMount:nil,WindowsOptions:nil,SeccompProfile:nil,},Stdin:false,StdinOnce:false,TTY:false,EnvFrom:[]EnvFromSource{},TerminationMessagePolicy:File,VolumeDevices:[]VolumeDevice{},StartupProbe:nil,} start failed in pod cilium-node-init-9767n_kube-system(cd02bb16-c6ea-4978-bd3a-ffafb9566c97): PostStartHookError: Exec lifecycle hook ([nsenter --target=1 --mount -- /bin/sh -c #!/bin/bash
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: set -o errexit
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: set -o pipefail
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: set -o nounset
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: # When running in AWS ENI mode, it's likely that 'aws-node' has
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: # had a chance to install SNAT iptables rules. These can result
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: # in dropped traffic, so we should attempt to remove them.
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: # We do it using a 'postStart' hook since this may need to run
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: # for nodes which might have already been init'ed but may still
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: # have dangling rules. This is safe because there are no
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: # dependencies on anything that is part of the startup script
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: # itself, and can be safely run multiple times per node (e.g. in
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: # case of a restart).
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: if [[ "$(iptables-save | grep -c AWS-SNAT-CHAIN)" != "0" ]];
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: then
Sep 11 06:58:55 10.20.35.219 kubelet[1583]:     echo 'Deleting iptables rules created by the AWS CNI VPC plugin'
Sep 11 06:58:55 10.20.35.219 kubelet[1583]:     iptables-save | grep -v AWS-SNAT-CHAIN | iptables-restore
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: fi
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: echo 'Done!'
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: ]) for Container "node-init" in Pod "cilium-node-init-9767n_kube-system(cd02bb16-c6ea-4978-bd3a-ffafb9566c97)" failed - error: rpc error: code = Unknown desc = failed to exec in container: failed to create exec "76a6dcaed6a7de6294437a02e20d6b740d46eadc92d12ce2275a833753051717": cannot exec in a stopped state: unknown, message: ""
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: E0911 06:58:55.117106    1583 pod_workers.go:951] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"node-init\" with PostStartHookError: \"Exec lifecycle hook ([nsenter --target=1 --mount -- /bin/sh -c #!/bin/bash\\n\\nset -o errexit\\nset -o pipefail\\nset -o nounset\\n\\n# When running in AWS ENI mode, it's likely that 'aws-node' has\\n# had a chance to install SNAT iptables rules. These can result\\n# in dropped traffic, so we should attempt to remove them.\\n# We do it using a 'postStart' hook since this may need to run\\n# for nodes which might have already been init'ed but may still\\n# have dangling rules. This is safe because there are no\\n# dependencies on anything that is part of the startup script\\n# itself, and can be safely run multiple times per node (e.g. in\\n# case of a restart).\\nif [[ \\\"$(iptables-save | grep -c AWS-SNAT-CHAIN)\\\" != \\\"0\\\" ]];\\nthen\\n    echo 'Deleting iptables rules created by the AWS CNI VPC plugin'\\n    iptables-save | grep -v AWS-SNAT-CHAIN | iptables-restore\\nfi\\necho 'Done!'\\n]) for Container \\\"node-init\\\" in Pod \\\"cilium-node-init-9767n_kube-system(cd02bb16-c6ea-4978-bd3a-ffafb9566c97)\\\" failed - error: rpc error: code = Unknown desc = failed to exec in container: failed to create exec \\\"76a6dcaed6a7de6294437a02e20d6b740d46eadc92d12ce2275a833753051717\\\": cannot exec in a stopped state: unknown, message: \\\"\\\"\"" pod="kube-system/cilium-node-init-9767n" podUID=cd02bb16-c6ea-4978-bd3a-ffafb9566c97
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: I0911 06:58:55.293941    1583 scope.go:110] "RemoveContainer" containerID="d68c37d3e92f32cbe10dc5c45c442475e3c5204f1d61c0a29c1de0f2c775e806"
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: I0911 06:58:55.294399    1583 scope.go:110] "RemoveContainer" containerID="beaa01acf2c9874853e99ff42c7608d0098ef9f8b8daf74f2a42c3777f05858e"
Sep 11 06:58:55 10.20.35.219 kubelet[1583]: E0911 06:58:55.294914    1583 pod_workers.go:951] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"node-init\" with CrashLoopBackOff: \"back-off 2m40s restarting failed container=node-init pod=cilium-node-init-9767n_kube-system(cd02bb16-c6ea-4978-bd3a-ffafb9566c97)\"" pod="kube-system/cilium-node-init-9767n" podUID=cd02bb16-c6ea-4978-bd3a-ffafb9566c97

The main problem is that the node-init pod from Cilium can't start, since there is no normal bash on the Bottlerocket host.

After some googling I found a possible solution for it.

You only need to disable the node-init pod with nodeinit.enabled=false on Bottlerocket, and everything should work as expected.

helm install cilium cilium/cilium \
  --version 1.9.5 \
  --namespace kube-system \
  --set eni=true \
  --set ipam.mode=eni \
  --set egressMasqueradeInterfaces=eth0 \
  --set tunnel=disabled \
  --set nodeinit.enabled=false

https://github.com/bottlerocket-os/bottlerocket/issues/1405#issuecomment-804196007
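
If you go this route, one way to confirm nodeinit is really gone (a sketch, assuming the default chart resource names) is:

# After upgrading with nodeinit.enabled=false, the cilium-node-init DaemonSet
# (which owned the failing pods above) should no longer exist.
kubectl -n kube-system get daemonset cilium-node-init

# Double-check the values actually set on the release.
helm -n kube-system get values cilium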

I don’t have time to test it fully in a cluster with the networking change.

@triceras Would be nice to hear your feedback if it works for you.

I hope this helps some people. :neckbeard:

Bottlerocket and Amazon Linux AMIs come with SSM preinstalled.