cilium: cilium-node-init CrashLoopBackOff when running on Bottlerocket OS

Bug report

General Information

  • Cilium version
Client: 1.9.5 079bdaf 2021-03-10T13:12:19-08:00 go version go1.15.8 linux/amd64
Daemon: 1.9.5 079bdaf 2021-03-10T13:12:19-08:00 go version go1.15.8 linux/amd64
  • EKS Cluster version
v1.18.9
  • Kernel version
Bottlerocket OS
Linux ip-10-95-107-127.ap-southeast-2.compute.internal 5.4.95 #1 SMP Wed Mar 17 19:08:38 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

AMI: bottlerocket-aws-k8s-1.18-x86_64-v1.0.7-099d3398
  • Orchestration system version in use
kubectl version

Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.3", GitCommit:"1e11e4a2108024935ecfcb2912226cedeafd99df", GitTreeState:"clean", BuildDate:"2020-10-14T12:50:19Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"18+", GitVersion:"v1.18.9-eks-d1db3c", GitCommit:"d1db3c46e55f95d6a7d3e5578689371318f95ff9", GitTreeState:"clean", BuildDate:"2020-10-20T22:18:07Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
  • Link to relevant artifacts (policies, deployment scripts, …)
  • Generate and upload a system zip:
cilium-node-init logs:

nsenter: failed to execute bash: No such file or directory
!!! startup-script failed! exit code '127'
stream closed
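Exit code 127 is the conventional "command not found" status. It is consistent with the nsenter message above: nsenter reports 127 when the program it was asked to execute (here bash) cannot be found, and POSIX shells use the same value for missing commands. The minimal sketch below reproduces that status with an ordinary shell; the command name is a deliberately nonexistent placeholder, not anything Cilium runs.

```shell
#!/bin/sh
# Minimal reproduction of exit status 127 ("command not found").
# The command name below is a deliberately nonexistent placeholder.
status=0
no-such-command-for-demo 2>/dev/null || status=$?
echo "exit code: $status"
```

Running it prints "exit code: 127", the same code the startup-script reported.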

How to reproduce the issue

  1. Deployed Cilium 1.9.5 with the following configuration:
cni.chainingMode=aws-cni
masquerade=false
tunnel=disabled
nodeinit.enabled=true
hubble.relay.enabled=true
hubble.listenAddress=:4244
hubble.ui.enabled=true
  2. Added a worker node (Auto Scaling group / launch template with the Bottlerocket AMI).
  3. The cilium-operator pod starts successfully.
  4. The cilium pod starts successfully.
  5. cilium-node-init fails with:
nsenter: failed to execute bash: No such file or directory
!!! startup-script failed! exit code '127'
stream closed
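For anyone trying to reproduce this, the configuration in step 1 can be expressed as a Helm install, assuming those settings were passed as Helm values. This is a sketch under further assumptions: the chart repository was added as cilium (helm repo add cilium https://helm.cilium.io/), Cilium is installed into kube-system, and the install_repro function and DRY_RUN switch are hypothetical conveniences, not part of Cilium.

```shell
#!/bin/sh
# Sketch: the reporter's Cilium 1.9.5 settings expressed as Helm --set flags.
# Assumes `helm repo add cilium https://helm.cilium.io/` has been run and that
# helm points at the EKS cluster. install_repro and DRY_RUN are hypothetical.
install_repro() {
  # When DRY_RUN is non-empty, print the command instead of running it.
  ${DRY_RUN:+echo} helm install cilium cilium/cilium --version 1.9.5 \
    --namespace kube-system \
    --set cni.chainingMode=aws-cni \
    --set masquerade=false \
    --set tunnel=disabled \
    --set nodeinit.enabled=true \
    --set hubble.relay.enabled=true \
    --set hubble.listenAddress=:4244 \
    --set hubble.ui.enabled=true
}
```

DRY_RUN=1 install_repro prints the full command for review; plain install_repro runs it.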

Related Bottlerocket issue: https://github.com/bottlerocket-os/bottlerocket/issues/1405

For comparison, cilium-node-init runs fine on a node using an Amazon Linux 2 AMI.
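The nsenter error suggests that bash is simply not present in the Bottlerocket host filesystem. This is consistent with Bottlerocket's minimal host OS, which does not ship a shell, while Amazon Linux 2 does. The hypothetical has_bash helper below sketches that check: given a root directory, it looks for an executable bash under /bin or /usr/bin.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given root filesystem contains an
# executable bash in /bin or /usr/bin.
has_bash() {
  root=$1
  for candidate in /bin/bash /usr/bin/bash; do
    if [ -x "$root$candidate" ]; then
      return 0
    fi
  done
  return 1
}
```

From a debug pod that mounts the node's root filesystem at /host (an assumption about your setup), has_bash /host would be expected to succeed on the Amazon Linux 2 node and, going by the nsenter error, fail on the Bottlerocket node.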

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 20 (12 by maintainers)

Most upvoted comments

Just in case someone else needs it, I can confirm that this snippet works:

helm install cilium cilium/cilium --version 1.10.5 \
  --namespace kube-system \
  --set eni.enabled=true \
  --set ipam.mode=eni \
  --set egressMasqueradeInterfaces=eth0 \
  --set tunnel=disabled \
  --set nodeinit.enabled=false

If you also want to replace kube-proxy:

helm install cilium cilium/cilium --version 1.10.5 \
  --namespace kube-system \
  --set eni.enabled=true \
  --set ipam.mode=eni \
  --set egressMasqueradeInterfaces=eth0 \
  --set tunnel=disabled \
  --set nodeinit.enabled=false \
  --set loadBalancer.algorithm=maglev \
  --set hubble.enabled=true \
  --set hubble.relay.enabled=true \
  --set hubble.ui.enabled=true \
  --set hubble.metrics.enabled="{dns,drop,tcp,flow,port-distribution,icmp,http}" \
  --set kubeProxyReplacement="strict" \
  --set k8sServiceHost="$API_SERVER_IP" \
  --set k8sServicePort=443
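The command above expects $API_SERVER_IP to hold the address of the Kubernetes API server. One way to fill it in on EKS, assuming the cluster endpoint's hostname is acceptable for k8sServiceHost, is to read the endpoint URL with aws eks describe-cluster and strip the https:// scheme. CLUSTER_NAME and endpoint_to_host are hypothetical names for this sketch.

```shell
#!/bin/sh
# Sketch: derive a value for API_SERVER_IP from an EKS cluster endpoint.
# Assumes the endpoint hostname is acceptable for k8sServiceHost.
# CLUSTER_NAME and endpoint_to_host are hypothetical names for this example.

# Strip the https:// scheme and any trailing slash from an endpoint URL.
endpoint_to_host() {
  host=${1#https://}
  host=${host%/}
  printf '%s\n' "$host"
}

# Query AWS only when a cluster name is given (needs the aws CLI and credentials).
if [ -n "${CLUSTER_NAME:-}" ]; then
  endpoint=$(aws eks describe-cluster --name "$CLUSTER_NAME" \
    --query 'cluster.endpoint' --output text)
  API_SERVER_IP=$(endpoint_to_host "$endpoint")
  export API_SERVER_IP
  echo "API_SERVER_IP=$API_SERVER_IP"
fi
```

Export CLUSTER_NAME and source the script (rather than executing it) so that API_SERVER_IP is set in the same shell that runs the helm command.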

Both configurations run without AWS CNI chaining and pass the cilium connectivity test.
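The check mentioned here can be scripted with cilium-cli: cilium status --wait blocks until Cilium reports ready, and cilium connectivity test runs the connectivity test suite. The verify_cilium wrapper below is a hypothetical convenience that skips the checks when the cilium CLI is not installed.

```shell
#!/bin/sh
# Sketch: post-install checks with cilium-cli.
# verify_cilium is a hypothetical wrapper; the subcommands belong to cilium-cli.
verify_cilium() {
  if ! command -v cilium >/dev/null 2>&1; then
    echo "cilium CLI not found; skipping checks"
    return 0
  fi
  # Wait until Cilium reports ready, then run the connectivity test suite.
  cilium status --wait && cilium connectivity test
}
```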

It's safe to say that this issue can be closed.

The node-init logic that cleaned up iptables rules was moved into the agent DaemonSet in https://github.com/cilium/cilium/pull/24789, which shipped in the 1.13.2 release. Closing the issue, since node-init no longer contains any EKS-specific code and can be safely disabled on EKS.

cilium-cli has been changed so that it no longer enables node-init on EKS for Cilium versions > 1.13.2 in this PR: https://github.com/cilium/cilium-cli/pull/1428. The change will be part of the next cilium-cli release. Until then, you can safely disable node-init with Helm flags (nodeinit.enabled=false).
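For an existing Helm-managed installation, disabling node-init could look like the sketch below. It assumes the release is named cilium in kube-system, and it pins the chart version you already run (the first argument), because helm upgrade without --version would pick the latest chart from the repository. disable_nodeinit and DRY_RUN are hypothetical names.

```shell
#!/bin/sh
# Sketch: turn off node-init on an existing Helm release, keeping other values.
# Assumes release "cilium" in kube-system; disable_nodeinit/DRY_RUN are hypothetical.
disable_nodeinit() {
  if [ -z "${1:-}" ]; then
    echo "usage: disable_nodeinit CHART_VERSION" >&2
    return 2
  fi
  # When DRY_RUN is non-empty, print the command instead of running it.
  ${DRY_RUN:+echo} helm upgrade cilium cilium/cilium \
    --namespace kube-system \
    --version "$1" \
    --reuse-values \
    --set nodeinit.enabled=false
}
```

For example, DRY_RUN=1 disable_nodeinit 1.13.2 prints the upgrade command; drop DRY_RUN to apply it.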