amazon-vpc-cni-k8s: Panic on v1.13.2

What happened: The VPC CNI IPAM daemon in an aws-node pod panicked with a nil pointer dereference in DescribeAllENIs.

Attach logs

$ kubectl logs aws-node-4pts8 -p
Installed /host/opt/cni/bin/aws-cni
Installed /host/opt/cni/bin/egress-cni
time="2023-06-29T15:12:21Z" level=info msg="Starting IPAM daemon... "
time="2023-06-29T15:12:21Z" level=info msg="Checking for IPAM connectivity... "
time="2023-06-29T15:12:23Z" level=info msg="Copying config file... "
time="2023-06-29T15:12:23Z" level=info msg="Successfully copied CNI plugin binary and config file."
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x30 pc=0x560ff5791b21]

goroutine 398 [running]:
github.com/aws/amazon-vpc-cni-k8s/pkg/awsutils.(*EC2InstanceMetadataCache).DescribeAllENIs(0xc0009508c0)
	/go/src/github.com/aws/amazon-vpc-cni-k8s/pkg/awsutils/awsutils.go:1239 +0xd01
github.com/aws/amazon-vpc-cni-k8s/pkg/ipamd.(*IPAMContext).nodeIPPoolReconcile(0xc000950780, {0x560ff6815970, 0xc000134020}, 0xdf8475800)
	/go/src/github.com/aws/amazon-vpc-cni-k8s/pkg/ipamd/ipamd.go:1412 +0x5e9
github.com/aws/amazon-vpc-cni-k8s/pkg/ipamd.(*IPAMContext).StartNodeIPPoolManager(0xc000950780)
	/go/src/github.com/aws/amazon-vpc-cni-k8s/pkg/ipamd/ipamd.go:699 +0x66
created by main._main
	/go/src/github.com/aws/amazon-vpc-cni-k8s/cmd/aws-k8s-agent/main.go:68 +0x42b
time="2023-06-29T15:12:28Z" level=error msg="Failed to wait for IPAM daemon to complete" error="exit status 2"
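
The trace shows a nil pointer dereference inside DescribeAllENIs, and the faulting address 0x30 is a small offset from zero, which is consistent with reading a field through a nil struct pointer. The trace does not show which value was nil. As a general illustration only, and not the code at awsutils.go:1239, here is a minimal Go sketch of guarding nested pointer fields before dereferencing them. The types networkInterface and attachment, the helpers stringOrUnknown and deviceIndex, and the sample ID are all hypothetical names invented for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// attachment and networkInterface are hypothetical stand-ins for an API
// response whose fields are pointers. They are NOT the real SDK types.
type attachment struct {
	DeviceIndex *int64
}

type networkInterface struct {
	ID         *string
	Attachment *attachment // assumed, for illustration, to be nil sometimes
}

// stringOrUnknown dereferences s, falling back to a placeholder when s is nil.
func stringOrUnknown(s *string) string {
	if s == nil {
		return "<unknown>"
	}
	return *s
}

// deviceIndex returns the attachment's device index. It returns an error
// instead of panicking when any pointer on the path is nil.
func deviceIndex(eni *networkInterface) (int64, error) {
	if eni == nil {
		return 0, errors.New("nil network interface")
	}
	if eni.Attachment == nil || eni.Attachment.DeviceIndex == nil {
		return 0, fmt.Errorf("network interface %s has no attachment data", stringOrUnknown(eni.ID))
	}
	return *eni.Attachment.DeviceIndex, nil
}

func main() {
	id, idx := "eni-example", int64(1)
	enis := []*networkInterface{
		{ID: &id, Attachment: &attachment{DeviceIndex: &idx}},
		{ID: &id}, // Attachment left nil on purpose
	}
	for _, eni := range enis {
		if n, err := deviceIndex(eni); err != nil {
			fmt.Println("skipping:", err)
		} else {
			fmt.Println("device index:", n)
		}
	}
}
```

Returning an error lets a reconcile loop skip or retry one incomplete record instead of crashing the whole daemon. Whether that is the right behavior for this particular panic depends on which value was actually nil.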

What you expected to happen: aws-node pods don’t panic ^^

How to reproduce it (as minimally and precisely as possible): unsure

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
Server Version: version.Info{Major:"1", Minor:"23+", GitVersion:"v1.23.17-eks-c12679a", GitCommit:"d5ce2cee85d99653d6f8c278043213db21b1cd72", GitTreeState:"clean", BuildDate:"2023-05-22T20:32:28Z", GoVersion:"go1.19.6", Compiler:"gc", Platform:"linux/amd64"}
  • CNI Version: v1.13.2
  • OS (e.g: cat /etc/os-release):
$ cat /etc/os-release
NAME="Amazon Linux"
VERSION="2"
ID="amzn"
ID_LIKE="centos rhel fedora"
VERSION_ID="2"
PRETTY_NAME="Amazon Linux 2"
ANSI_COLOR="0;33"
CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2"
HOME_URL="https://amazonlinux.com/"
  • Kernel (e.g. uname -a):
$ uname -a
Linux ip-xx-xx-xx-xx.xxxxx.compute.internal 5.4.242-156.349.amzn2.x86_64 #1 SMP Tue May 23 18:48:04 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

About this issue

  • State: closed
  • Created a year ago
  • Reactions: 1
  • Comments: 15 (8 by maintainers)

Most upvoted comments

@jose-ledesma Feel free to email k8s-awscni-triage@amazon.com if that is easier. If you take that route, the full node logs would be best (from sudo bash /opt/cni/bin/aws-cni-support.sh). It would also be good to include how you are deploying the CNI, i.e., EKS addon, manifest from GitHub, or Helm chart.

The nodes have already been terminated, but I can still grab the logs for specific files from our logging platform, so let me know if any other logs would be useful.