amazon-vpc-cni-k8s: Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container network for pod : NetworkPlugin cni failed to set up pod network: add cmd: failed to assign an IP address to container Error: context deadline exceeded

I’m experiencing frequent issues with the VPC CNI failing to assign IPs to pods. In this particular case the instance is an m5a.4xlarge. I have the support files from the instance if needed.

Server Version: version.Info{Major:"1", Minor:"13+", GitVersion:"v1.13.7-eks-c57ff8", GitCommit:"c57ff8e35590932c652433fab07988da79265d5b", GitTreeState:"clean", BuildDate:"2019-06-07T20:43:03Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}

I see this in the ipamd logs.

2019-07-30T15:31:53Z [ERROR]	Failed to increase pool size due to not able to allocate ENI AllocENI: error attaching ENI: attachENI: failed to attach ENI: AttachmentLimitExceeded: Interface count 9 exceeds the limit for m5a.4xlarge
	status code: 400, request id: 0f1dbc3b-cde5-4bd8-832a-72e47c59a362
2019-07-30T15:31:53Z [DEBUG]	Successfully increased IP pool
2019-07-30T15:31:53Z [DEBUG]	IP pool stats: total = 58, used = 58, c.maxIPsPerENI = 29
2019-07-30T15:31:53Z [DEBUG]	IP pool stats: total = 58, used = 58, c.maxIPsPerENI = 29
2019-07-30T15:31:53Z [DEBUG]	Its NOT possible to remove extra ENIs because available (0) <= ENI target (2) * addrsPerENI (29):

ipamd-env.out

{"AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG":false,"WARM_ENI_TARGET":2,"WARM_IP_TARGET":0}
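For reference, the last DEBUG line in the log above can be read as a simple inequality over the pool stats and WARM_ENI_TARGET. The following is a minimal sketch of that check only, built from the numbers in the log and the env dump; the function names are hypothetical and this is not taken from the ipamd source.

```python
def available_ips(total_ips: int, used_ips: int) -> int:
    """IPs in the warm pool that are not assigned to any pod."""
    return total_ips - used_ips


def can_remove_extra_enis(total_ips: int, used_ips: int,
                          warm_eni_target: int, addrs_per_eni: int) -> bool:
    """Mirror the log message: removal is NOT possible when
    available <= ENI target * addrsPerENI (assumed reading of the log)."""
    available = available_ips(total_ips, used_ips)
    return available > warm_eni_target * addrs_per_eni


def warm_pool_shortfall(total_ips: int, used_ips: int,
                        warm_eni_target: int, addrs_per_eni: int) -> int:
    """How many free IPs are missing relative to the warm ENI target."""
    wanted = warm_eni_target * addrs_per_eni
    return max(0, wanted - available_ips(total_ips, used_ips))


# Values from the ipamd log and ipamd-env.out in this report.
removable = can_remove_extra_enis(total_ips=58, used_ips=58,
                                  warm_eni_target=2, addrs_per_eni=29)
shortfall = warm_pool_shortfall(total_ips=58, used_ips=58,
                                warm_eni_target=2, addrs_per_eni=29)
print(removable, shortfall)
```

With total = 58 and used = 58 there are no free IPs, so the pool is a full two ENIs' worth of addresses short of the warm target, which would explain why ipamd keeps trying to attach another ENI.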

metrics.out

# HELP awscni_add_ip_req_count The number of add IP address request
# TYPE awscni_add_ip_req_count counter
awscni_add_ip_req_count 1.711633e+06
# HELP awscni_del_ip_req_count The number of delete IP address request
# TYPE awscni_del_ip_req_count counter
awscni_del_ip_req_count{reason="PodDeleted"} 3.981822e+06
# HELP awscni_eni_allocated The number of ENIs allocated
# TYPE awscni_eni_allocated gauge
awscni_eni_allocated 2
# HELP awscni_eni_max The maximum number of ENIs that can be attached to the instance
# TYPE awscni_eni_max gauge
awscni_eni_max 8
# HELP awscni_ip_max The maximum number of IP addresses that can be allocated to the instance
# TYPE awscni_ip_max gauge
awscni_ip_max 232
# HELP awscni_ipamd_action_inprogress The number of ipamd actions in progress
# TYPE awscni_ipamd_action_inprogress gauge
awscni_ipamd_action_inprogress{fn="increaseIPPool"} 0
awscni_ipamd_action_inprogress{fn="nodeIPPoolReconcile"} 0
awscni_ipamd_action_inprogress{fn="nodeInit"} 0
# HELP awscni_ipamd_error_count The number of errors encountered in ipamd
# TYPE awscni_ipamd_error_count counter
awscni_ipamd_error_count{fn="increaseIPPoolAllocENI"} 139666
awscni_ipamd_error_count{fn="increaseIPPoolwaitENIAttachedFailed"} 6
awscni_ipamd_error_count{fn="waitENIAttachedMaxRetryExceeded"} 6
# HELP awscni_total_ip_addresses The total number of IP addresses
# TYPE awscni_total_ip_addresses gauge
awscni_total_ip_addresses 58
# HELP awscni_assigned_ip_addresses The number of IP addresses assigned to pods
# TYPE awscni_assigned_ip_addresses gauge
awscni_assigned_ip_addresses 58
# HELP awscni_aws_api_error_count The number of times AWS API returns an error
# TYPE awscni_aws_api_error_count counter
awscni_aws_api_error_count{api="AttachNetworkInterface",error="AttachmentLimitExceeded"} 139666
awscni_aws_api_error_count{api="CreateTags",error="InvalidNetworkInterfaceID.NotFound"} 119
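To make the mismatch in these metrics easier to see, here is a minimal sketch that parses a few of the samples above and reports ENI and IP headroom. The parser is a simplified reading of the Prometheus text format (it ignores comments and keeps labels as part of the key); `parse_metrics` and `summarize` are hypothetical helper names, and the sample text is copied from metrics.out.

```python
import re

SAMPLE = """\
# TYPE awscni_eni_allocated gauge
awscni_eni_allocated 2
awscni_eni_max 8
awscni_ip_max 232
awscni_assigned_ip_addresses 58
awscni_aws_api_error_count{api="AttachNetworkInterface",error="AttachmentLimitExceeded"} 139666
"""

# metric name, optional {labels}, whitespace, value
SAMPLE_LINE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?P<labels>\{[^}]*\})?\s+(?P<value>\S+)$'
)


def parse_metrics(text: str) -> dict:
    """Return {"name{labels}": value} for every well-formed sample line."""
    samples = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue  # skip anything that is not "name value"
        try:
            value = float(match.group("value"))
        except ValueError:
            continue
        samples[match.group("name") + (match.group("labels") or "")] = value
    return samples


def summarize(samples: dict) -> dict:
    """Compare what ipamd has allocated against the instance limits."""
    limit_key = ('awscni_aws_api_error_count'
                 '{api="AttachNetworkInterface",error="AttachmentLimitExceeded"}')
    return {
        "eni_headroom": int(samples["awscni_eni_max"] - samples["awscni_eni_allocated"]),
        "ip_headroom": int(samples["awscni_ip_max"] - samples["awscni_assigned_ip_addresses"]),
        "attach_limit_errors": int(samples.get(limit_key, 0)),
    }


summary = summarize(parse_metrics(SAMPLE))
print(summary)
```

The point of the comparison: ipamd reports only 2 of 8 ENIs allocated, yet every attach attempt is rejected with AttachmentLimitExceeded, so the ENI count ipamd tracks and the interface count EC2 sees for the instance do not agree.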

It seems that the instance never goes above 58 IPs, yet pods are still being scheduled onto it. The kubelet max pods is appropriately set to 234. I can provide additional logs upon request.
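As a sanity check on the 234 figure, the arithmetic can be sketched from the numbers ipamd itself reports (awscni_eni_max 8 and c.maxIPsPerENI = 29). The "+ 2" is the usual allowance for pods that run on the host network and need no secondary IP; the function names below are hypothetical.

```python
import math


def max_pod_ips(eni_max: int, secondary_ips_per_eni: int) -> int:
    """Secondary IPs available to pods when every ENI is attached and full."""
    return eni_max * secondary_ips_per_eni


def kubelet_max_pods(eni_max: int, secondary_ips_per_eni: int) -> int:
    """Pod IP capacity plus 2 for host-network pods (assumed allowance)."""
    return max_pod_ips(eni_max, secondary_ips_per_eni) + 2


def enis_needed(pod_ips: int, secondary_ips_per_eni: int) -> int:
    """Smallest number of ENIs that can hold the given number of pod IPs."""
    return math.ceil(pod_ips / secondary_ips_per_eni)


# m5a.4xlarge values as reported by ipamd in this issue.
print(max_pod_ips(8, 29))       # matches awscni_ip_max
print(kubelet_max_pods(8, 29))  # matches the kubelet max pods setting
print(enis_needed(58, 29))      # matches awscni_eni_allocated
```

So the scheduler is allowed to place up to 234 pods on the node, while the pool is stuck at the capacity of two ENIs (58 IPs), which is why new pods fail with "failed to assign an IP address to container".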

Stopping and deleting the aws-node container seems to resolve the issue. After doing that, the ipamd logs look as expected.

IP pool stats: total = 203, used = 139, c.maxIPsPerENI = 29

I appreciate any guidance.

About this issue

State: closed
Created 5 years ago
Reactions: 10
Comments: 20 (4 by maintainers)

Most upvoted comments

I deleted the cluster and recreated it, and the problem was solved. I created it using Terraform.

rjshk013 on May 11, 2021