cni-ipvlan-vpc-k8s: Inconsistency between ENI allocated IPs and OS configuration
We are seeing an issue that seems to happen regularly: some pods have no network connectivity.
After looking into the configuration it turns out that when this happens we are in the following situation:
- pod sandbox configured properly (veth and ipvlan interfaces, as well as proper routing configurations)
- IP of the pod not associated with the ENI so traffic is dropped by the VPC
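To spot this state on a node, one option is to compare the IPs the VPC reports as assigned to the node's ENIs with the IPs actually configured inside pod sandboxes. The sketch below only does the comparison. How the two lists are collected (for example from the EC2 API and from each pod's network namespace) is left out, and the function name and sample addresses are hypothetical, not part of the plugin. It assumes all addresses are of the same IP family.

```python
from ipaddress import ip_address


def find_unbacked_pod_ips(eni_ips, pod_ips):
    """Return pod IPs that are configured in a sandbox but not assigned to any ENI.

    eni_ips: IPs the VPC reports as assigned to the node's ENIs (strings).
    pod_ips: IPs found on pod interfaces on the node (strings).
    """
    # Parse into address objects so that formatting differences do not matter.
    assigned = {ip_address(ip) for ip in eni_ips}
    configured = {ip_address(ip) for ip in pod_ips}
    # Anything configured locally but unknown to the VPC will have its traffic dropped.
    return [str(ip) for ip in sorted(configured - assigned)]


if __name__ == "__main__":
    # Hypothetical addresses, for illustration only.
    eni = ["10.0.1.10", "10.0.1.11"]
    pods = ["10.0.1.11", "10.0.1.12"]
    print(find_unbacked_pod_ips(eni, pods))
```

A non-empty result would point at pods in the situation described above: sandbox still configured, IP no longer associated with the ENI.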
After looking into logs we found the following:
- CloudTrail shows a call to unassociate the IP address from the ENI (which seems to indicate that the CNI plugin was called with DELETE), but the routes and iptables rules are still there
- the sandbox itself is not deleted. We found some errors in the kubelet logs, not sure if this is related:
failed to remove pod init container "consul-template": failed to get container status "371295090acf33795fe5badb07063021cace4fcff719cd13effc6ff2b5136f70": rpc error: code = Unknown desc = Error: No such container: 371295090acf33795fe5badb07063021cace4fcff719cd13effc6ff2b5136f70; Skipping pod "alerting-metric-evaluator-anomaly-0_datadog(4c15f7d2-5783-11e8-903a-02fc6d7aa9b8)"
- kubelet tries to restart containers in the same sandbox (which fails because the pods have no network connectivity, which is required by the init container)
Any idea what could trigger this situation? Our current setup uses docker, kubelet 1.10 and the latest version of the CNI plugin.
I think SkipDeallocation could probably help but I’d like to understand exactly what is happening.
I wonder if allowing more verbose logs could help in this kind of situation (for instance, logging ADD/DELETE calls with their parameters).
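As a rough illustration of what such logging could capture: per the CNI specification, the runtime passes the call parameters to the plugin through environment variables (`CNI_COMMAND`, `CNI_CONTAINERID`, `CNI_NETNS`, `CNI_IFNAME`, `CNI_ARGS`, `CNI_PATH`) and the network configuration as JSON on stdin. Note that the delete command is spelled `DEL` on the wire. The sketch below is not the plugin's code (which is written in Go); the function name and the sample values are hypothetical.

```python
import json

# Environment variables defined by the CNI specification for every plugin call.
CNI_ENV_VARS = (
    "CNI_COMMAND",
    "CNI_CONTAINERID",
    "CNI_NETNS",
    "CNI_IFNAME",
    "CNI_ARGS",
    "CNI_PATH",
)


def describe_cni_call(env, stdin_config=""):
    """Build a one-line JSON record of a CNI invocation, suitable for a log file.

    env: mapping of environment variables (e.g. os.environ).
    stdin_config: the raw network configuration read from stdin, if any.
    """
    # Missing variables are recorded as empty strings rather than skipped.
    record = {name: env.get(name, "") for name in CNI_ENV_VARS}
    try:
        record["config"] = json.loads(stdin_config) if stdin_config else None
    except json.JSONDecodeError:
        # Keep logging even if the configuration cannot be parsed.
        record["config"] = "<unparseable>"
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    sample_env = {"CNI_COMMAND": "DEL", "CNI_CONTAINERID": "abc123", "CNI_IFNAME": "eth0"}
    print(describe_cni_call(sample_env, '{"name": "example-net"}'))
```

With one such line per call, an unexpected `DEL` for a sandbox that is still running could be matched against the CloudTrail entry by timestamp and container ID.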
About this issue
- State: open
- Created 6 years ago
- Comments: 16 (13 by maintainers)
Commits related to this issue
- sets skipDeallocation=true to avoid https://github.com/lyft/cni-ipvlan-vpc-k8s/issues/41 — committed to locationlabs/kops by chris-h-phillips 6 years ago
Initial testing looks good, we are going to deploy to a larger cluster
We’re shipping an RC later this week that I’m hopeful will address the issue you’ve been hitting; this is part of a refactor in conjunction with our move to k8s 1.10. Will keep you updated.