kubernetes: Pods unable to get IP addresses
In my cluster I am hitting a race condition which I think happens in kube-scheduler. Here is what I observe:
- My minion node's Docker CIDR is a /27, which allows 32 Docker IPs.
- A Pod was scheduled to this minion node but could not start as expected; Docker reported the following error:
13m 13m 1 {kubelet kubernetes-minion-100-5034} FailedSync Error syncing pod, skipping: API error (500): Cannot start container 26a68f65a401512f52c75cec3d61ec3a51693dd1774c0ffeb0bc0dc99d09b44f: no available ip addresses on network
12m 12m 1 {kubelet kubernetes-minion-100-5034} implicitly required container POD Failed Failed to start with docker id c6d5e43d1e59 with error: API error (500): Cannot start container c6d5e43d1e59f399c80e45b4fc74acb7f1031e936f1e233a13b9009bebc7772a: no available ip addresses on network
- I logged on to the minion and found that 30 containers were already running, so the newly scheduled Pod could not get an available IP address.
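As a quick sanity check on the numbers, here is a small illustrative Go sketch (the 10.244.1.0/27 subnet below is made up, not taken from the issue) showing why a /27 runs out at roughly 30 containers: of the 32 addresses, the network address, broadcast address, and the bridge gateway are typically not assignable to containers.

```go
package main

import (
	"fmt"
	"net"
)

func main() {
	// Hypothetical /27 bridge subnet, standing in for the node's Docker CIDR.
	_, subnet, err := net.ParseCIDR("10.244.1.0/27")
	if err != nil {
		panic(err)
	}

	ones, bits := subnet.Mask.Size()
	total := 1 << uint(bits-ones) // 2^(32-27) = 32 addresses in a /27

	// The network address, broadcast address, and the bridge gateway are
	// generally reserved, leaving roughly total-3 addresses for containers.
	fmt.Printf("total addresses: %d, roughly usable for containers: %d\n", total, total-3)
}
```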
I checked the kube-scheduler source code and found that it does the following:
- Get node information
- Find the nodes that fit the Pod's requirements
- Select one node from the prioritized node list

I assume there is a race condition in these steps: the problem node did not report its status at the right time, and the scheduler scheduled the new Pod onto it anyway.
Would it make sense to add a double check before returning `g.selectHost(priorityList)`?
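To make the suggestion concrete, here is a minimal sketch of what such a double check could look like. It deliberately uses simplified stand-in types rather than the real scheduler structs, and the names `selectHostWithRecheck`, `hostPriority`, and `capacityChecker` are hypothetical: the idea is only to re-verify, against fresh node status, that the chosen node still has free pod IPs before committing to it, and to fall back to the next candidate otherwise.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// hostPriority is a simplified stand-in for an entry in the scheduler's
// prioritized node list.
type hostPriority struct {
	Host  string
	Score int
}

// capacityChecker is a stand-in for a fresh status lookup, e.g. asking the
// API server or kubelet whether the node still has free pod IPs.
type capacityChecker func(host string) bool

// selectHostWithRecheck picks the highest-scored node that still passes a
// last-moment capacity check, instead of returning the top candidate blindly.
func selectHostWithRecheck(priorityList []hostPriority, hasCapacity capacityChecker) (string, error) {
	sort.Slice(priorityList, func(i, j int) bool {
		return priorityList[i].Score > priorityList[j].Score
	})
	for _, candidate := range priorityList {
		if hasCapacity(candidate.Host) {
			return candidate.Host, nil
		}
	}
	return "", errors.New("no node with free pod IPs found")
}

func main() {
	nodes := []hostPriority{{Host: "minion-a", Score: 10}, {Host: "minion-b", Score: 8}}
	full := map[string]bool{"minion-a": true} // minion-a already has 30 containers

	host, err := selectHostWithRecheck(nodes, func(h string) bool { return !full[h] })
	fmt.Println(host, err) // prints "minion-b <nil>": the full node is skipped
}
```

Of course, a re-check like this can only narrow the race window, not close it entirely; the node could still fill up between the check and the bind, so kubelet-side failure handling is still needed.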
About this issue
- Original URL
- State: closed
- Created 8 years ago
- Comments: 28 (19 by maintainers)
I’m running into this on Google Container Engine using a 30-node cluster with hundreds of pods. Pods are stuck in `ContainerCreating` status and this is the event log from `kubectl get events`:
4s 1m 43 some-api-3484797171-tz7rb Pod Warning FailedSync {kubelet gke-some-cluster-default-pool-c31be7a2-zpio} Error syncing pod, skipping: failed to "SetupNetwork" for "some-api-3484797171-tz7rb_default" with SetupNetworkError: "Failed to setup network for pod \"some-api-3484797171-tz7rb_default(f7656535-86c9-11e6-924e-42010a800083)\" using network plugins \"kubenet\": Error adding container to network: no IP addresses available in network: kubenet; Skipping pod"
Interestingly, it happens to the last pod of any deployment when I scale them. I’ve also tried deleting the pods that are stuck in `ContainerCreating` status, but another takes its place.

@saromanov when I upgraded to 1.4.0 it stopped happening.