kubernetes: Pods unable to get IP addresses

In my cluster I am hitting what I think is a race condition in kube-scheduler. Here is what I observed:

  • My minion node's Docker CIDR is a /27, which provides 32 addresses but only about 30 usable container IPs (see the sketch after this list).
  • A Pod was scheduled to that minion node but could not start; the kubelet reported the following Docker errors:
  13m   13m     1       {kubelet kubernetes-minion-100-5034}               FailedSync      Error syncing pod, skipping: API error (500): Cannot start container 26a68f65a401512f52c75cec3d61ec3a51693dd1774c0ffeb0bc0dc99d09b44f: no available ip addresses on network

  12m   12m     1       {kubelet kubernetes-minion-100-5034}       implicitly required container POD       Failed  Failed to start with docker id c6d5e43d1e59 with error: API error (500): Cannot start container c6d5e43d1e59f399c80e45b4fc74acb7f1031e936f1e233a13b9009bebc7772a: no available ip addresses on network
  • I logged on to the minion and found that 30 containers were already running, so the newly scheduled Pod could not get an available IP address.
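
For context, here is the arithmetic behind that limit as a small Go sketch (the subnet 10.244.1.0/27 below is only an example value): a /27 holds 32 addresses, and after the network address, the broadcast address, and the docker0 bridge IP are taken out, roughly 29-30 are left for containers.

```go
package main

import (
	"fmt"
	"net"
)

func main() {
	// Illustrative subnet; substitute the node's actual Docker CIDR.
	_, ipnet, err := net.ParseCIDR("10.244.1.0/27")
	if err != nil {
		panic(err)
	}
	ones, bits := ipnet.Mask.Size()
	total := 1 << uint(bits-ones) // 2^(32-27) = 32 addresses in a /27
	// The network address, broadcast address, and bridge IP are not
	// available to containers, so only ~29-30 IPs remain.
	fmt.Printf("total addresses: %d, usable for containers: ~%d\n", total, total-3)
}
```

So once ~30 containers are running, the next Pod on that node cannot get an IP, which matches what I saw above.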

I checked the kube-scheduler source code and found that it does the following:

  1. Get node information
  2. Find nodes that fit the Pod requirements
  3. Select one node from the prioritized node list

I assume there is a race condition in these steps: the problem node had not reported its status at the right time, so the scheduler still placed the new Pod on it.

Would it make sense to add a double check before returning g.selectHost(priorityList)?
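
To make the idea concrete, here is a rough sketch of what I mean, using placeholder types and a made-up scheduleWithRecheck helper (this is not the actual kube-scheduler code): after prioritization, re-read the winning node's status and confirm the pod still fits before returning it, otherwise fall back to the next candidate.

```go
package main

import "fmt"

// Placeholder types; the real scheduler has its own pod, node, and
// host-priority structs.
type Pod struct{ Name string }

type Node struct {
	Name       string
	FreePodIPs int // stand-in for whatever fresh status the re-check would read
}

type HostPriority struct {
	Host  string
	Score int
}

// selectHost mirrors the role of g.selectHost: pick the highest-scoring node.
func selectHost(list []HostPriority) string {
	best := list[0]
	for _, hp := range list[1:] {
		if hp.Score > best.Score {
			best = hp
		}
	}
	return best.Host
}

// scheduleWithRecheck is the proposed double check: after picking a host,
// re-read its current status (via the supplied lookup) and confirm the pod
// still fits; if not, drop that host and retry with the remaining candidates.
func scheduleWithRecheck(pod Pod, list []HostPriority, lookup func(string) (*Node, error)) (string, error) {
	for len(list) > 0 {
		host := selectHost(list)
		node, err := lookup(host)
		if err == nil && node.FreePodIPs > 0 {
			return host, nil
		}
		// The chosen node's status was stale or unusable: remove it and fall back.
		next := list[:0]
		for _, hp := range list {
			if hp.Host != host {
				next = append(next, hp)
			}
		}
		list = next
	}
	return "", fmt.Errorf("no node passed the re-check for pod %s", pod.Name)
}

func main() {
	lookup := func(host string) (*Node, error) {
		// Pretend the top-ranked node has already run out of pod IPs.
		if host == "minion-1" {
			return &Node{Name: host, FreePodIPs: 0}, nil
		}
		return &Node{Name: host, FreePodIPs: 5}, nil
	}
	host, err := scheduleWithRecheck(Pod{Name: "web-1"},
		[]HostPriority{{Host: "minion-1", Score: 10}, {Host: "minion-2", Score: 8}}, lookup)
	fmt.Println(host, err) // prints "minion-2 <nil>": the exhausted node is skipped
}
```

In the real scheduler such a check would sit right before the return of g.selectHost(priorityList); the fallback loop above is only one possible shape for it.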

About this issue

  • State: closed
  • Created 8 years ago
  • Comments: 28 (19 by maintainers)

Most upvoted comments

I’m running into this on Google Container Engine with a 30-node cluster and hundreds of pods.

Pods are stuck in ContainerCreating status and this is the event log from kubectl get events:

4s 1m 43 some-api-3484797171-tz7rb Pod Warning FailedSync {kubelet gke-some-cluster-default-pool-c31be7a2-zpio} Error syncing pod, skipping: failed to "SetupNetwork" for "some-api-3484797171-tz7rb_default" with SetupNetworkError: "Failed to setup network for pod \"some-api-3484797171-tz7rb_default(f7656535-86c9-11e6-924e-42010a800083)\" using network plugins \"kubenet\": Error adding container to network: no IP addresses available in network: kubenet; Skipping pod"

Interestingly, it happens to the last pod of any deployment when I scale it. I’ve also tried deleting the pods that are stuck in ContainerCreating, but another takes its place.

$ kubectl get deployments
NAME                    DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
some-api                80        80        80           79          10d 
another-api             20        20        20           19          10d
...
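
To confirm it really is per-node pod-IP exhaustion rather than something else, a rough check like the following can help. It uses recent client-go; the kubeconfig path and the capacity arithmetic are assumptions, and the count includes pods stuck in ContainerCreating. It compares the pods bound to each node with the size of that node's podCIDR:

```go
package main

import (
	"context"
	"fmt"
	"net"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the local kubeconfig (path is an assumption).
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	nodes, err := client.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, node := range nodes.Items {
		// Count every pod bound to this node, including those stuck in ContainerCreating.
		pods, err := client.CoreV1().Pods("").List(context.TODO(), metav1.ListOptions{
			FieldSelector: "spec.nodeName=" + node.Name,
		})
		if err != nil {
			panic(err)
		}
		// Approximate capacity from the node's podCIDR: subtract the network
		// address, the broadcast address, and the bridge IP.
		_, ipnet, err := net.ParseCIDR(node.Spec.PodCIDR)
		if err != nil {
			continue // no podCIDR assigned to this node
		}
		ones, bits := ipnet.Mask.Size()
		capacity := (1 << uint(bits-ones)) - 3
		fmt.Printf("%s: %d pods bound, ~%d usable pod IPs (podCIDR %s)\n",
			node.Name, len(pods.Items), capacity, node.Spec.PodCIDR)
	}
}
```

If a node shows more bound pods than usable IPs, kubenet simply has no address left to hand out, which matches the pods stuck in ContainerCreating.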

@saromanov when I upgraded to 1.4.0 it stopped happening.