kubernetes: Pods not starting on Kubernetes 1.2-beta.1 with CNI

We have been running a Kubernetes cluster on 1.2-alpha.7 for quite a while with no problems. Now, after upgrading to Kubernetes 1.2-beta.1, the pods end up in a restart loop and never come online. The upgrade is done via a complete reinstall of the OS and Kubernetes, and the issue has been reproduced several times by upgrading and downgrading between alpha.7 and beta.1.

OS: CentOS 7.2
Docker: 1.9.1
Kube version: 1.2-beta.1
CNI Provider: calico cni 1.1.0 / calicoctl: 0.17.0

kubectl --namespace=kube-system get pod -o wide

NAME                     READY     STATUS    RESTARTS   AGE       NODE
kube-dns-v10-0nh1i       3/4       Running   68         20m       srv07
kube-dns-v10-vfj6d       3/4       Running   60         20m       srv05
kube-registry-v0-m0or6   1/1       Running   44         20m       srv05
kube-ui-v5-gxs27         1/1       Running   197        20m       srv04

kubectl get events for one pod

 FirstSeen  LastSeen    Count   From            SubobjectPath           Type        Reason          Message
  ─────────   ────────    ───── ────            ─────────────         ────────    ──────          ───────
  11m       10m     5   {default-scheduler }                    Warning     FailedScheduling    no nodes available to schedule pods
  10m       10m     1   {default-scheduler }                    Normal      Scheduled       Successfully assigned kube-dns-v10-0nh1i to srv07
  10m       10m     1   {kubelet srv07}     spec.containers{etcd}       Normal      Pulling         pulling image "gcr.io/google_containers/etcd:2.0.9"
  10m       10m     1   {kubelet srv07}     spec.containers{etcd}       Normal      Pulled          Successfully pulled image "gcr.io/google_containers/etcd:2.0.9"
  10m       10m     1   {kubelet srv07}     spec.containers{etcd}       Normal      Created         Created container with docker id 3efd24477ee2
  10m       10m     1   {kubelet srv07}     spec.containers{etcd}       Normal      Started         Started container with docker id 3efd24477ee2
  10m       10m     1   {kubelet srv07}     spec.containers{kube2sky}   Normal      Pulling         pulling image "gcr.io/google_containers/kube2sky:1.12"
  10m       10m     1   {kubelet srv07}     spec.containers{kube2sky}   Normal      Pulled          Successfully pulled image "gcr.io/google_containers/kube2sky:1.12"
  10m       10m     1   {kubelet srv07}     spec.containers{kube2sky}   Normal      Created         Created container with docker id f6ce0663c34b
  10m       10m     1   {kubelet srv07}     spec.containers{kube2sky}   Normal      Started         Started container with docker id f6ce0663c34b
  10m       10m     1   {kubelet srv07}     spec.containers{skydns}     Normal      Pulling         pulling image "gcr.io/google_containers/skydns:2015-10-13-8c72f8c"
  9m        9m      1   {kubelet srv07}     spec.containers{skydns}     Normal      Pulled          Successfully pulled image "gcr.io/google_containers/skydns:2015-10-13-8c72f8c"
  9m        9m      1   {kubelet srv07}     spec.containers{skydns}     Normal      Created         Created container with docker id bf3ecc042ffd
  9m        9m      1   {kubelet srv07}     spec.containers{skydns}     Normal      Started         Started container with docker id bf3ecc042ffd
  9m        9m      1   {kubelet srv07}     spec.containers{healthz}    Normal      Pulling         pulling image "gcr.io/google_containers/exechealthz:1.0"
  9m        9m      1   {kubelet srv07}     spec.containers{healthz}    Normal      Pulled          Successfully pulled image "gcr.io/google_containers/exechealthz:1.0"
  9m        9m      1   {kubelet srv07}     spec.containers{healthz}    Normal      Created         Created container with docker id 28a0783cf48f
  8m        8m      1   {kubelet srv07}     spec.containers{healthz}    Normal      Started         Started container with docker id 28a0783cf48f
  8m        8m      1   {kubelet srv07}     spec.containers{skydns}     Normal      Killing         Killing container with docker id bf3ecc042ffd: Need to kill pod.
  8m        8m      1   {kubelet srv07}     spec.containers{kube2sky}   Normal      Killing         Killing container with docker id f6ce0663c34b: Need to kill pod.
  8m        8m      1   {kubelet srv07}     spec.containers{healthz}    Normal      Killing         Killing container with docker id 28a0783cf48f: Need to kill pod.
  8m        8m      1   {kubelet srv07}     spec.containers{etcd}       Normal      Killing         Killing container with docker id 3efd24477ee2: Need to kill pod.
  8m        8m      1   {kubelet srv07}     spec.containers{etcd}       Normal      Created         Created container with docker id f37dd5c64348
  8m        8m      1   {kubelet srv07}     spec.containers{etcd}       Normal      Started         Started container with docker id f37dd5c64348
  8m        8m      1   {kubelet srv07}     spec.containers{kube2sky}   Normal      Created         Created container with docker id bf2c1a408754
  8m        8m      1   {kubelet srv07}     spec.containers{kube2sky}   Normal      Started         Started container with docker id bf2c1a408754
  8m        8m      1   {kubelet srv07}     spec.containers{skydns}     Normal      Created         Created container with docker id 0668fa66b8f7
  8m        8m      1   {kubelet srv07}     spec.containers{skydns}     Normal      Started         Started container with docker id 0668fa66b8f7
  8m        8m      1   {kubelet srv07}     spec.containers{healthz}    Normal      Created         Created container with docker id 739e3f9c828d
  7m        7m      1   {kubelet srv07}     spec.containers{healthz}    Normal      Started         Started container with docker id 739e3f9c828d
  7m        7m      1   {kubelet srv07}     spec.containers{etcd}       Normal      Killing         Killing container with docker id f37dd5c64348: Need to kill pod.
  7m        7m      1   {kubelet srv07}     spec.containers{healthz}    Normal      Killing         Killing container with docker id 739e3f9c828d: Need to kill pod.
  7m        7m      1   {kubelet srv07}     spec.containers{kube2sky}   Normal      Killing         Killing container with docker id bf2c1a408754: Need to kill pod.
  7m        7m      1   {kubelet srv07}     spec.containers{skydns}     Normal      Killing         Killing container with docker id 0668fa66b8f7: Need to kill pod.
  7m        7m      1   {kubelet srv07}     spec.containers{kube2sky}   Normal      Created         Created container with docker id b208af539d35
  7m        7m      1   {kubelet srv07}     spec.containers{kube2sky}   Normal      Started         Started container with docker id b208af539d35
  6m        6m      1   {kubelet srv07}     spec.containers{kube2sky}   Normal      Killing         Killing container with docker id b208af539d35: Need to kill pod.

journalctl -xn 200 -u kubelet (kubelet running with -v=5)

I can’t see anything weird in the logs for the API Server, Scheduler or Controller Manager

About this issue

State: closed
Created 8 years ago
Comments: 60 (40 by maintainers)

Commits related to this issue

Fix for issue #22932 infinite pod restarts This fixes an issue when using CNI where the hash of a Container object will differ between creation and change checks due to the docker image exporting por... — committed to tobad357/kubernetes by tobad357 8 years ago
Merge pull request #23050 from tobad357/cni-pod-infinite-restart Fix for issue #22932 infinite pod restarts with CNI — committed to kubernetes/kubernetes by eparis 8 years ago
Fix for issue #22932 infinite pod restarts This fixes an issue when using CNI where the hash of a Container object will differ between creation and change checks due to the docker image exporting por... — committed to eparis/kubernetes by tobad357 8 years ago
Clean up fix for issue #22932 infinite pod restarts — committed to dcbw/kubernetes by dcbw 8 years ago
Fix for issue #22932 infinite pod restarts This fixes an issue when using CNI where the hash of a Container object will differ between creation and change checks due to the docker image exporting por... — committed to shyamjvs/kubernetes by tobad357 8 years ago
Fix for issue #22932 infinite pod restarts This fixes an issue when using CNI where the hash of a Container object will differ between creation and change checks due to the docker image exporting por... — committed to shouhong/kubernetes by tobad357 8 years ago

Most upvoted comments

Also, using the CNI binaries from the released tarball with a workaround #23039 and config from @bprashanth (https://github.com/kubernetes/kubernetes/issues/22932#issuecomment-197124674) I get this:

2016-03-16 09:24:57 +0000 UTC   2016-03-16 09:26:36 +0000 UTC   41        redis-master-9wkyg   Pod                 Warning   FailedSync   {kubelet kubedev}   Error syncing pod, skipping: failed to "SetupNetwork" for "redis-master-9wkyg_default" with SetupNetworkError: "Failed to setup network for pod \"redis-master-9wkyg_default(e10c3705-eb58-11e5-a8ce-0242ac110004)\" using network plugins \"cni\": ARGS: invalid key \"K8S_POD_NAMESPACE\"; Skipping pod"

errordeveloper on Mar 16, 2016
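
The "invalid key" wording in that event is what a strict key/value argument parser reports when it is handed a key it does not know. The Go sketch below only illustrates that behaviour; `parseArgs`, the `known` set, and the `K8S_POD_NAME` key are hypothetical and are not taken from the CNI library or from any plugin in this thread.

```go
package main

import (
	"fmt"
	"strings"
)

// parseArgs is a hypothetical strict parser for a "KEY=VALUE;KEY=VALUE" string.
// It rejects any key that is not listed in known, which is the kind of check
// that would produce an "invalid key" error like the one in the event above.
func parseArgs(args string, known map[string]bool) (map[string]string, error) {
	out := make(map[string]string)
	if args == "" {
		return out, nil
	}
	for _, pair := range strings.Split(args, ";") {
		kv := strings.SplitN(pair, "=", 2)
		if len(kv) != 2 {
			return nil, fmt.Errorf("ARGS: invalid pair %q", pair)
		}
		if !known[kv[0]] {
			return nil, fmt.Errorf("ARGS: invalid key %q", kv[0])
		}
		out[kv[0]] = kv[1]
	}
	return out, nil
}

func main() {
	// Assumed argument string; only K8S_POD_NAMESPACE appears in the thread.
	args := "K8S_POD_NAMESPACE=default;K8S_POD_NAME=redis-master-9wkyg"

	// A parser that only knows an unrelated key rejects the Kubernetes keys.
	_, err := parseArgs(args, map[string]bool{"IP": true})
	fmt.Println("strict parser:", err)

	// A parser that knows the keys accepts the same string.
	parsed, err := parseArgs(args, map[string]bool{
		"K8S_POD_NAMESPACE": true,
		"K8S_POD_NAME":      true,
	})
	fmt.Println("aware parser:", parsed, err)
}
```

If this is what is happening here, the kubelet and the CNI binaries would simply disagree about which argument keys are allowed, which is consistent with the error only showing up with one particular set of binaries.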

One thing that makes this kind of thing very hard to debug (at least for me, not hugely experienced working with k8s) is that after a short while k8s decides things are bad, kills a bunch of things and restarts them to try again. I understand why it does that, but is there a way to stop it while I get a chance to look at what happened?

bboreham on Mar 16, 2016

I’ve found the cause of the restarting pods. It happens when using CNI with a docker image that also exports ports. The Container object is built differently when the pod is created than when it is checked for changes, so the two never match. You should be able to replicate this by running a docker image with exported ports under kubernetes + cni. I should have a pull request ready soon.

tobad357 on Mar 16, 2016
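
A rough Go sketch of the mismatch described above, not the kubelet's actual code: the `Container` and `Port` types, `hashContainer`, and `withImagePorts` are simplified stand-ins, the port number is made up, and which of the two code paths picks up the image's ports is an assumption made for illustration.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Port and Container are simplified stand-ins for the real API types.
type Port struct {
	ContainerPort int
	Protocol      string
}

type Container struct {
	Name  string
	Image string
	Ports []Port
}

// hashContainer hashes the printed form of the whole struct, so any
// difference in any field changes the result.
func hashContainer(c Container) uint64 {
	h := fnv.New64a()
	fmt.Fprintf(h, "%#v", c)
	return h.Sum64()
}

// withImagePorts returns a copy of c with the ports exported by the image
// appended (hypothetical helper; assumed to run on only one code path).
func withImagePorts(c Container, exposed []Port) Container {
	merged := c
	merged.Ports = append(append([]Port{}, c.Ports...), exposed...)
	return merged
}

func main() {
	spec := Container{Name: "etcd", Image: "gcr.io/google_containers/etcd:2.0.9"}
	imagePorts := []Port{{ContainerPort: 2379, Protocol: "TCP"}} // assumed value

	// Creation path: hash computed over the spec plus the image's ports.
	created := hashContainer(withImagePorts(spec, imagePorts))
	// Change check: hash computed over the bare spec.
	checked := hashContainer(spec)

	fmt.Println("hash at creation:", created)
	fmt.Println("hash at check:   ", checked)
	fmt.Println("looks changed:   ", created != checked)
}
```

Because the two hashes are computed over different inputs, the container looks modified on every check even though nothing changed, which would match the repeated "Need to kill pod" events in the report.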

If it really is a regression from beta0 to beta1, the diff here might help - https://github.com/kubernetes/kubernetes/compare/v1.2.0-beta.0...v1.2.0-beta.1

The one change that jumps out is this one from @dcbw https://github.com/kubernetes/kubernetes/commit/bc62096ad5ea8ce15156b39cfd2333bc6f589905

tomdee on Mar 15, 2016