kubernetes: Pods not starting on Kubernetes 1.2-beta.1 with CNI
We have been running a Kube cluster on 1.2-alpha.7 for quite a while with no problems. After upgrading to Kube 1.2-beta.1, the pods end up in a restart loop and never come online. The upgrade is done via a complete reinstall of the OS and Kubernetes, and the issue has been reproduced several times by upgrading and downgrading between alpha.7 and beta.1.
OS: CentOS 7.2
Docker: 1.9.1
Kube version: 1.2-beta.1
CNI provider: Calico CNI 1.1.0 / calicoctl 0.17.0
kubectl --namespace=kube-system get pod -o wide
NAME                     READY   STATUS    RESTARTS   AGE   NODE
kube-dns-v10-0nh1i       3/4     Running   68         20m   srv07
kube-dns-v10-vfj6d       3/4     Running   60         20m   srv05
kube-registry-v0-m0or6   1/1     Running   44         20m   srv05
kube-ui-v5-gxs27         1/1     Running   197        20m   srv04
kubectl get events for one pod
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
───────── ──────── ───── ──── ───────────── ──────── ────── ───────
11m 10m 5 {default-scheduler } Warning FailedScheduling no nodes available to schedule pods
10m 10m 1 {default-scheduler } Normal Scheduled Successfully assigned kube-dns-v10-0nh1i to srv07
10m 10m 1 {kubelet srv07} spec.containers{etcd} Normal Pulling pulling image "gcr.io/google_containers/etcd:2.0.9"
10m 10m 1 {kubelet srv07} spec.containers{etcd} Normal Pulled Successfully pulled image "gcr.io/google_containers/etcd:2.0.9"
10m 10m 1 {kubelet srv07} spec.containers{etcd} Normal Created Created container with docker id 3efd24477ee2
10m 10m 1 {kubelet srv07} spec.containers{etcd} Normal Started Started container with docker id 3efd24477ee2
10m 10m 1 {kubelet srv07} spec.containers{kube2sky} Normal Pulling pulling image "gcr.io/google_containers/kube2sky:1.12"
10m 10m 1 {kubelet srv07} spec.containers{kube2sky} Normal Pulled Successfully pulled image "gcr.io/google_containers/kube2sky:1.12"
10m 10m 1 {kubelet srv07} spec.containers{kube2sky} Normal Created Created container with docker id f6ce0663c34b
10m 10m 1 {kubelet srv07} spec.containers{kube2sky} Normal Started Started container with docker id f6ce0663c34b
10m 10m 1 {kubelet srv07} spec.containers{skydns} Normal Pulling pulling image "gcr.io/google_containers/skydns:2015-10-13-8c72f8c"
9m 9m 1 {kubelet srv07} spec.containers{skydns} Normal Pulled Successfully pulled image "gcr.io/google_containers/skydns:2015-10-13-8c72f8c"
9m 9m 1 {kubelet srv07} spec.containers{skydns} Normal Created Created container with docker id bf3ecc042ffd
9m 9m 1 {kubelet srv07} spec.containers{skydns} Normal Started Started container with docker id bf3ecc042ffd
9m 9m 1 {kubelet srv07} spec.containers{healthz} Normal Pulling pulling image "gcr.io/google_containers/exechealthz:1.0"
9m 9m 1 {kubelet srv07} spec.containers{healthz} Normal Pulled Successfully pulled image "gcr.io/google_containers/exechealthz:1.0"
9m 9m 1 {kubelet srv07} spec.containers{healthz} Normal Created Created container with docker id 28a0783cf48f
8m 8m 1 {kubelet srv07} spec.containers{healthz} Normal Started Started container with docker id 28a0783cf48f
8m 8m 1 {kubelet srv07} spec.containers{skydns} Normal Killing Killing container with docker id bf3ecc042ffd: Need to kill pod.
8m 8m 1 {kubelet srv07} spec.containers{kube2sky} Normal Killing Killing container with docker id f6ce0663c34b: Need to kill pod.
8m 8m 1 {kubelet srv07} spec.containers{healthz} Normal Killing Killing container with docker id 28a0783cf48f: Need to kill pod.
8m 8m 1 {kubelet srv07} spec.containers{etcd} Normal Killing Killing container with docker id 3efd24477ee2: Need to kill pod.
8m 8m 1 {kubelet srv07} spec.containers{etcd} Normal Created Created container with docker id f37dd5c64348
8m 8m 1 {kubelet srv07} spec.containers{etcd} Normal Started Started container with docker id f37dd5c64348
8m 8m 1 {kubelet srv07} spec.containers{kube2sky} Normal Created Created container with docker id bf2c1a408754
8m 8m 1 {kubelet srv07} spec.containers{kube2sky} Normal Started Started container with docker id bf2c1a408754
8m 8m 1 {kubelet srv07} spec.containers{skydns} Normal Created Created container with docker id 0668fa66b8f7
8m 8m 1 {kubelet srv07} spec.containers{skydns} Normal Started Started container with docker id 0668fa66b8f7
8m 8m 1 {kubelet srv07} spec.containers{healthz} Normal Created Created container with docker id 739e3f9c828d
7m 7m 1 {kubelet srv07} spec.containers{healthz} Normal Started Started container with docker id 739e3f9c828d
7m 7m 1 {kubelet srv07} spec.containers{etcd} Normal Killing Killing container with docker id f37dd5c64348: Need to kill pod.
7m 7m 1 {kubelet srv07} spec.containers{healthz} Normal Killing Killing container with docker id 739e3f9c828d: Need to kill pod.
7m 7m 1 {kubelet srv07} spec.containers{kube2sky} Normal Killing Killing container with docker id bf2c1a408754: Need to kill pod.
7m 7m 1 {kubelet srv07} spec.containers{skydns} Normal Killing Killing container with docker id 0668fa66b8f7: Need to kill pod.
7m 7m 1 {kubelet srv07} spec.containers{kube2sky} Normal Created Created container with docker id b208af539d35
7m 7m 1 {kubelet srv07} spec.containers{kube2sky} Normal Started Started container with docker id b208af539d35
6m 6m 1 {kubelet srv07} spec.containers{kube2sky} Normal Killing Killing container with docker id b208af539d35: Need to kill pod.
journalctl -xn 200 -u kubelet (kubelet running with -v=5)
I can't see anything unusual in the API server, scheduler or controller manager logs.
About this issue
- State: closed
- Created 8 years ago
- Comments: 60 (40 by maintainers)
Commits related to this issue
- Fix for issue #22932 infinite pod restarts This fixes an issue when using CNI where the hash of a Container object will differ between creation and change checks due to the docker image exporting por... — committed to tobad357/kubernetes by tobad357 8 years ago
- Merge pull request #23050 from tobad357/cni-pod-infinite-restart Fix for issue #22932 infinite pod restarts with CNI — committed to kubernetes/kubernetes by eparis 8 years ago
- Fix for issue #22932 infinite pod restarts This fixes an issue when using CNI where the hash of a Container object will differ between creation and change checks due to the docker image exporting por... — committed to eparis/kubernetes by tobad357 8 years ago
- Clean up fix for issue #22932 infinite pod restarts — committed to dcbw/kubernetes by dcbw 8 years ago
- Fix for issue #22932 infinite pod restarts This fixes an issue when using CNI where the hash of a Container object will differ between creation and change checks due to the docker image exporting por... — committed to shyamjvs/kubernetes by tobad357 8 years ago
- Fix for issue #22932 infinite pod restarts This fixes an issue when using CNI where the hash of a Container object will differ between creation and change checks due to the docker image exporting por... — committed to shouhong/kubernetes by tobad357 8 years ago
Also, using the CNI binaries from the released tarball, with the workaround from #23039 and the config from @bprashanth (https://github.com/kubernetes/kubernetes/issues/22932#issuecomment-197124674), I get this:
One thing that makes this kind of problem very hard to debug (at least for me, as I'm not hugely experienced with k8s) is that after a short while k8s decides things are bad, kills a bunch of containers and restarts them to try again. I understand why it does that, but is there a way to stop it long enough for me to look at what happened?
I've found the cause of the restarting pods. It happens when using CNI and the Docker image also exports ports. The Container object is built differently when the pod is created than when kubelet later checks whether it has changed, so the two never match. You should be able to reproduce this by running a Docker image with exported ports under Kubernetes + CNI. I should have a pull request ready soon.
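The mechanism described above (and in the commit message for the fix: the hash of a Container object differs between creation and change checks when the image exports ports) can be sketched in Go. This is a minimal, hypothetical model, not kubelet's actual code: `Container`, `Port`, `hashContainer`, `withImagePorts` and `needsRestart` are illustrative names, the FNV-over-JSON hash is a stand-in for whatever kubelet really uses, and it assumes, purely for illustration, that the image's exported ports end up in the object hashed at creation but not in the one hashed during the change check. A hash mismatch is treated as a changed spec, so the container is killed and recreated on every check, which matches the repeated "Need to kill pod." events.

```go
package main

import (
	"encoding/json"
	"fmt"
	"hash/fnv"
)

// Port and Container are hypothetical, heavily simplified stand-ins for the
// Kubernetes API types; they are not the real kubelet structs.
type Port struct {
	ContainerPort int    `json:"containerPort"`
	Protocol      string `json:"protocol"`
}

type Container struct {
	Name  string `json:"name"`
	Image string `json:"image"`
	Ports []Port `json:"ports,omitempty"`
}

// hashContainer serializes the container deterministically (JSON keeps
// struct field order) and hashes the bytes with 32-bit FNV-1a, so any
// difference in the fields yields a different hash.
func hashContainer(c Container) uint32 {
	b, err := json.Marshal(c)
	if err != nil {
		panic(err) // cannot happen for these plain types
	}
	h := fnv.New32a()
	h.Write(b)
	return h.Sum32()
}

// withImagePorts returns a copy of c with the image's exported ports appended,
// leaving c itself untouched. (Assumption: models the image ports leaking
// into the object on only one of the two code paths.)
func withImagePorts(c Container, imagePorts []Port) Container {
	out := c
	out.Ports = append(append([]Port(nil), c.Ports...), imagePorts...)
	return out
}

// needsRestart mimics the change check: the container counts as changed when
// the hash recorded at creation differs from the hash of the current spec.
func needsRestart(recorded uint32, spec Container) bool {
	return recorded != hashContainer(spec)
}

func main() {
	// Hypothetical pod spec and image ports, for illustration only.
	spec := Container{Name: "app", Image: "example/app:1.0"}
	imagePorts := []Port{{ContainerPort: 8080, Protocol: "TCP"}}

	// Buggy: the hash recorded at creation includes the image ports, but the
	// change check hashes the plain spec, so every sync sees a "change".
	buggy := hashContainer(withImagePorts(spec, imagePorts))
	fmt.Println("buggy restart:", needsRestart(buggy, spec))

	// Fixed: hash the same object at creation and at check time.
	fixed := hashContainer(spec)
	fmt.Println("fixed restart:", needsRestart(fixed, spec))
}
```

Running it should print `buggy restart: true` followed by `fixed restart: false`. The point of the sketch is only that both sides of the comparison must be derived from the same object; the actual fix in the PR may achieve this differently.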
If this really is a regression from beta.0 to beta.1, this diff might help: https://github.com/kubernetes/kubernetes/compare/v1.2.0-beta.0...v1.2.0-beta.1
The one change that jumps out is this one from @dcbw: https://github.com/kubernetes/kubernetes/commit/bc62096ad5ea8ce15156b39cfd2333bc6f589905