kubernetes: SkyDNS does not work when using it with Kubernetes Docker multinode setup

I’m running the Kubernetes Docker containers on CentOS Linux release 7.2.1511, kernel 3.10.0-327.10.1.el7.x86_64. I’m using the following script to run the kube-master (link) and the following script to run SkyDNS (link), replacing the {{ pillar }} placeholders with actual values as suggested by the tutorial: http://kubernetes.io/docs/getting-started-guides/docker-multinode/deployDNS/
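For reference, the placeholder substitution amounts to something like the following (a sketch only; the pillar key names and template file names are taken from that tutorial’s SkyDNS templates, and the domain/IP/replica values are just examples):

    # Replace the salt-style placeholders with concrete values, then create the objects
    for f in skydns-rc.yaml.in skydns-svc.yaml.in; do
      sed -e "s/{{ pillar\['dns_replicas'\] }}/1/g" \
          -e "s/{{ pillar\['dns_domain'\] }}/cluster.local/g" \
          -e "s/{{ pillar\['dns_server'\] }}/10.0.0.10/g" \
          "$f" > "${f%.in}"
    done
    kubectl create -f skydns-rc.yaml -f skydns-svc.yaml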

I’m using Kubernetes 1.2.0-alpha.7 on the server side:

Client Version: version.Info{Major:"1", Minor:"2+", GitVersion:"v1.2.0-beta.0", GitCommit:"50f7568d7f9b001c90ed75e79d41478afcd64a34", GitTreeState:"clean"}
Server Version: version.Info{Major:"1", Minor:"2+", GitVersion:"v1.2.0-alpha.7", GitCommit:"c0fd002fbb25d6a6cd8427d28b8ec78379c354a0", GitTreeState:"clean"}

[local@kube-master-1458129646 ~]$ kubectl cluster-info
Kubernetes master is running at http://10.57.50.181:8080
KubeDNS is running at http://10.57.50.181:8080/api/v1/proxy/namespaces/kube-system/services/kube-dns

When accessing the following URL: http://10.57.50.181:8080/api/v1/proxy/namespaces/kube-system/services/kube-dns I get the following:

{ "kind": "Status", "apiVersion": "v1", "metadata": {}, "status": "Failure", "message": "no endpoints available for service \"kube-dns\"", "reason": "ServiceUnavailable", "code": 503 }

[local@kube-master-1458129646 ~]$ kubectl describe svc
Name:              kubernetes
Namespace:         default
Labels:            component=apiserver,provider=kubernetes
Selector:          <none>
Type:              ClusterIP
IP:                10.0.0.1
Port:              https  443/TCP
Endpoints:         10.57.50.181:6443
Session Affinity:  None
No events.

[local@kube-master-1458129646 ~]$ kubectl describe svc --namespace=kube-system
Name:              kube-dns
Namespace:         kube-system
Labels:            k8s-app=kube-dns,kubernetes.io/cluster-service=true,kubernetes.io/name=KubeDNS
Selector:          k8s-app=kube-dns
Type:              ClusterIP
IP:                10.0.0.10
Port:              dns  53/UDP
Endpoints:
Port:              dns-tcp  53/TCP
Endpoints:
Session Affinity:  None
No events.
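The empty Endpoints lines mean that no ready pod currently matches the k8s-app=kube-dns selector, so the next thing to look at is the DNS pod itself (illustrative commands; POD_NAME is a placeholder for whatever the pod listing prints):

    kubectl get endpoints kube-dns --namespace=kube-system
    kubectl get pods --namespace=kube-system -l k8s-app=kube-dns -o wide
    # POD_NAME below is a placeholder for the name printed by the previous command
    kubectl describe pod POD_NAME --namespace=kube-system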

Here are the logs:

kube2sky logs

I0317 10:58:02.475909 1 kube2sky.go:462] Etcd server found: http://127.0.0.1:4001
I0317 10:58:03.478586 1 kube2sky.go:529] Using https://10.0.0.1:443 for kubernetes master
I0317 10:58:03.478612 1 kube2sky.go:530] Using kubernetes API <nil>
I0317 10:58:03.479278 1 kube2sky.go:598] Waiting for service: default/kubernetes
I0317 10:58:04.480549 1 kube2sky.go:604] Ignoring error while waiting for service default/kubernetes: Get https://10.0.0.1:443/api/v1/namespaces/default/services/kubernetes: dial tcp 10.0.0.1:443: getsockopt: no route to host. Sleeping 1s before retrying.
I0317 10:58:06.484404 1 kube2sky.go:604] Ignoring error while waiting for service default/kubernetes: Get https://10.0.0.1:443/api/v1/namespaces/default/services/kubernetes: dial tcp 10.0.0.1:443: getsockopt: no route to host. Sleeping 1s before retrying.
I0317 10:58:08.488550 1 kube2sky.go:604] Ignoring error while waiting for service default/kubernetes: Get https://10.0.0.1:443/api/v1/namespaces/default/services/kubernetes: dial tcp 10.0.0.1:443: getsockopt: no route to host. Sleeping 1s before retrying.
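The repeated “no route to host” means the kube2sky container cannot reach the apiserver through the 10.0.0.1 service IP at all, so it never writes any DNS records. A few checks that narrow this down, run on the node hosting the DNS pod (illustrative; they assume the iptables-mode kube-proxy from the multinode setup):

    # Is the apiserver's real secure endpoint reachable from the node?
    curl -k https://10.57.50.181:6443/version
    # Did kube-proxy install rules for the 10.0.0.1 service IP?
    iptables-save | grep -w 10.0.0.1
    # Is the kube-proxy container running at all?
    docker ps | grep proxy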

skydns logs

2016/03/17 09:39:55 skydns: falling back to default configuration, could not read from etcd: 100: Key not found (/skydns) [3]
2016/03/17 09:39:55 skydns: ready for queries on cluster.local. for tcp://0.0.0.0:53 [rcache 0]
2016/03/17 09:39:55 skydns: ready for queries on cluster.local. for udp://0.0.0.0:53 [rcache 0]
2016/03/17 09:40:17 skydns: failure to forward request "read udp 10.56.190.1:53: no route to host"
2016/03/17 09:40:20 skydns: failure to forward request "read udp 10.56.190.1:53: i/o timeout"
2016/03/17 09:40:27 skydns: failure to forward request "read udp 10.56.190.1:53: i/o timeout"
2016/03/17 09:40:31 skydns: failure to forward request "read udp 10.56.190.1:53: i/o timeout"
2016/03/17 09:40:38 skydns: failure to forward request "read udp 10.56.190.1:53: i/o timeout"
2016/03/17 09:40:42 skydns: failure to forward request "read udp 10.56.190.1:53: i/o timeout"
2016/03/17 09:40:49 skydns: failure to forward request "read udp 10.56.190.1:53: i/o timeout"
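The forwarding failures are a separate symptom: queries outside cluster.local are being forwarded to the upstream resolver SkyDNS picked up (10.56.190.1, per the log) and that resolver is unreachable. It is worth confirming what resolver the container actually inherited (illustrative; SKYDNS_CONTAINER is a placeholder for the id shown by docker ps):

    docker exec SKYDNS_CONTAINER cat /etc/resolv.conf
    # If dig is installed on the node, check the upstream resolver directly:
    dig @10.56.190.1 kubernetes.io +time=2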

etcd logs

2016-03-17 09:39:52.074261 I | etcdmain: etcd Version: 2.2.1
2016-03-17 09:39:52.074324 I | etcdmain: Git SHA: 75f8282
2016-03-17 09:39:52.074335 I | etcdmain: Go Version: go1.5.1
2016-03-17 09:39:52.074343 I | etcdmain: Go OS/Arch: linux/amd64
2016-03-17 09:39:52.074384 I | etcdmain: setting maximum number of CPUs to 8, total number of available CPUs is 8
2016-03-17 09:39:52.074974 I | etcdmain: listening for peers on http://localhost:2380
2016-03-17 09:39:52.075239 I | etcdmain: listening for peers on http://localhost:7001
2016-03-17 09:39:52.075304 I | etcdmain: listening for client requests on http://127.0.0.1:2379
2016-03-17 09:39:52.075406 I | etcdmain: listening for client requests on http://127.0.0.1:4001
2016-03-17 09:39:52.075813 I | etcdserver: name = default
2016-03-17 09:39:52.075829 I | etcdserver: data dir = /var/etcd/data
2016-03-17 09:39:52.075837 I | etcdserver: member dir = /var/etcd/data/member
2016-03-17 09:39:52.075844 I | etcdserver: heartbeat = 100ms
2016-03-17 09:39:52.075851 I | etcdserver: election = 1000ms
2016-03-17 09:39:52.075857 I | etcdserver: snapshot count = 10000
2016-03-17 09:39:52.075874 I | etcdserver: advertise client URLs = http://127.0.0.1:2379,http://127.0.0.1:4001
2016-03-17 09:39:52.075887 I | etcdserver: initial advertise peer URLs = http://localhost:2380,http://localhost:7001
2016-03-17 09:39:52.075906 I | etcdserver: initial cluster = default=http://localhost:2380,default=http://localhost:7001
2016-03-17 09:39:52.077919 I | etcdserver: starting member 6a5871dbdd12c17c in cluster f68652439e3f8f2a
2016-03-17 09:39:52.077997 I | raft: 6a5871dbdd12c17c became follower at term 0
2016-03-17 09:39:52.078027 I | raft: newRaft 6a5871dbdd12c17c [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
2016-03-17 09:39:52.078036 I | raft: 6a5871dbdd12c17c became follower at term 1
2016-03-17 09:39:52.078279 I | etcdserver: starting server... [version: 2.2.1, cluster version: to_be_decided]
2016-03-17 09:39:52.079411 N | etcdserver: added local member 6a5871dbdd12c17c [http://localhost:2380 http://localhost:7001] to cluster f68652439e3f8f2a
2016-03-17 09:39:52.878477 I | raft: 6a5871dbdd12c17c is starting a new election at term 1
2016-03-17 09:39:52.878553 I | raft: 6a5871dbdd12c17c became candidate at term 2
2016-03-17 09:39:52.878574 I | raft: 6a5871dbdd12c17c received vote from 6a5871dbdd12c17c at term 2
2016-03-17 09:39:52.878600 I | raft: 6a5871dbdd12c17c became leader at term 2
2016-03-17 09:39:52.878666 I | raft: raft.node: 6a5871dbdd12c17c elected leader 6a5871dbdd12c17c at term 2
2016-03-17 09:39:52.879294 I | etcdserver: setting up the initial cluster version to 2.2
2016-03-17 09:39:52.879432 I | etcdserver: published {Name:default ClientURLs:[http://127.0.0.1:2379 http://127.0.0.1:4001]} to cluster f68652439e3f8f2a
2016-03-17 09:39:52.880946 N | etcdserver: set the initial cluster version to 2.2
2016-03-17 11:03:10.579853 I | etcdserver: start to snapshot (applied: 10001, lastsnap: 0)
2016-03-17 11:03:10.582194 I | etcdserver: saved snapshot at index 10001
2016-03-17 11:03:10.582470 I | etcdserver: compacted raft log at 5001
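etcd itself looks healthy; the “Key not found (/skydns)” from SkyDNS together with kube2sky being stuck waiting just means no records were ever written. Once kube2sky gets past the apiserver wait, entries should show up under /skydns (illustrative check; since this etcd only listens on 127.0.0.1, run it from inside a container of the DNS pod, e.g. the etcd one, where ETCD_CONTAINER is a placeholder):

    # etcdctl ships in the etcd 2.2 image and defaults to the local client port
    docker exec ETCD_CONTAINER etcdctl ls --recursive /skydns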

I have followed the docker multinode setup tutorial exactly, but I cannot get SkyDNS to work.

What am I missing? Or is it a bug in the scripts?

About this issue

  • Original URL
  • State: closed
  • Created 8 years ago
  • Comments: 26 (18 by maintainers)

Most upvoted comments

Just to be clear about things I know:

  1. I know that the issue is related to service account tokens not functioning correctly.
  2. This is reproducible: turn the cluster up and down a few times and then try to use service account tokens.
  3. I know that /var/lib/kubelet has some state in it that is causing this, because if I blow it away between runs I don’t have any issues (see the sketch after this list).
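For anyone hitting this in the meantime, the workaround corresponding to point 3 is to wipe the kubelet state between turndown and turnup. Roughly (a sketch only; the k8s_ container-name prefix and mount paths are assumptions about the docker-multinode setup):

    # Remove the kubelet-managed containers (they are named k8s_...)
    docker rm -f $(docker ps -aq --filter name=k8s_)
    # Secret volumes are tmpfs mounts under the kubelet dir; unmount before deleting
    umount /var/lib/kubelet/pods/*/volumes/kubernetes.io~secret/* 2>/dev/null || true
    rm -rf /var/lib/kubelet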

Let me dive into my theory.

When people use the moral equivalent of the turnup and turndown scripts, they create the containers and then destroy them, but don’t clear /var/lib/kubelet, where all emptyDir and secret mounts live. The setup-files.sh script writes to an emptyDir that is shared by the apiserver (the thing that verifies tokens) and the controller-manager (the thing that mints and signs tokens). When the kubelet starts a pod from an on-disk file it gets the same pod ID every time, so any emptyDirs that were mounted will still contain the files from before (because we didn’t kill /var/lib/kubelet).
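To see the state being carried over, the volume contents live under the kubelet pod directories, keyed by that stable pod ID (paths per the kubelet layout of that era, shown for illustration):

    # emptyDir contents (where setup-files.sh writes) survive across container restarts
    ls /var/lib/kubelet/pods/*/volumes/kubernetes.io~empty-dir/*
    # secret volumes (the minted tokens) live next to them
    ls /var/lib/kubelet/pods/*/volumes/kubernetes.io~secret/*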

My guess is:

  1. The apiserver starts up with certs from the previous kube-up. The controller-manager waits for the API to become available before creating the token in a secret.
  2. setup-files.sh completes (rewriting the ca.crt and key).
  3. The controller-manager sees the apiserver come up, mints and signs the token, and creates the secret with the token and cert, but using the now-modified ca.crt and keyfile that setup-files.sh wrote, so the cert in the secret is also broken.

An easy way to test this theory would be to have setup-files.sh exit early if the ca.crt and keys have already been generated. I won’t have time over the next few days to work on this, so if someone wants to try it out, I’d gladly help in any way I can.
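The guard would be roughly this at the top of setup-files.sh (a sketch only; the output directory and file names are assumptions about what the script writes):

    # Skip regeneration when a previous run already produced the CA and key
    if [ -f /data/ca.crt ] && [ -f /data/server.key ]; then
      echo "certs already present, leaving them untouched"
      exit 0
    fi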

BTW: I’m tracking what I believe is the underlying issue at #23197.