cilium: CI: 1.9.K8sValidatedPolicyTest - DNS \"redis-master.default.svc.cluster.local\" is not ready after timeout
/home/jenkins/workspace/Cilium-PR-Ginkgo-Tests-K8s/src/github.com/cilium/cilium/test/ginkgo-ext/scopes.go:312
DNS entry is not ready after timeout
Expected
<*errors.errorString | 0xc420049cb0>: {
s: "Timeout reached: DNS \"redis-master.default.svc.cluster.local\" is not ready after timeout",
}
to be nil
/home/jenkins/workspace/Cilium-PR-Ginkgo-Tests-K8s/src/github.com/cilium/cilium/test/k8sT/Policies.go:723
About this issue
- State: closed
- Created 6 years ago
- Comments: 27 (27 by maintainers)
Commits related to this issue
- Test: Updating Kube-dns manifest to get more verbose - Updating kube-dns to version 1.14.10 Related to #3878 Signed-off-by: Eloy Coto <eloy.coto@gmail.com> — committed to cilium/cilium by eloycoto 6 years ago
- Test: Wait for kubedns to be ready after cilium upgrade When Cilium updates DNS pods are not ready and Cilium try to connect to the service, but pod is not ready. Related to #3878 Signed-off-by: El... — committed to eloycoto/cilium by eloycoto 6 years ago
- Test: Wait for kubedns to be ready after cilium upgrade When Cilium updates DNS pods are not ready and Cilium try to connect to the service, but pod is not ready. Related to #3878 Signed-off-by: El... — committed to cilium/cilium by eloycoto 6 years ago
So, I finally saw a test fail on https://github.com/cilium/cilium/pull/4406 (this tested undoing the MTU changes). The errors are a little different, and the branch doesn’t include adde5de7053849bf0475c9047f40b6d20a04316c but that isn’t terribly relevant.
tl;dr we seem to be missing routes to clusterIPs, so kubedns can’t connect to the kubeapiserver. Cilium can, however, so I’m not sure I interpreted this correctly.
f0c5341e_K8sValidatedUpdates_Updating_Cilium_stable_to_master.zip
The most interesting error is from kubedns.log:

E0612 19:23:34.370631 143 reflector.go:201] k8s.io/dns/pkg/dns/dns.go:192: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?resourceVersion=0: dial tcp 10.96.0.1:443: getsockopt: no route to host

Note that kubedns is running on k8s2 but the apiserver is on k8s1. This is what @eloycoto pointed out above. I had hoped it was because of the MTU stuff but it isn't. It's also worth noting that kubedns is the only pod with restartCount>0: kubedns: 5, sidecar: 4 and dnsmasq: 4. The cilium pod on k8s2 is cilium-490kg. Neither cilium has the kubedns in the lb map.

The cilium-health report is also bad, like what @eloycoto saw:
cilium-health status
Oddly, cilium doesn’t know about kubedns
We seem to be missing the kube-apiserver cluster IP route (clusterIP: 10.96.0.1, backend: 192.168.36.11). I’m not sure why. The podCIDRs for the nodes are
10.10.0.0/24 and 10.10.1.0/24.

k8s2 ip r
k8s1 ip r
Looking through the logs for 172.17 we see
and looking for that message
pkg/workloads/docker.go:380: scopedLog.WithField(logfields.Object, contNetwork).Debug("Skipping network because of gateway mismatch")

Oddly, that happens in getCiliumIPv6 and that is used when deciding to ignore workloads in pkg/workloads/docker.go. 10.96 is mentioned in a similar context:

but has no skip/remove log.
@eloycoto could this be a conflict between a hardcoded podCIDR CLI option to cilium and a derived value from k8s?
@raybejjani the "0.0.0.0 (0)" entry in cilium bpf lb list is normal; it is used in the BPF to hold the number of other service backends (I investigated briefly in https://github.com/cilium/cilium/issues/3905).