kubernetes: CoreDNS service patch no longer works
What happened:
Previously, running kubeadm and kubelet 1.12.1, I had to apply a CoreDNS patch to fix an issue diagnosed as per dns-debugging-resolution:
[gms@thalia0 ~]$ kubectl exec -ti busybox -- nslookup kubernetes.default
Server: 10.96.0.10
Address 1: 10.96.0.10
nslookup: can't resolve 'kubernetes.default'
command terminated with exit code 1
I applied the fix to CoreDNS as per "selector of kube-dns svc does not match coredns pod".
Unfortunately, this fix no longer works after upgrading to kubeadm and kubelet 1.13.1.
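For reference, that fix boils down to making the kube-dns Service selector match the label the coredns pods actually carry. A minimal sketch of that kind of patch, assuming the pods are labeled k8s-app=kube-dns (exact labels may differ per cluster):

# Sketch only: point the kube-dns Service selector at the label carried by the coredns pods
kubectl -n kube-system patch service kube-dns --type merge \
  -p '{"spec":{"selector":{"k8s-app":"kube-dns"}}}'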
What you expected to happen:
I expect CoreDNS to resolve cluster DNS names properly.
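For comparison, a successful lookup from the busybox pod should look roughly like this (addresses and exact formatting are illustrative):

$ kubectl exec -ti busybox -- nslookup kubernetes.default
Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

Name:      kubernetes.default
Address 1: 10.96.0.1 kubernetes.default.svc.cluster.local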
How to reproduce it (as minimally and precisely as possible):
Noted above.
Anything else we need to know?:
[gms@thalia0 ~]$ kubectl get deployment --namespace=kube-system
NAME                   READY   UP-TO-DATE   AVAILABLE   AGE
calico-typha           0/0     0            0           77d
coredns                2/2     2            2           5s
kubernetes-dashboard   1/1     1            1           21d
[gms@thalia0 ~]$ kubectl get services --namespace=kube-system
NAME                   TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)         AGE
calico-typha           ClusterIP   10.101.212.32    <none>        5473/TCP        77d
kube-dns               ClusterIP   10.96.0.10       <none>        53/UDP,53/TCP   35s
kubernetes-dashboard   ClusterIP   10.106.105.232   <none>        443/TCP         21d
[gms@thalia0 ~]$ kubectl describe svc kube-dns --namespace=kube-system
Name:              kube-dns
Namespace:         kube-system
Labels:            k8s-app=kube-dns
                   kubernetes.io/cluster-service=true
                   kubernetes.io/name=CoreDNS
Annotations:       prometheus.io/port: 9153
                   prometheus.io/scrape: true
Selector:          k8s-app=kube-dns
Type:              ClusterIP
IP:                10.96.0.10
Port:              dns  53/UDP
TargetPort:        53/UDP
Endpoints:         192.168.2.20:53,192.168.3.44:53
Port:              dns-tcp  53/TCP
TargetPort:        53/TCP
Endpoints:         192.168.2.20:53,192.168.3.44:53
Session Affinity:  None
Events:            <none>
[gms@thalia0 ~]$ kubectl describe deployment coredns --namespace=kube-system
Name:                   coredns
Namespace:              kube-system
CreationTimestamp:      Fri, 11 Jan 2019 09:41:10 -0600
Labels:                 k8s-app=kube-dns
                        kubernetes.io/name=CoreDNS
Annotations:            deployment.kubernetes.io/revision: 1
Selector:               k8s-app=kube-dns
Replicas:               2 desired | 2 updated | 2 total | 2 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  1 max unavailable, 1 max surge
Pod Template:
  Labels:           k8s-app=kube-dns
  Service Account:  coredns
  Containers:
   coredns:
    Image:       coredns/coredns:1.2.2
    Ports:       53/UDP, 53/TCP, 9153/TCP
    Host Ports:  0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    Limits:
      memory:  170Mi
    Requests:
      cpu:      100m
      memory:   70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
  Volumes:
   config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
Conditions:
  Type       Status  Reason
  ----       ------  ------
  Available  True    MinimumReplicasAvailable
OldReplicaSets:  <none>
NewReplicaSet:   coredns-69cbb76ff8 (2/2 replicas created)
Events:
  Type    Reason             Age    From                   Message
  ----    ------             ----   ----                   -------
  Normal  ScalingReplicaSet  2m27s  deployment-controller  Scaled up replica set coredns-69cbb76ff8 to 2
Pods are these:
[gms@thalia0 ~]$ kubectl get pods --namespace=kube-system -l k8s-app=kube-dns
NAME                       READY   STATUS    RESTARTS   AGE
coredns-69cbb76ff8-2px6h   1/1     Running   0          2m
coredns-69cbb76ff8-g4wbd   1/1     Running   0          2m
Logs are as follows (and they look problematic, since nothing is being logged, even after adding log to the Corefile section in the coredns ConfigMap):
[gms@thalia0 ~]$ for p in $(kubectl get pods --namespace=kube-system -l k8s-app=coredns -o name); do kubectl logs --namespace=kube-system $p; done
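One thing worth double-checking here: the loop above selects pods with -l k8s-app=coredns, while the pods listed earlier carry the label k8s-app=kube-dns, so the loop may simply be matching no pods at all. A sketch using the deployment's actual label:

for p in $(kubectl get pods --namespace=kube-system -l k8s-app=kube-dns -o name); do
  kubectl logs --namespace=kube-system "$p"
done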
The endpoint is up and running:
[gms@thalia0 ~]$ kubectl get ep kube-dns --namespace=kube-system
NAME       ENDPOINTS                                                        AGE
kube-dns   192.168.1.198:53,192.168.4.57:53,192.168.1.198:53 + 1 more...   6m32s
I tried the “Are DNS queries being received/processed?” section, but the logs seem off, since nothing is being logged at all.
Also of note: if I do a force-update (./force-update-deployment coredns -n kube-system) and delete and recreate the coredns service as outlined above, the network functions fine for a few minutes, but then it fails again.
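Assuming force-update-deployment just bumps the pod template to trigger a rolling restart of the deployment, an equivalent one-liner would be something like this (the annotation key here is arbitrary):

# Sketch: patch a throwaway annotation onto the pod template to force a new rollout
kubectl -n kube-system patch deployment coredns \
  -p "{\"spec\":{\"template\":{\"metadata\":{\"annotations\":{\"force-update\":\"$(date +%s)\"}}}}}"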
How can I get to the logs, given the problem noted above with log output?
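For completeness, the query-logging change itself amounts to adding the log plugin to the Corefile in the coredns ConfigMap; a sketch follows, with the rest of the Corefile shown only as the stock kubeadm default for this CoreDNS version, so it may differ in detail on this cluster:

kubectl -n kube-system edit configmap coredns
# inside the Corefile, add `log` to the server block, e.g.:
#   .:53 {
#       errors
#       log            # <-- log every query
#       health
#       kubernetes cluster.local in-addr.arpa ip6.arpa {
#          pods insecure
#          upstream
#          fallthrough in-addr.arpa ip6.arpa
#       }
#       prometheus :9153
#       proxy . /etc/resolv.conf
#       cache 30
#   }

If the Corefile does not include the reload plugin, the coredns pods also need a restart (kubectl -n kube-system delete pod -l k8s-app=kube-dns) before the change takes effect.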
Environment:
- Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.1", GitCommit:"eec55b9ba98609a46fee712359c7b5b365bdd920", GitTreeState:"clean", BuildDate:"2018-12-13T10:39:04Z", GoVersion:"go1.11.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.1", GitCommit:"eec55b9ba98609a46fee712359c7b5b365bdd920", GitTreeState:"clean", BuildDate:"2018-12-13T10:31:33Z", GoVersion:"go1.11.2", Compiler:"gc", Platform:"linux/amd64"}
Also using kubeadm 1.13.1 and the latest Calico CNI.
- Cloud provider or hardware configuration:
vSphere virtualized machines in a local server farm.
- OS (e.g. from /etc/os-release):
RHEL 7.6 VM
- Kernel (e.g. uname -a):
Linux thalia0.domain 3.10.0-957.1.3.el7.x86_64 #1 SMP Thu Nov 15 17:36:42 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
- Install tools:
- Others:
/sig network
About this issue
- State: closed
- Created 5 years ago
- Comments: 41 (11 by maintainers)
Search domains are queried serially, one at a time, in the order they appear in /etc/resolv.conf. For each domain, the client waits for a response or a timeout before trying the next one; they are not queried in parallel.
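For illustration, a pod's /etc/resolv.conf on a kubeadm cluster typically looks something like this (values are examples):

$ kubectl exec -ti busybox -- cat /etc/resolv.conf
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5

With ndots:5, a name like kubernetes.default has fewer than five dots, so the resolver tries each search domain in turn (kubernetes.default.default.svc.cluster.local, then kubernetes.default.svc.cluster.local, and so on), and each attempt is a separate query that must answer or time out before the next one is sent.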
IIUC, that fix is related to NAT source port collision, which only occurs when there are multiple simultaneous requests to the service. The colliding requests would not get logged in CoreDNS because they never get that far. The chance of this happening increases with query volume. In this issue, however, the logs are pretty quiet, which suggests the only queries occurring during the failures are manual digs from a single pod. I don’t think that would be enough volume for the issue to occur at its apparent frequency, but it is still worth trying to see if it helps.
Could be this long-standing issue: https://github.com/kubernetes/kubernetes/issues/56903. There are various workarounds in that issue; some work for some people, some do not.