kubernetes: CoreDNS service patch no longer works

What happened:

While previously running kubeadm/kubelet 1.12.1, I had to apply a CoreDNS patch to fix an issue diagnosed per dns-debugging-resolution:

[gms@thalia0 ~]$ kubectl exec -ti busybox -- nslookup kubernetes.default
Server:    10.96.0.10
Address 1: 10.96.0.10

nslookup: can't resolve 'kubernetes.default'
command terminated with exit code 1

I applied the fix to CoreDNS described in “selector of kube-dns svc does not match coredns pod”.
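
For reference, that fix amounted to making the kube-dns Service selector match the labels on the CoreDNS pods. A minimal sketch, assuming the label values shown elsewhere in this report (the concrete values depend on the cluster):

$ kubectl -n kube-system patch svc kube-dns --type merge \
    -p '{"spec":{"selector":{"k8s-app":"kube-dns"}}}'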

Unfortunately, this fix no longer works after upgrading to kubeadm/kubelet 1.13.1.

What you expected to happen:

I expect cluster DNS (CoreDNS) to function properly.

How to reproduce it (as minimally and precisely as possible):

Noted above.

Anything else we need to know?:

[gms@thalia0 ~]$ kubectl get deployment --namespace=kube-system
NAME                   READY   UP-TO-DATE   AVAILABLE   AGE
calico-typha           0/0     0            0           77d
coredns                2/2     2            2           5s
kubernetes-dashboard   1/1     1            1           21d

[gms@thalia0 ~]$ kubectl get services --namespace=kube-system
NAME                   TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)         AGE
calico-typha           ClusterIP   10.101.212.32    <none>        5473/TCP        77d
kube-dns               ClusterIP   10.96.0.10       <none>        53/UDP,53/TCP   35s
kubernetes-dashboard   ClusterIP   10.106.105.232   <none>        443/TCP         21d

[gms@thalia0 ~]$ kubectl describe svc kube-dns --namespace=kube-system
Name:              kube-dns
Namespace:         kube-system
Labels:            k8s-app=kube-dns
                   kubernetes.io/cluster-service=true
                   kubernetes.io/name=CoreDNS
Annotations:       prometheus.io/port: 9153
                   prometheus.io/scrape: true
Selector:          k8s-app=kube-dns
Type:              ClusterIP
IP:                10.96.0.10
Port:              dns  53/UDP
TargetPort:        53/UDP
Endpoints:         192.168.2.20:53,192.168.3.44:53
Port:              dns-tcp  53/TCP
TargetPort:        53/TCP
Endpoints:         192.168.2.20:53,192.168.3.44:53
Session Affinity:  None
Events:            <none>

[gms@thalia0 ~]$ kubectl describe deployment coredns  --namespace=kube-system
Name:                   coredns
Namespace:              kube-system
CreationTimestamp:      Fri, 11 Jan 2019 09:41:10 -0600
Labels:                 k8s-app=kube-dns
                        kubernetes.io/name=CoreDNS
Annotations:            deployment.kubernetes.io/revision: 1
Selector:               k8s-app=kube-dns
Replicas:               2 desired | 2 updated | 2 total | 2 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  1 max unavailable, 1 max surge
Pod Template:
  Labels:           k8s-app=kube-dns
  Service Account:  coredns
  Containers:
   coredns:
    Image:       coredns/coredns:1.2.2
    Ports:       53/UDP, 53/TCP, 9153/TCP
    Host Ports:  0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    Limits:
      memory:  170Mi
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
  Volumes:
   config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
OldReplicaSets:  <none>
NewReplicaSet:   coredns-69cbb76ff8 (2/2 replicas created)
Events:
  Type    Reason             Age    From                   Message
  ----    ------             ----   ----                   -------
  Normal  ScalingReplicaSet  2m27s  deployment-controller  Scaled up replica set coredns-69cbb76ff8 to 2

The pods are:

[gms@thalia0 ~]$ kubectl get pods --namespace=kube-system -l k8s-app=kube-dns
NAME                       READY   STATUS    RESTARTS   AGE
coredns-69cbb76ff8-2px6h   1/1     Running   0          2m
coredns-69cbb76ff8-g4wbd   1/1     Running   0          2m

The logs are as follows (and look problematic, since nothing is being logged, even after adding the log plugin to the Corefile in the coredns ConfigMap):

[gms@thalia0 ~]$ for p in $(kubectl get pods --namespace=kube-system -l k8s-app=coredns -o name); do kubectl logs --namespace=kube-system $p; done
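
Note: per the pod listing above, the CoreDNS pods carry the label k8s-app=kube-dns, so the -l k8s-app=coredns selector in that loop matches no pods and always produces empty output. A corrected loop, for reference:

$ for p in $(kubectl get pods --namespace=kube-system -l k8s-app=kube-dns -o name); do kubectl logs --namespace=kube-system $p; done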

The endpoints are up and running:

[gms@thalia0 ~]$ kubectl get ep kube-dns --namespace=kube-system
NAME       ENDPOINTS                                                       AGE
kube-dns   192.168.1.198:53,192.168.4.57:53,192.168.1.198:53 + 1 more...   6m32s

I tried the “Are DNS queries being received/processed?” section of the debugging guide, but the logs seem off, since nothing is being logged.
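
For context, “adding log” here means putting the log plugin into the server block of the Corefile held in the coredns ConfigMap (kubectl -n kube-system edit configmap coredns). A sketch based on typical kubeadm defaults for this CoreDNS version; the actual Corefile and plugin set may differ:

.:53 {
    log        # log every query CoreDNS receives
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        upstream
        fallthrough in-addr.arpa ip6.arpa
    }
    prometheus :9153
    proxy . /etc/resolv.conf
    cache 30
}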

Also of note: if I run ./force-update-deployment coredns -n kube-system (per the force-update-deployment workaround) and then delete and recreate the kube-dns (CoreDNS) service as outlined above, DNS functions fine for a few minutes, but then fails again.
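
For reference, the delete/recreate step amounts to something like the following. This is a sketch reconstructed from the kubectl describe svc output above (kube-dns-svc.yaml is a hypothetical file name):

$ kubectl -n kube-system delete svc kube-dns
$ kubectl apply -f kube-dns-svc.yaml

# kube-dns-svc.yaml
apiVersion: v1
kind: Service
metadata:
  name: kube-dns
  namespace: kube-system
  labels:
    k8s-app: kube-dns
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: "CoreDNS"
spec:
  clusterIP: 10.96.0.10
  selector:
    k8s-app: kube-dns
  ports:
  - name: dns
    port: 53
    protocol: UDP
  - name: dns-tcp
    port: 53
    protocol: TCP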

How can I get to the logs, given the problem noted above with log output?

Environment:

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.1", GitCommit:"eec55b9ba98609a46fee712359c7b5b365bdd920", GitTreeState:"clean", BuildDate:"2018-12-13T10:39:04Z", GoVersion:"go1.11.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.1", GitCommit:"eec55b9ba98609a46fee712359c7b5b365bdd920", GitTreeState:"clean", BuildDate:"2018-12-13T10:31:33Z", GoVersion:"go1.11.2", Compiler:"gc", Platform:"linux/amd64"}

Also using kubeadm 1.13.1 and the latest Calico CNI.

  • Cloud provider or hardware configuration:

vSphere virtual machines in a local server farm.

  • OS (e.g. from /etc/os-release):

RHEL 7.6 VM

  • Kernel (e.g. uname -a):

Linux thalia0.domain 3.10.0-957.1.3.el7.x86_64 #1 SMP Thu Nov 15 17:36:42 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

  • Install tools:
  • Others:

/sig network

Most upvoted comments

This is visible when DNS requests are made with search domains configured and multiple simultaneous DNS requests are issued…

Search domains are queried serially, in the order they appear in /etc/resolv.conf, one at a time: for each domain, the client waits for a response or a timeout before trying the next. They are not queried in parallel.
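
For example, with a typical kubeadm pod resolv.conf (a sketch; the actual search list depends on the pod's namespace and the cluster domain):

nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5

a lookup of kubernetes.default (fewer than 5 dots) is expanded through the search list one candidate at a time: kubernetes.default.default.svc.cluster.local first, then kubernetes.default.svc.cluster.local, and so on, each waiting for an answer or a timeout before the next query is sent.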

IIUC, that fix is related to NAT source-port collision, which only occurs when there are multiple simultaneous requests to the service. The colliding requests would not be logged by CoreDNS because they never get that far. The chance of this happening increases with query volume. In this issue, however, the logs are pretty quiet, which suggests the only queries occurring during the failures are manual digs from a single pod. I don't think that would be enough volume for the issue to occur at its apparent frequency, but it is still worth trying to see if it helps.

Could be this long-standing issue: https://github.com/kubernetes/kubernetes/issues/56903. There are various workarounds in that issue; some work for some people, some do not.
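
One workaround commonly cited in that thread (whether it applies here is an assumption; it relies on a glibc resolver option, so it has no effect on musl-based images such as Alpine/busybox) is to serialize the resolver's parallel A/AAAA queries via the pod's dnsConfig:

apiVersion: v1
kind: Pod
metadata:
  name: dns-opts-example     # hypothetical pod name, for illustration
spec:
  containers:
  - name: main
    image: debian:stretch    # glibc-based image; the option is ignored by musl
    command: ["sleep", "3600"]
  dnsConfig:
    options:
    - name: single-request-reopen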