kubeadm: CoreDNS pods on master are in CrashLoopBackOff after setting up pod networking

Is this a request for help?

Yes. I have tried most of the troubleshooting steps, but nothing seems to work.

What keywords did you search in kubeadm issues before filing this one?

CrashLoopBackOff, CoreDNS, pod networking.

Is this a BUG REPORT or FEATURE REQUEST?

BUG REPORT

Versions

kubeadm version (use kubeadm version):

kubeadm version: &version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.1", GitCommit:"4485c6f18cee9a5d3c3b4e523bd27972b1b53892", GitTreeState:"clean", BuildDate:"2019-07-18T09:15:32Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.1", GitCommit:"4485c6f18cee9a5d3c3b4e523bd27972b1b53892", GitTreeState:"clean", BuildDate:"2019-07-18T09:18:22Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.1", GitCommit:"4485c6f18cee9a5d3c3b4e523bd27972b1b53892", GitTreeState:"clean", BuildDate:"2019-07-18T09:09:21Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration: Deploying a multi-node setup on two VMs

  • OS (e.g. from /etc/os-release):

NAME="Ubuntu"
VERSION="16.04.5 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.5 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
  • Kernel (e.g. uname -a):
Linux kmaster 4.15.0-55-generic #60~16.04.2-Ubuntu SMP Thu Jul 4 09:03:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
  • Others: (kubectl get pods -n kube-system)
NAME                              READY   STATUS             RESTARTS   AGE
coredns-5c98db65d4-gz6bt          0/1     CrashLoopBackOff   114        10h
coredns-5c98db65d4-tkgxt          0/1     CrashLoopBackOff   114        10h
etcd-kmaster                      1/1     Running            0          10h
kube-apiserver-kmaster            1/1     Running            0          10h
kube-controller-manager-kmaster   1/1     Running            0          10h
kube-proxy-k4lbf                  1/1     Running            0          10h
kube-scheduler-kmaster            1/1     Running            0          10h
weave-net-sbptr                   2/2     Running            0          10h

(kubectl describe pod coredns-5c98db65d4-tkgxt -n kube-system)

Name:                 coredns-5c98db65d4-tkgxt
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 kmaster/10.112.187.96
Start Time:           Fri, 26 Jul 2019 06:46:34 +0530
Labels:               k8s-app=kube-dns
                      pod-template-hash=5c98db65d4
Annotations:          <none>
Status:               Running
IP:                   10.32.0.4
Controlled By:        ReplicaSet/coredns-5c98db65d4
Containers:
  coredns:
    Container ID:  docker://88d8c3243407c82a5eb7c9604f58858d9534dd2a2bc44d8717a404a09fe2566e
    Image:         k8s.gcr.io/coredns:1.3.1
    Image ID:      docker-pullable://k8s.gcr.io/coredns@sha256:02382353821b12c21b062c59184e227e001079bb13ebd01f9d3270ba0fcbf1e4
    Ports:         53/UDP, 53/TCP, 9153/TCP
    Host Ports:    0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Fri, 26 Jul 2019 17:01:13 +0530
      Finished:     Fri, 26 Jul 2019 17:01:43 +0530
    Ready:          False
    Restart Count:  114
    Limits:
      memory:  170Mi
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Readiness:    http-get http://:8080/health delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from coredns-token-spm7d (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
  coredns-token-spm7d:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  coredns-token-spm7d
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  beta.kubernetes.io/os=linux
Tolerations:     CriticalAddonsOnly
                 node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                     From              Message
  ----     ------     ----                    ----              -------
  Warning  BackOff    7m57s (x2606 over 10h)  kubelet, kmaster  Back-off restarting failed container
  Warning  Unhealthy  3m (x342 over 10h)      kubelet, kmaster  Readiness probe failed: HTTP probe failed with statuscode: 503

What happened?

The CoreDNS pods are in CrashLoopBackOff status after the pod network is set up.

What you expected to happen?

The CoreDNS pods should have been up and running.

How to reproduce it (as minimally and precisely as possible)?

kubeadm init --apiserver-advertise-address=10.112.187.96 --pod-network-cidr=10.244.0.0/16
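
(The pod network was then applied on top of this init. The exact command is not part of the report; with Weave Net it would have looked roughly like the one below, taken from the Weave Net install docs of that era, so treat the URL as an assumption to verify.)

$ kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"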

Anything else we need to know?

I have tried various pod networking solutions such as Flannel, Weave Net, and Calico.

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Reactions: 2
  • Comments: 31 (11 by maintainers)

Most upvoted comments

Thanks @anushakamath97. I guess you've encountered a bug with CoreDNS on Ubuntu.

$ kubectl logs -f coredns-fb8b8dccf-jgsqf -n kube-system
...
2019-07-26T05:12:51.129Z [FATAL] plugin/loop: Loop (127.0.0.1:37926 -> :53) detected for zone ".", see https://coredns.io/plugins/loop#troubleshooting. Query: "HINFO 6687652606367193539.3502358703036269394."
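
(For context: this loop error usually means the node's /etc/resolv.conf points at a local loopback resolver, e.g. dnsmasq on 127.0.1.1 or systemd-resolved on 127.0.0.53, which CoreDNS inherits through "forward . /etc/resolv.conf" and then forwards queries back to itself. A quick way to check on the node; the second command only applies if systemd-resolved is running:)

$ cat /etc/resolv.conf                       # a loopback nameserver here indicates the loop source
$ cat /run/systemd/resolve/resolv.conf       # the real upstream resolvers, if systemd-resolved is in use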

You need to manually edit CoreDNS ConfigMap:

$ kubectl edit cm coredns -n kube-system

[screenshot of the edited CoreDNS Corefile]
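
(Since the screenshot is not preserved in this copy, here is a rough sketch of the kind of Corefile edit typically suggested for this loop error; the 8.8.8.8 upstream is only an illustrative value, and per the follow-up below what was actually tried was removing the loop line:)

    .:53 {
        errors
        health
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           upstream
           fallthrough in-addr.arpa ip6.arpa
           ttl 30
        }
        prometheus :9153
        forward . 8.8.8.8   # point at a real upstream instead of /etc/resolv.conf, or delete the loop line below
        cache 30
        loop
        reload
        loadbalance
    }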

Then the CoreDNS Pods should work once they restart.

Try a different CNI plugin instead of Flannel. I recommend Calico or Weave Net.
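
(If switching the CNI plugin, the node generally needs to be reset and re-initialized first. A rough sketch for Calico; the manifest URL and the 192.168.0.0/16 default pod CIDR are taken from the Calico docs of that time and should be verified against the release matching your cluster:)

$ sudo kubeadm reset
$ sudo kubeadm init --apiserver-advertise-address=10.112.187.96 --pod-network-cidr=192.168.0.0/16
$ kubectl apply -f https://docs.projectcalico.org/v3.8/manifests/calico.yaml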

Thanks @SataQiu. The CoreDNS pods now show Running, but there are still readiness probe failure events on the pods. I am running a predator application as well.

kubectl get pods --all-namespaces
NAMESPACE     NAME                                                              READY   STATUS      RESTARTS   AGE
default       predator.6ac899c2-8548-4e17-b174-b9fed22d0a4a-156427239352h6c9c   0/1     Completed   0          36m
default       predator.de2af166-7c80-4dc4-b6e3-5cd4b023ae93-156427264155dwc5l   0/1     Completed   0          32m
default       wintering-gorilla-predator-cf6c86458-6blqg                        1/1     Running     0          68m
kube-system   coredns-5c98db65d4-2z4nb                                          1/1     Running     6          4h36m
kube-system   coredns-5c98db65d4-zsb8v                                          1/1     Running     7          4h36m
kube-system   etcd-kmaster                                                      1/1     Running     2          4h43m
kube-system   kube-apiserver-kmaster                                            1/1     Running     2          4h43m
kube-system   kube-controller-manager-kmaster                                   1/1     Running     2          4h43m
kube-system   kube-proxy-89v7n                                                  1/1     Running     0          4h31m
kube-system   kube-proxy-rgt45                                                  1/1     Running     2          4h44m
kube-system   kube-scheduler-kmaster                                            1/1     Running     2          4h43m
kube-system   tiller-deploy-5f669f7664-hk79h                                    1/1     Running     0          80m
kube-system   weave-net-rsq9l                                                   2/2     Running     6          4h41m
kube-system   weave-net-v45qc                                                   2/2     Running     0          69m

CoreDNS pod describe output:

root@kmaster:~# kubectl describe pod coredns-5c98db65d4-2z4nb -n kube-system
Name:                 coredns-5c98db65d4-2z4nb
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 kmaster/10.112.187.83
Start Time:           Sun, 28 Jul 2019 01:36:23 +0530
Labels:               k8s-app=kube-dns
                      pod-template-hash=5c98db65d4
Annotations:          <none>
Status:               Running
IP:                   10.32.0.4
Controlled By:        ReplicaSet/coredns-5c98db65d4
Containers:
  coredns:
    Container ID:  docker://a13f51e2fc0e9f2ef707affd7221e2404139c5794badda6ae3846800ba54b40a
    Image:         k8s.gcr.io/coredns:1.3.1
    Image ID:      docker-pullable://k8s.gcr.io/coredns@sha256:02382353821b12c21b062c59184e227e001079bb13ebd01f9d3270ba0fcbf1e4
    Ports:         53/UDP, 53/TCP, 9153/TCP
    Host Ports:    0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    State:          Running
      Started:      Sun, 28 Jul 2019 05:41:52 +0530
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Sun, 28 Jul 2019 05:40:52 +0530
      Finished:     Sun, 28 Jul 2019 05:41:29 +0530
    Ready:          True
    Restart Count:  6
    Limits:
      memory:  170Mi
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Readiness:    http-get http://:8080/health delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from coredns-token-92254 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
  coredns-token-92254:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  coredns-token-92254
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  beta.kubernetes.io/os=linux
Tolerations:     CriticalAddonsOnly
                 node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                 From              Message
  ----     ------     ----                ----              -------
  Warning  Unhealthy  28m                 kubelet, kmaster  Readiness probe failed: HTTP probe failed with statuscode: 503
  Warning  Unhealthy  27m (x2 over 27m)   kubelet, kmaster  Readiness probe failed: Get http://10.32.0.4:8080/health: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
  Warning  BackOff    27m (x4 over 129m)  kubelet, kmaster  Back-off restarting failed container
  Normal   Created    27m (x6 over 151m)  kubelet, kmaster  Created container coredns
  Normal   Pulled     27m (x6 over 151m)  kubelet, kmaster  Container image "k8s.gcr.io/coredns:1.3.1" already present on machine
  Normal   Started    27m (x6 over 151m)  kubelet, kmaster  Started container coredns
  Warning  Unhealthy  27m (x3 over 129m)  kubelet, kmaster  Readiness probe failed: Get http://10.32.0.4:8080/health: dial tcp 10.32.0.4:8080: connect: connection refused
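
(One detail worth noting from the describe output above: the last restart was an OOMKill, exit code 137, against the 170Mi memory limit. If that keeps recurring, one option is to raise the limit on the CoreDNS Deployment; the 256Mi below is an arbitrary example value:)

$ kubectl -n kube-system edit deployment coredns
# under spec.template.spec.containers[0].resources.limits, raise memory from 170Mi to e.g. 256Mi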

Thank you @SataQiu. I have tried this and removed the loop plugin from the ConfigMap, but the problem still persists.

Here is the CoreDNS ConfigMap:

# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           upstream
           fallthrough in-addr.arpa ip6.arpa
           ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf
        cache 30
        #loop
        reload
        loadbalance
    }
kind: ConfigMap
metadata:
  creationTimestamp: "2019-07-26T01:14:28Z"
  name: coredns
  namespace: kube-system
  resourceVersion: "52884"
  selfLink: /api/v1/namespaces/kube-system/configmaps/coredns
  uid: ea4b5256-d6bd-4968-b922-07722d6eacb1
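
(For completeness, the loop troubleshooting page linked in the log above also describes a kubelet-side fix instead of removing the loop plugin: point the kubelet at the real resolv.conf so CoreDNS never inherits the loopback resolver. On a host running systemd-resolved that would look roughly like the following; the config file path is the kubeadm default and is an assumption for other setups:)

# /var/lib/kubelet/config.yaml (KubeletConfiguration)
resolvConf: /run/systemd/resolve/resolv.conf

$ sudo systemctl restart kubelet
$ kubectl -n kube-system delete pod -l k8s-app=kube-dns    # recreate the CoreDNS pods so they pick up the new resolv.conf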