kubeadm: Update CoreDNS to v1.12 to fix OOM & restart
BUG REPORT
Versions
kubeadm version (use kubeadm version):
kubeadm version: &version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.1", GitCommit:"b1b29978270dc22fecc592ac55d903350454310a", GitTreeState:"clean", BuildDate:"2018-07-17T18:50:16Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
Environment:
- Kubernetes version (use kubectl version):
  Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.1", GitCommit:"b1b29978270dc22fecc592ac55d903350454310a", GitTreeState:"clean", BuildDate:"2018-07-17T18:53:20Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
  Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.1", GitCommit:"b1b29978270dc22fecc592ac55d903350454310a", GitTreeState:"clean", BuildDate:"2018-07-17T18:43:26Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
- Cloud provider or hardware configuration:
- OS (e.g. from /etc/os-release): Ubuntu 16.04 LTS x64
- Kernel (e.g. uname -a): 4.4.0-91-generic #114-Ubuntu SMP
- Others:
What happened?
CoreDNS keeps getting OOM-killed and restarting; all other pods work fine.
Get the pod status:
NAMESPACE     NAME                       READY   STATUS             RESTARTS   AGE
…
kube-system   coredns-78fcdf6894-ls2q4   0/1     CrashLoopBackOff   12         1h
kube-system   coredns-78fcdf6894-xn75c   0/1     CrashLoopBackOff   12         1h
…
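This listing was presumably produced by something along the lines of the following (the exact command is not in the report):
kubectl get pods --all-namespaces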
Describe the pod:
Name:               coredns-78fcdf6894-ls2q4
Namespace:          kube-system
Priority:           0
PriorityClassName:  <none>
Node:               k8s1/172.21.0.8
Start Time:         Tue, 07 Aug 2018 11:59:37 +0800
Labels:             k8s-app=kube-dns
                    pod-template-hash=3497892450
Annotations:        cni.projectcalico.org/podIP=192.168.0.7/32
Status:             Running
IP:                 192.168.0.7
Controlled By:      ReplicaSet/coredns-78fcdf6894
Containers:
  coredns:
    Container ID:  docker://519046f837c93439a77d75288e6d630cdbcefe875b0bdb6aa5409d566070ec03
    Image:         k8s.gcr.io/coredns:1.1.3
    Image ID:      docker-pullable://k8s.gcr.io/coredns@sha256:db2bf53126ed1c761d5a41f24a1b82a461c85f736ff6e90542e9522be4757848
    Ports:         53/UDP, 53/TCP, 9153/TCP
    Host Ports:    0/UDP, 0/TCP, 0/TCP
    Args:          -conf /etc/coredns/Corefile
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Tue, 07 Aug 2018 13:07:21 +0800
      Finished:     Tue, 07 Aug 2018 13:08:21 +0800
    Ready:          False
    Restart Count:  12
    Limits:
      memory:  170Mi
    Requests:
      cpu:     100m
      memory:  70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from coredns-token-tsv2g (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
  coredns-token-tsv2g:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  coredns-token-tsv2g
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     CriticalAddonsOnly
                 node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                 From           Message
  Warning  Unhealthy  44m                 kubelet, k8s1  Liveness probe failed: Get http://192.168.0.7:8080/health: dial tcp 192.168.0.7:8080: connect: connection refused
  Normal   Pulled     41m (x5 over 1h)    kubelet, k8s1  Container image "k8s.gcr.io/coredns:1.1.3" already present on machine
  Normal   Created    41m (x5 over 1h)    kubelet, k8s1  Created container
  Normal   Started    41m (x5 over 1h)    kubelet, k8s1  Started container
  Warning  Unhealthy  40m                 kubelet, k8s1  Liveness probe failed: Get http://192.168.0.7:8080/health: read tcp 172.21.0.8:40972->192.168.0.7:8080: read: connection reset by peer
  Warning  Unhealthy  34m (x2 over 38m)   kubelet, k8s1  Liveness probe failed: Get http://192.168.0.7:8080/health: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
  Warning  BackOff    4m (x124 over 44m)  kubelet, k8s1  Back-off restarting failed container
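The key signals above are Last State: Terminated with Reason: OOMKilled, a restart count of 12, and the 170Mi memory limit. To pull just the last termination reason without reading the whole dump, a small sketch (pod name taken from this report):
kubectl -n kube-system get pod coredns-78fcdf6894-ls2q4 \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}{"\n"}'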
Logs of the pod:
.:53
CoreDNS-1.1.3
linux/amd64, go1.10.1, b0fd575c
2018/08/07 05:13:27 [INFO] CoreDNS-1.1.3
2018/08/07 05:13:27 [INFO] linux/amd64, go1.10.1, b0fd575c
2018/08/07 05:13:27 [INFO] plugin/reload: Running configuration MD5 = 2a066f12ec80aeb2b92740dd74c17138
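Those are only the startup lines of the current instance; the logs of the instance that was actually OOM-killed can usually be fetched with the --previous flag:
kubectl -n kube-system logs coredns-78fcdf6894-ls2q4 --previous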
RAM usage of the master:
              total        used        free      shared  buff/cache   available
Mem:           1872         711         365           8         795         960
Swap:             0           0           0
RAM usage of the slave:
              total        used        free      shared  buff/cache   available
Mem:           1872         392          78          17        1400        1250
Swap:             0           0           0
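The two tables above look like free -m output (values in MiB) for the whole node. To see what the CoreDNS pods themselves consume, kubectl top could be used, assuming a metrics backend such as metrics-server or heapster is installed (it is not part of a default kubeadm setup):
kubectl -n kube-system top pod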
What you expected to happen?
CoreDNS keeps running and does not restart.
How to reproduce it (as minimally and precisely as possible)?
Run kubeadm init --apiserver-advertise-address=10.4.96.3 --pod-network-cidr=192.168.0.0/16 and use Calico as the network add-on.
Run kubeadm join on the second (slave) machine.
Node status is Ready for both:
kubectl get nodes
NAME   STATUS   ROLES    AGE   VERSION
k8s1   Ready    master   1h    v1.11.1
k8s2   Ready    <none>   1h    v1.11.1
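Putting the steps above together, a rough sketch of the reproduction; the Calico manifest file name and the join token/hash are placeholders rather than values from the original report:
# on the master (k8s1)
kubeadm init --apiserver-advertise-address=10.4.96.3 --pod-network-cidr=192.168.0.0/16
kubectl apply -f calico.yaml   # Calico manifest matching the pod CIDR above
# on the slave (k8s2), using the join command printed by kubeadm init
kubeadm join 10.4.96.3:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>
# verify both nodes are Ready
kubectl get nodes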
Anything else we need to know?
I'm running this test on hosts with 2 GB RAM; I'm not sure whether that is too small for Kubernetes.
About this issue
- Original URL
- State: closed
- Created 6 years ago
- Comments: 42 (13 by maintainers)
Another way is to update the CoreDNS version and raise the memory limit:
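The concrete commands from that comment are not preserved here, but a minimal sketch of the idea, assuming the image tag on the existing coredns Deployment is bumped and its memory limit raised in place (the 1.2.2 tag and 300Mi limit are illustrative values, not from the thread):
kubectl -n kube-system set image deployment/coredns coredns=k8s.gcr.io/coredns:1.2.2
kubectl -n kube-system patch deployment coredns --type=json \
  -p='[{"op":"replace","path":"/spec/template/spec/containers/0/resources/limits/memory","value":"300Mi"}]'
Alternatively, kubectl -n kube-system edit deployment coredns lets both fields be changed by hand.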
Ah… I didn’t know about https://github.com/kubernetes/kubernetes#64665. Good to know!