kubeadm: CoreDNS not started with k8s 1.11 and weave (CentOS 7)

Is this a BUG REPORT or FEATURE REQUEST?

BUG REPORT

Versions

kubeadm version 1.11

Environment:

  • Kubernetes version (use kubectl version): 1.11
  • Cloud provider or hardware configuration: aws ec2 with (16vcpus 64gb RAM)
  • OS (e.g. from /etc/os-release): centos 7
  • Kernel (e.g. uname -a): 3.10.0-693.17.1.el7.x86_64
  • Others: weave as cni add-on

What happened?

After kubeadm init, the coredns pods stay in the Error state:

NAME                                   READY     STATUS    RESTARTS   AGE
coredns-78fcdf6894-ljdjp               0/1       Error     6          9m
coredns-78fcdf6894-p6flm               0/1       Error     6          9m
etcd-master                            1/1       Running   0          8m
heapster-5bbdfbff9f-h5h2n              1/1       Running   0          9m
kube-apiserver-master                  1/1       Running   0          8m
kube-controller-manager-master         1/1       Running   0          8m
kube-proxy-5642r                       1/1       Running   0          9m
kube-scheduler-master                  1/1       Running   0          8m
kubernetes-dashboard-6948bdb78-bwkvx   1/1       Running   0          9m
weave-net-r5jkg                        2/2       Running   0          9m

The logs of both pods show the following:

standard_init_linux.go:178: exec user process caused "operation not permitted"

About this issue

  • Original URL
  • State: closed
  • Created 6 years ago
  • Reactions: 4
  • Comments: 33 (6 by maintainers)

Most upvoted comments

Thanks, @chrisohaver !

This worked:

kubectl -n kube-system get deployment coredns -o yaml | \
  sed 's/allowPrivilegeEscalation: false/allowPrivilegeEscalation: true/g' | \
  kubectl apply -f -

So, it appears as if there is an incompatibility between old versions of Docker and SELinux with the allowPrivilegeEscalation directive, which has apparently been resolved in later versions of Docker.

There appear to be three different workarounds:

  • Upgrade to a newer version of Docker, e.g. 17.03, the version currently recommended by Kubernetes
  • Or remove allowPrivilegeEscalation=false from the deployment’s pod spec
  • Or disable SELinux
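The sed pipeline above is a plain textual substitution on the deployment YAML. A minimal offline sketch of what it changes (the securityContext fragment here is illustrative, not the full CoreDNS spec):

```shell
# Illustrative input: a fragment resembling the container securityContext
# in the CoreDNS deployment. The actual workaround pipes the output of
# `kubectl -n kube-system get deployment coredns -o yaml` through the same sed.
cat <<'EOF' | sed 's/allowPrivilegeEscalation: false/allowPrivilegeEscalation: true/g'
securityContext:
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
EOF
```

Only the allowPrivilegeEscalation line is rewritten; every other field passes through unchanged before being fed back to kubectl apply.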

I find this hard to believe; we test CentOS 7 pretty extensively on our side.

Do you have the system and pod logs?

There is also an answer for this on Stack Overflow:
https://stackoverflow.com/questions/53075796/coredns-pods-have-crashloopbackoff-or-error-state

This error

[FATAL] plugin/loop: Seen "HINFO IN 6900627972087569316.7905576541070882081." more than twice, loop detected

is caused when CoreDNS detects a loop in the resolver configuration, and it is intended behavior. You are hitting this issue:

https://github.com/kubernetes/kubeadm/issues/1162

https://github.com/coredns/coredns/issues/2087

Hacky solution: Disable the CoreDNS loop detection

Edit the CoreDNS configmap:

kubectl -n kube-system edit configmap coredns

Remove or comment out the line with loop, save and exit.
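For reference, the Corefile inside the kubeadm-generated configmap looks roughly like the sketch below (the exact plugin list varies by version); the only change needed is removing or commenting out the loop line:

```
.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
       pods insecure
       upstream
       fallthrough in-addr.arpa ip6.arpa
    }
    prometheus :9153
    proxy . /etc/resolv.conf
    cache 30
    # loop   <- remove or comment out this line
    reload
    loadbalance
}
```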

Then remove the CoreDNS pods, so new ones can be created with new config:

kubectl -n kube-system delete pod -l k8s-app=kube-dns

All should be fine after that.

Preferred Solution: Remove the loop in the DNS configuration

First, check if you are using systemd-resolved. If you are running Ubuntu 18.04, this is probably the case.

systemctl list-unit-files | grep enabled | grep systemd-resolved

If it is, check which resolv.conf file your cluster is using as reference:

ps auxww | grep kubelet

You might see a line like:

/usr/bin/kubelet ... --resolv-conf=/run/systemd/resolve/resolv.conf

The important part is --resolv-conf: it tells us whether systemd’s resolv.conf is being used or not.
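The flag value can also be extracted mechanically. A small sketch (the helper name is made up for illustration):

```shell
# Hypothetical helper: pull the value of --resolv-conf out of a command line.
# On a real node you would feed it the kubelet process arguments, e.g.:
#   ps axww -o args= | grep '[k]ubelet' | parse_resolv_conf_flag
parse_resolv_conf_flag() {
  tr ' ' '\n' | grep -- '--resolv-conf=' | cut -d= -f2-
}

# Offline demonstration with a sample kubelet command line:
echo '/usr/bin/kubelet --kubeconfig=/etc/kubernetes/kubelet.conf --resolv-conf=/run/systemd/resolve/resolv.conf' \
  | parse_resolv_conf_flag
# -> /run/systemd/resolve/resolv.conf
```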

If it points to systemd’s resolv.conf, do the following:

Check the content of /run/systemd/resolve/resolv.conf to see if there is a record like:

nameserver 127.0.0.1

If it contains 127.0.0.1, that entry is what causes the loop.
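That check can be scripted. A minimal sketch (the function name is made up for illustration):

```shell
# Hypothetical helper: succeeds if the given resolv.conf lists a loopback
# (127.x.x.x) nameserver - the pattern that triggers the CoreDNS loop.
has_loopback_nameserver() {
  grep -Eq '^nameserver[[:space:]]+127\.' "$1"
}

# Offline demonstration with a sample file; on a real node you would run:
#   has_loopback_nameserver /run/systemd/resolve/resolv.conf && echo "potential DNS loop"
demo=$(mktemp)
echo 'nameserver 127.0.0.53' > "$demo"
has_loopback_nameserver "$demo" && echo "potential DNS loop"
rm -f "$demo"
```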

To get rid of it, do not edit that file directly; instead, fix the sources it is generated from.

Check all files under /etc/systemd/network, and if you find a record like

DNS=127.0.0.1

delete that record. Also check /etc/systemd/resolved.conf and do the same if needed. Make sure you have at least one or two DNS servers configured, such as

DNS=1.1.1.1 1.0.0.1

After doing all that, restart the systemd services to put your changes into effect:

systemctl restart systemd-networkd systemd-resolved

After that, verify that DNS=127.0.0.1 is no longer present in the resolv.conf file:

cat /run/systemd/resolve/resolv.conf

Finally, trigger re-creation of the DNS pods:

kubectl -n kube-system delete pod -l k8s-app=kube-dns

Summary: The solution involves getting rid of what looks like a DNS lookup loop from the host DNS configuration. Steps vary between different resolv.conf managers/implementations.

I verified that removing “allowPrivilegeEscalation: false” from the coredns deployment resolves the issue (with SELinux enabled in permissive mode).

Found a couple of instances of the same errors reported in other scenarios in the past. Might try removing “allowPrivilegeEscalation: false” from the CoreDNS deployment to see if that helps.

That’s fine. We should perhaps mention that there are negative security implications when disabling SELinux or changing the allowPrivilegeEscalation setting.

The most secure solution is to upgrade Docker to the version that Kubernetes recommends (17.03).
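A quick way to check whether a node matches the affected combination; both checks degrade gracefully when the tool is not installed:

```shell
# Report the Docker engine version and the SELinux mode, if available.
# A Docker engine older than the recommended 17.03, combined with SELinux
# in Enforcing mode, matches the scenario described in this issue.
if command -v docker >/dev/null 2>&1; then
  docker version --format 'Docker server: {{.Server.Version}}' 2>/dev/null \
    || echo 'docker daemon not reachable'
else
  echo 'docker CLI not found'
fi
if command -v getenforce >/dev/null 2>&1; then
  echo "SELinux mode: $(getenforce)"
else
  echo 'getenforce not found (SELinux tools not installed)'
fi
```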

@chrisohaver do you think we should document this step in the kubeadm troubleshooting guide for SELinux nodes, along the lines of:


coredns pods have CrashLoopBackOff or Error state

If you have nodes that are running SELinux with an older version of Docker, you might experience a scenario where the coredns pods are not starting. To solve that, you can try one of the following options:

  • Upgrade to a newer version of Docker - 17.03 is confirmed to work.
  • Disable SELinux.
  • Modify the coredns deployment to set allowPrivilegeEscalation to true:
kubectl -n kube-system get deployment coredns -o yaml | \
  sed 's/allowPrivilegeEscalation: false/allowPrivilegeEscalation: true/g' | \
  kubectl apply -f -

WDYT? Please suggest amendments to the text if you think something can be improved.

OK - have you tried removing “allowPrivilegeEscalation: false” from the CoreDNS deployment to see if that helps?