kubeadm: CoreDNS not started with k8s 1.11 and weave (CentOS 7)
Is this a BUG REPORT or FEATURE REQUEST?
BUG REPORT
Versions
kubeadm version 1.11
Environment:
- Kubernetes version (use `kubectl version`): 1.11
- Cloud provider or hardware configuration: AWS EC2 (16 vCPUs, 64 GB RAM)
- OS (e.g. from /etc/os-release): CentOS 7
- Kernel (e.g. `uname -a`): 3.10.0-693.17.1.el7.x86_64
- Others: Weave as CNI add-on
What happened?
After `kubeadm init`, the CoreDNS pods stay in Error:
NAME READY STATUS RESTARTS AGE
coredns-78fcdf6894-ljdjp 0/1 Error 6 9m
coredns-78fcdf6894-p6flm 0/1 Error 6 9m
etcd-master 1/1 Running 0 8m
heapster-5bbdfbff9f-h5h2n 1/1 Running 0 9m
kube-apiserver-master 1/1 Running 0 8m
kube-controller-manager-master 1/1 Running 0 8m
kube-proxy-5642r 1/1 Running 0 9m
kube-scheduler-master 1/1 Running 0 8m
kubernetes-dashboard-6948bdb78-bwkvx 1/1 Running 0 9m
weave-net-r5jkg 2/2 Running 0 9m
The logs of both pods show the following:
standard_init_linux.go:178: exec user process caused "operation not permitted"
About this issue
- Original URL
- State: closed
- Created 6 years ago
- Reactions: 4
- Comments: 33 (6 by maintainers)
Thanks, @chrisohaver !
This worked:
So, it appears as if there is an incompatibility between old versions of Docker and SELinux with the `allowPrivilegeEscalation` directive, which has apparently been resolved in later versions of Docker.
There appear to be 3 different work-arounds:
I find this hard to believe; we test CentOS 7 pretty extensively on our side.
Do you have the system and pod logs?
There is also an answer for this on Stack Overflow:
https://stackoverflow.com/questions/53075796/coredns-pods-have-crashloopbackoff-or-error-state
This error occurs when CoreDNS detects a loop in the resolver configuration, and it is intended behavior. You are hitting this issue:
https://github.com/kubernetes/kubeadm/issues/1162
https://github.com/coredns/coredns/issues/2087
Hacky solution: Disable the CoreDNS loop detection
Edit the CoreDNS configmap:
Remove or comment out the line with `loop`, save, and exit. Then remove the CoreDNS pods so that new ones can be created with the new config:
All should be fine after that.
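For reference, the relevant part of a kubeadm-generated Corefile from this era looks roughly like the sketch below (reconstructed from CoreDNS defaults, not copied from this cluster); the workaround is deleting or commenting out the `loop` line:

```
.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
       pods insecure
       upstream
       fallthrough in-addr.arpa ip6.arpa
    }
    prometheus :9153
    proxy . /etc/resolv.conf
    loop          # <- remove this line to disable loop detection
    cache 30
    reload
}
```

Note that with `loop` removed, CoreDNS will no longer crash on a looped upstream, but any underlying loop in the host's resolver configuration remains, which is why this is the hacky option.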
Preferred Solution: Remove the loop in the DNS configuration
First, check whether you are using `systemd-resolved`. If you are running Ubuntu 18.04, that is probably the case.

If it is, check which `resolv.conf` file your cluster is using as reference. You might see a line like:

The important part is `--resolv-conf`: that is how we figure out whether the systemd `resolv.conf` is used or not.

If it is the `resolv.conf` of `systemd`, do the following:

Check the content of `/run/systemd/resolve/resolv.conf` to see if there is a record like:

If there is `127.0.0.1`, it is the one causing the loop. To get rid of it, you should not edit that file directly; instead, check the other places that determine how it is generated.
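The check described above can be sketched as a small shell helper (the `127.` prefix match is an assumption covering the usual `systemd-resolved` stub address `127.0.0.53` as well as plain `127.0.0.1`):

```shell
# Return success if the given resolv.conf lists a loopback nameserver,
# which is what makes CoreDNS's `loop` plugin abort the pod.
has_loopback_nameserver() {
  grep -Eq '^nameserver[[:space:]]+127\.' "$1"
}

# Demo against a sample file; on a real host you would point this at
# /run/systemd/resolve/resolv.conf instead.
sample=$(mktemp)
printf 'nameserver 127.0.0.53\n' > "$sample"
if has_loopback_nameserver "$sample"; then
  echo "loopback nameserver found: CoreDNS would detect a loop"
fi
rm -f "$sample"
```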
Check all files under `/etc/systemd/network`, and if you find a record like

delete that record. Also check `/etc/systemd/resolved.conf` and do the same if needed. Make sure you have at least one or two DNS servers configured, such as

After doing all that, restart the systemd services to put your changes into effect: `systemctl restart systemd-networkd systemd-resolved`

After that, verify that `DNS=127.0.0.1` is no longer in the `resolv.conf` file:

Finally, trigger re-creation of the DNS pods.
Summary: the solution is to remove what amounts to a DNS lookup loop from the host's DNS configuration. The exact steps vary between different resolv.conf managers/implementations.
I verified that removing `allowPrivilegeEscalation: false` from the CoreDNS deployment resolves the issue (with SELinux enabled in permissive mode).
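For anyone trying this, the setting lives in the container's `securityContext` in the `coredns` deployment (reachable via `kubectl -n kube-system edit deployment coredns`); the fragment below is illustrative, and only the `allowPrivilegeEscalation` line matters. Removing the line or setting it to `true` are equivalent for this workaround, at the cost of weakening the pod's security posture:

```yaml
# Fragment of the coredns Deployment spec (surrounding fields abbreviated)
spec:
  template:
    spec:
      containers:
      - name: coredns
        securityContext:
          allowPrivilegeEscalation: true   # was: false
```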
Found a couple of instances of the same error reported in other scenarios in the past. Might try removing `allowPrivilegeEscalation: false` from the CoreDNS deployment to see if that helps.
That's fine. We should perhaps mention that there are negative security implications to disabling SELinux or changing the `allowPrivilegeEscalation` setting.
The most secure solution is to upgrade Docker to the version that Kubernetes recommends (17.03)
@chrisohaver do you think we should document this step in the kubeadm troubleshooting guide for SELinux nodes, along the lines of:
`coredns` pods have `CrashLoopBackOff` or `Error` state

If you have nodes that are running SELinux with an older version of Docker, you might experience a scenario where the `coredns` pods are not starting. To solve that, you can try one of the following options:

- Modify the `coredns` deployment to set `allowPrivilegeEscalation` to `true`:

WDYT? Please suggest amendments to the text if you think something can be improved.
OK - have you tried removing “allowPrivilegeEscalation: false” from the CoreDNS deployment to see if that helps?