kubernetes: Containers intermittently start with the host /etc/resolv.conf instead of the Kubernetes-created file

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened: The issue was noticed when some containers stopped being able to resolve cluster addresses. After a docker daemon restart, ~10% of the containers that came up had invalid /etc/resolv.conf files (they did not contain the kube-dns settings, just the host /etc/resolv.conf). This was distributed evenly across the 4 types of images/pod configurations in use.
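A quick way to see the symptom from the Kubernetes side (a sketch only; AFFECTED_POD is a placeholder, and nslookup has to exist in the image):

# Dump the resolv.conf a pod actually sees and try a cluster lookup.
kubectl exec AFFECTED_POD -- cat /etc/resolv.conf
kubectl exec AFFECTED_POD -- nslookup kubernetes.default.svc.cluster.local
# Affected pods show the host resolver (10.0.0.2 here) and the lookup fails;
# healthy pods show the cluster DNS address (10.128.0.10 here).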

What you expected to happen: All containers should start with a resolv.conf that matches the environment.

How to reproduce it (as minimally and precisely as possible): sudo systemctl restart docker on a node; the affected nodes had ~80 pods each.
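Roughly, on one node (a sketch; 10.0.0.2 is the host/VPC resolver on these nodes, and the count includes 1 match for the kube-proxy container, which legitimately uses it):

# Restart the docker daemon and let the kubelet bring the containers back up.
sudo systemctl restart docker
# Count containers whose resolv.conf still points at the host resolver.
sudo find /var/lib/docker/containers/ -name resolv.conf -exec grep -l 'nameserver 10.0.0.2' {} \; | wc -l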

Anything else we need to know?: This may or may not be relevant (it could just be a coincidence): the issue was noticed on 2 servers whose docker daemon was restarted, and the result was 9 containers with this issue on each node (out of the original ~80). Stopping the 9 affected containers (docker stop …) resulted in 1 container with this issue on each node. Stopping that 1 container resulted in no containers with this issue.
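The stop-and-recheck step above, roughly as a loop (a sketch only; in practice the kube-proxy container should be skipped, since its resolv.conf legitimately contains the host resolver):

# Stop every container whose resolv.conf still points at the host resolver,
# so the kubelet recreates it, then re-run to see how many come back broken.
for f in $(sudo find /var/lib/docker/containers/ -name resolv.conf); do
  if sudo grep -q 'nameserver 10.0.0.2' "$f"; then
    id=$(basename "$(dirname "$f")")   # the directory name is the container id
    sudo docker stop "$id"
  fi
done
# In our case this went 9 -> 1 -> 0 affected containers per node.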

Printed container id + resolv.conf with:
sudo find /var/lib/docker/containers/ -name resolv.conf -exec sh -c 'echo "$0" && cat "$0"' {} \;
Counted containers with the issue (ignoring 1 for the kube-proxy container) with:
sudo find /var/lib/docker/containers/ -name resolv.conf -exec cat {} \; | grep 10.0.0.2

//kube-proxy resolv.conf
nameserver 10.0.0.2
search ec2.internal

//expected resolv.conf for other pods
nameserver 10.128.0.10
search default.svc.cluster.local svc.cluster.local cluster.local ec2.internal
options ndots:5

// invalid resolv.conf
# This file is managed by man:systemd-resolved(8). Do not edit.
#
# This is a dynamic resolv.conf file for connecting local clients directly to
# all known DNS servers.
#
# Third party programs must not access this file directly, but only through the
# symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a different way,
# replace this symlink by a static file or a different symlink.
#
# See man:systemd-resolved.service(8) for details about the supported modes of
# operation for /etc/resolv.conf.

nameserver 10.0.0.2
search ec2.internal

Environment:

  • Kubernetes version (use kubectl version): 1.7.2
  • Cloud provider or hardware configuration: AWS EC2
  • OS (e.g. from /etc/os-release): Container Linux by CoreOS stable (1409.7.0)
  • Kernel (e.g. uname -a): Linux 4.11.11-coreos
  • Install tools: manual download of kubelet 1.7.2
  • Others: Docker version 1.12.6, build a82d35e

About this issue

  • State: closed
  • Created 7 years ago
  • Reactions: 2
  • Comments: 16 (2 by maintainers)

Most upvoted comments

I think I have found the cause for this (in our situation at least). We have /var/lib/docker symlinked to another file system and the Kubelet running in a Docker container. When the Kubelet starts the pod container, it checks the resolv.conf in the container directory. Docker reports that path with the symlink expanded, so /var/lib/docker becomes /mnt/volume/docker. That path was not mounted into the Kubelet container, so the Kubelet couldn't fix the resolv.conf and the Docker default remained. We saw the error messages in Kubernetes v1.5.6, but there it was just a harmless warning and somehow it retried. Starting with 1.6 the Kubelet's behaviour changed and it consistently fails. Having the exact expanded path available in the Kubelet container fixed it for us.

A nice hint (one that is written off as a harmless warning all over the place) is:

docker_manager.go:2282] Failed to create pod infra container: RunContainerError; Skipping pod "PODNAME(PODHASH)": ResolvConfPath "/mnt/vol/docker/containers/CONTAINERHASH/resolv.conf" does not exist
E0523 08:55:46.470666   14438 pod_workers.go:182] Error syncing pod PODHASH ("PODNAME(PODHASH)"), skipping: failed to "StartContainer" for "POD" with RunContainerError: "ResolvConfPath \"/mnt/vol/docker/containers/CONTAINERHASH/resolv.conf\" does not exist"
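
For anyone hitting the same thing with the Kubelet running as a Docker container, a minimal sketch of the fix described above (paths and image name are examples, not our exact unit):

# On these hosts /var/lib/docker is a symlink to /mnt/volume/docker. Docker records
# the expanded path in the container's ResolvConfPath, so the kubelet container must
# have that exact path bind-mounted too, or it cannot read/fix the pod's resolv.conf.
docker run -d --name kubelet \
  -v /var/lib/docker:/var/lib/docker \
  -v /mnt/volume/docker:/mnt/volume/docker \
  KUBELET_IMAGE
# (All other mounts and kubelet flags omitted; KUBELET_IMAGE is a placeholder.)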