kubernetes: kubelet does not set up /etc/resolv.conf correctly for host-networked pods

What happened:

TL;DR: Due to insufficient error handling in dockershim, pods can end up with partially set-up sandboxes under certain conditions, specifically with incorrect DNS settings.

We sporadically see host-networked pods with the ClusterFirstWithHostNet dnsPolicy failing to resolve Kubernetes services. The /etc/resolv.conf inside the container is set to the host configuration:

# This file is managed by man:systemd-resolved(8). Do not edit.
#
# This is a dynamic resolv.conf file for connecting local clients directly to
# all known uplink DNS servers. This file lists all configured search domains.
#
# Third party programs must not access this file directly, but only through the
# symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a different way,
# replace this symlink by a static file or a different symlink.
#
# See man:systemd-resolved.service(8) for details about the supported modes of
# operation for /etc/resolv.conf.

nameserver 10.0.0.2
search ec2.internal

instead of the expected configuration, which should point to the cluster DNS at 10.100.0.53:

nameserver 10.100.0.53
search namespace.svc.cluster.local svc.cluster.local cluster.local ec2.internal
options ndots:5
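For reference, the pods in question are host-networked with dnsPolicy set to ClusterFirstWithHostNet. Below is a minimal sketch of such a pod expressed with the client-go API types; the pod name, namespace, image and command are illustrative placeholders, not our actual workload.

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// A host-networked pod that still needs to resolve cluster services.
	// With hostNetwork: true, the default ClusterFirst policy falls back to
	// the host resolver, so ClusterFirstWithHostNet must be set explicitly
	// to get the cluster DNS server into /etc/resolv.conf.
	pod := corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "hostnet-dns-example", Namespace: "namespace"},
		Spec: corev1.PodSpec{
			HostNetwork: true,
			DNSPolicy:   corev1.DNSClusterFirstWithHostNet,
			Containers: []corev1.Container{
				{Name: "app", Image: "busybox", Command: []string{"sleep", "3600"}},
			},
		},
	}
	fmt.Printf("pod=%s hostNetwork=%v dnsPolicy=%s\n", pod.Name, pod.Spec.HostNetwork, pod.Spec.DNSPolicy)
}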

We have traced the problem down to this part of the dockershim code where we call StartContainer.

In every case where we have seen this issue, we have had an error like the following in our logs:

CreatePodSandbox for pod "pod-ip-10-0-2-20.ec2.internal_namespace(e808f17f20a482360e2ed9d533bcec4a)" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "pod-ip-10-0-2-20.ec2.internal": operation timeout: context deadline exceeded

When this happens, the code in dockershim that rewrites the /etc/resolv.conf file inside the container never gets called, and the /etc/resolv.conf inside the container is left as a copy of the host's /etc/resolv.conf, which is what Docker does by default when no explicit DNS settings are specified.

However, in many cases the container itself does start successfully, despite the StartContainer call timing out.
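To illustrate, here is a minimal, self-contained sketch of that control flow; the helper names and types are stand-ins rather than the actual dockershim code. The resolv.conf rewrite only runs when the start call returns without error, so a deadline that expires while Docker is still working leaves a running sandbox with the host's DNS configuration.

package main

import (
	"context"
	"fmt"
	"time"
)

// dnsConfig stands in for the cluster DNS settings the kubelet computes
// for a ClusterFirstWithHostNet pod.
type dnsConfig struct {
	Servers  []string
	Searches []string
	Options  []string
}

// startSandboxContainer stands in for the StartContainer call to Docker.
// It honours the caller's context, so a short deadline can expire even
// though "Docker" keeps working and eventually starts the container.
func startSandboxContainer(ctx context.Context, id string) error {
	select {
	case <-time.After(200 * time.Millisecond): // Docker is slow today
		return nil
	case <-ctx.Done():
		return fmt.Errorf("operation timeout: %w", ctx.Err())
	}
}

// rewriteResolvConf stands in for the step that overwrites the sandbox's
// /etc/resolv.conf with the cluster DNS configuration.
func rewriteResolvConf(id string, dns dnsConfig) error {
	fmt.Printf("rewrote resolv.conf for %s: nameserver %s\n", id, dns.Servers[0])
	return nil
}

// runPodSandbox mirrors the control flow described above: on a start error
// we return immediately and skip the resolv.conf rewrite, even though the
// container may still come up in the background with the host's resolv.conf.
func runPodSandbox(ctx context.Context, id string, dns dnsConfig) error {
	if err := startSandboxContainer(ctx, id); err != nil {
		return fmt.Errorf("failed to start sandbox container: %w", err)
	}
	return rewriteResolvConf(id, dns)
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()
	dns := dnsConfig{Servers: []string{"10.100.0.53"}}
	if err := runPodSandbox(ctx, "pause-abc123", dns); err != nil {
		fmt.Println("CreatePodSandbox failed:", err)
	}
}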

Because of that, the kubelet never realises after this point that the pod sandbox setup was actually incomplete. This code in kuberuntime_manager.go doesn’t seem to check anything DNS-related. Instead, for host-networked pods, the only things it effectively checks are (see the sketch after this list):

  • the sandbox container is up and running
  • the sandbox container is running with the runtimeapi.NamespaceMode_NODE mode.
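A simplified sketch of those two checks, as we read the code (illustrative only, not the exact kuberuntime_manager.go source; the runtimeapi import path below is the standalone cri-api module's, which may differ from the vendored path in 1.12). Nothing in this check verifies that the DNS setup ever completed, so a half-configured sandbox is happily reused.

package main

import (
	"fmt"

	runtimeapi "k8s.io/cri-api/pkg/apis/runtime/v1alpha2"
)

// sandboxNeedsRecreate only looks at the sandbox state and, for
// host-networked pods, the network namespace mode.
func sandboxNeedsRecreate(status *runtimeapi.PodSandboxStatus, hostNetwork bool) bool {
	// Check 1: the sandbox container must be up and running.
	if status.GetState() != runtimeapi.PodSandboxState_SANDBOX_READY {
		return true
	}
	// Check 2: for host-networked pods, the network namespace mode must be NODE.
	if hostNetwork &&
		status.GetLinux().GetNamespaces().GetOptions().GetNetwork() != runtimeapi.NamespaceMode_NODE {
		return true
	}
	return false
}

func main() {
	// A ready, host-networked sandbox passes both checks even if its
	// /etc/resolv.conf was never rewritten.
	status := &runtimeapi.PodSandboxStatus{
		State: runtimeapi.PodSandboxState_SANDBOX_READY,
		Linux: &runtimeapi.LinuxPodSandboxStatus{
			Namespaces: &runtimeapi.Namespace{
				Options: &runtimeapi.NamespaceOption{Network: runtimeapi.NamespaceMode_NODE},
			},
		},
	}
	fmt.Println("needs recreate:", sandboxNeedsRecreate(status, true))
}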

Restarting the kubelet or the non-pause containers doesn’t fix this issue. The only workaround we have found is to cause the kubelet to recreate the sandbox container.
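One way to trigger that recreation on a dockershim node is to remove the affected pod's pause container directly and let the kubelet rebuild the sandbox on its next sync. Below is a rough sketch using the Docker Go SDK; the label filters are what dockershim appears to set on sandbox containers and the pod name is a placeholder, so treat both as assumptions and verify with docker inspect on your own nodes first.

package main

import (
	"context"
	"fmt"
	"log"

	"github.com/docker/docker/api/types"
	"github.com/docker/docker/api/types/filters"
	"github.com/docker/docker/client"
)

func main() {
	ctx := context.Background()
	cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
	if err != nil {
		log.Fatal(err)
	}

	// Select only the sandbox ("pause") container of the affected pod.
	// Label names are assumptions based on what dockershim sets.
	f := filters.NewArgs()
	f.Add("label", "io.kubernetes.docker.type=podsandbox")
	f.Add("label", "io.kubernetes.pod.name=my-hostnet-pod") // hypothetical pod name

	containers, err := cli.ContainerList(ctx, types.ContainerListOptions{Filters: f})
	if err != nil {
		log.Fatal(err)
	}
	for _, c := range containers {
		// Force-remove the sandbox; the kubelet notices it is gone and
		// rebuilds the whole sandbox, re-running the resolv.conf setup.
		if err := cli.ContainerRemove(ctx, c.ID, types.ContainerRemoveOptions{Force: true}); err != nil {
			log.Fatal(err)
		}
		fmt.Println("removed sandbox container", c.ID[:12])
	}
}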

It looks like with containerd, we handle this better by cleaning up the sandbox container if we get errors.
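For comparison, here is a sketch of that cleanup pattern with the same kind of stand-in helpers (illustrative, not the actual containerd CRI plugin code): any error in the start path triggers a best-effort teardown, so the kubelet's next sync starts from scratch instead of reusing a half-configured sandbox.

package main

import (
	"context"
	"errors"
	"fmt"
)

// Stand-in helpers; the names are illustrative, not containerd's real API.
func startSandbox(ctx context.Context, id string) error {
	return errors.New("operation timeout: context deadline exceeded")
}

func configureDNS(id string) error { return nil }

func removeSandbox(id string) { fmt.Println("tore down half-started sandbox", id) }

// runPodSandbox cleans up after itself: if anything in the start path fails,
// the sandbox is removed so the next kubelet sync starts from a clean slate
// rather than adopting a sandbox that still has the host's resolv.conf.
func runPodSandbox(ctx context.Context, id string) (err error) {
	defer func() {
		if err != nil {
			removeSandbox(id)
		}
	}()
	if err = startSandbox(ctx, id); err != nil {
		return fmt.Errorf("failed to start sandbox container: %w", err)
	}
	return configureDNS(id)
}

func main() {
	if err := runPodSandbox(context.Background(), "pause-abc123"); err != nil {
		fmt.Println("CreatePodSandbox failed:", err)
	}
}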

Environment:

  • Kubernetes version (use kubectl version): v1.12.4
  • Cloud provider or hardware configuration: aws
  • OS (e.g: cat /etc/os-release): Ubuntu 18.04.2 LTS
  • Kernel (e.g. uname -a): Linux ip-10-0-2-34 4.15.0-1035-aws #37-Ubuntu SMP Mon Mar 18 16:15:14 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
  • Container runtime: Docker version 18.09.5, build e8ff056

About this issue

  • State: closed
  • Reactions: 13
  • Comments: 20 (3 by maintainers)

Most upvoted comments

+1

Is there a somewhat documented resolution to this? Running into this issue currently and it is putting a pretty big stop on things.

Since the bug is in dockershim, the “workaround” that has worked for us is to switch to containerd.