moby: Container using network mode host does not get its resolv.conf updated when the host's resolv.conf is updated (using systemd-resolved)

Description

Because the resolv.conf is not updated in the container, the container loses access to the internet when the host / device changes networks. I saw https://github.com/docker/for-linux/issues/889, which mentions that it is supposed to be updated automatically, but I actually can’t find where this is documented in https://docs.docker.com/v17.09/engine/userguide/networking/default_network/configure-dns/.

Is resolv.conf not updating in host network mode a bug, something not yet implemented, or intended behavior?

Reproduce

Start a long-running container with --network=host on your laptop (which uses systemd-resolved), then move to a different network with different DNS servers. Notice that the resolv.conf inside the container is now stale.

Expected behavior

resolv.conf in the container should match the host’s updated resolv.conf

docker version

Client:
 Version:           24.0.5
 API version:       1.43
 Go version:        go1.20.6
 Git commit:        ced0996600
 Built:             Wed Jul 26 21:44:58 2023
 OS/Arch:           linux/amd64
 Context:           default

Server:
 Engine:
  Version:          24.0.5
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.6
  Git commit:       a61e2b4c9c
  Built:            Wed Jul 26 21:44:58 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v1.7.2
  GitCommit:        0cae528dd6cb557f7201036e9f43420650207b58.m
 runc:
  Version:          1.1.8
  GitCommit:        
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

docker info

Client:
 Version:    24.0.5
 Context:    default
 Debug Mode: false

Server:
 Containers: 7
  Running: 1
  Paused: 0
  Stopped: 6
 Images: 14
 Server Version: 24.0.5
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: true
  Native Overlay Diff: false
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 0cae528dd6cb557f7201036e9f43420650207b58.m
 runc version: 
 init version: de40ad0
 Security Options:
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 6.4.8-zen1-1-zen
 Operating System: Arch Linux
 OSType: linux
 Architecture: x86_64
 CPUs: 16
 Total Memory: 13.5GiB
 Name: Lenovo-Yoga-7
 ID: 3PMN:VRXJ:C3R6:RFC2:ZLXJ:OJJU:OFKE:DQLW:YBC6:YYWQ:EHPI:WDWG
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

Additional Info

No response

About this issue

  • Original URL
  • State: open
  • Created a year ago
  • Reactions: 1
  • Comments: 19 (14 by maintainers)

Most upvoted comments

I tried following the PRs and related code between the projects to see what was different/missing, and you spotted the concern in Oct 2022 😎

Heh, I knew which PR you linked to without clicking the link. I opened that PR when I was somewhat cleaning up the resolvconf package, which had become over-engineered and complex over the years (still more to do there!).

How --network=host handles DNS

So for the --network=host case, the situation is somewhat similar to the “default bridge” case, but for different reasons;

  • In --network=host, the container doesn’t have a networking namespace, so localhost “inside” the container is the same as localhost “outside” the container; from a networking perspective, there is no “inside” or “outside”, they’re exactly the same.
  • And because there’s no networking namespace, there’s no “embedded” DNS that we can use in such a container
  • But there’s also no need to have one, because we can access “whatever” is configured on the host directly

But: here’s where the “fun” starts, because while the networking namespace is the same, the filesystem (mount namespace) is still separate, and we still need to configure the container so that processes inside it know what resolver to use;

  • Because the filesystem is separate, /etc/resolv.conf inside the container is a file that needs to be present inside the container
  • But its content should be the same as on the host (we want to use the same resolver if we’re in the same namespace)

A logical approach would be to bind-mount the host’s /etc/resolv.conf into the container, but that had some challenges;

  • /etc/resolv.conf on the host, depending on the system configuration, may be a symlink (bind-mounting that inside the container would try to resolve the symlink’s target inside the container)
  • /etc/resolv.conf on the host may be “modified” (the topic of this ticket); bind-mounting files uses the file’s inode, which can be problematic because most software updating files will use a copy file -> update copy -> (delete, and) replace original file, in which case the container would still be holding a mount for the deleted file (so the copy before updating).
  • and even without that issue, /etc/resolv.conf inside the container is writable, and we don’t want the container to be able to modify the file on the host (which would be the case if we’d bind-mount the file from the host’s /etc/resolv.conf).

So, for these reasons, we (again) need a COPY of the host’s /etc/resolv.conf (or whatever that’s symlinked to) for each container, and make sure that

  • if the file on the host is modified (e.g. due to WiFi connection switching)
  • AND the file has not been updated by the user (inside the container)
  • … that we update it, and get an updated version inside the container
  • (also see https://github.com/moby/moby/pull/41022, which was related to that)

Which brings us back to “square one” (described in my “bridge” comment from earlier) 😂

Reconfiguring the “embedded DNS”

So this is something I need to look into, which came up when I discussed this with @akerouanton.

While writing my earlier comment, my assumption was that the embedded DNS itself has no real configuration

  • just use a regular DNS lookup on the host, using “whatever is configured on the host” (/etc/resolv.conf on the host)
  • which can be 127.0.0.53 (if systemd-resolved is in use)
  • and let systemd-resolved handle the forwarding to “upstream” resolvers.

However, this MAY not be the case (this is something I need to look into / verify); it’s possible that the embedded DNS is doing more than that, and may be reading systemd-resolved’s UPSTREAM DNS resolvers to configure what it should use. That would mean that dynamically switching networks would also prevent the embedded DNS from using the correct DNS servers. And if that’s the case, it’s probably something that should be fixed.

systemd-resolved can actually run in different modes based on the resolv.conf setup (https://man.archlinux.org/man/systemd-resolved.8#/ETC/RESOLV.CONF). The recommended way is for resolv.conf to be a symlink to /run/systemd/resolve/stub-resolv.conf, but resolv.conf can also be maintained by something else (like NetworkManager), in which case systemd-resolved acts as a consumer of that file rather than managing it. That is how I think it’s set up on my system; I don’t remember how or why I configured it that way, but it’s been working perfectly for me for a while, and it’s not really wrong, just not the recommended way.

Does it need to be writeable?

For docker itself, no. Customisations can be made through the --dns, --dns-opt, --add-host, --hostname etc options, and those are made when the container is created (so would not require the file to be writable).

But having these files (/etc/hosts, /etc/resolv.conf, /etc/hostname) writable is a feature that was added at some point, so 🤷‍♂️ ; see

Admittedly, I think most of the requests were for /etc/hosts to be writable, but there may have been some cases where either the user, or software they were running, required (expected) those files to be writable.

From the description, I think this is for the “default” bridge network.

When using the default bridge, the “legacy” networking stack (pre “custom networks”) is used;

  • docker’s internal DNS resolver is not used
  • instead, containers get a copy of the host’s resolv.conf
  • as lookups happen from within the container’s networking namespace, a localhost DNS can’t be used, which means that systemd-resolved’s IP address (127.0.0.53) cannot be used; instead, systemd-resolved’s “upstream” DNS servers are read from /run/systemd/resolve/resolv.conf and included in the container’s resolv.conf
  • if no “non-localhost” DNS servers are found, use a default as “last resort” (https://github.com/moby/moby/blob/075a2d89b96ca2c31a61ce3b05214bbe2ba49af8/libnetwork/resolvconf/resolvconf.go#L76-L78)
  • a checksum is kept of the resolv.conf at time of creation, which is to allow the user to edit the file (in which case, docker will no longer update it, to prevent changes made by the user from being reverted)
  • I think when restarting the daemon, all resolv.conf copies of all containers are re-created (skipping those that were modified by the user)
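The loopback-filtering and last-resort-default steps above could be sketched roughly like this (a simplified stand-in, not libnetwork’s actual code; filterLocalhostNameservers is a made-up name, and the fallback addresses mirror the defaults the linked resolvconf code uses):

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// filterLocalhostNameservers mimics the legacy bridge-network flow: drop
// loopback resolvers (unreachable from inside the container's networking
// namespace) and fall back to public defaults if nothing usable remains.
func filterLocalhostNameservers(resolvConf string) []string {
	var servers []string
	for _, line := range strings.Split(resolvConf, "\n") {
		fields := strings.Fields(line)
		if len(fields) < 2 || fields[0] != "nameserver" {
			continue
		}
		ip := net.ParseIP(fields[1])
		if ip == nil || ip.IsLoopback() {
			continue // e.g. systemd-resolved's 127.0.0.53
		}
		servers = append(servers, fields[1])
	}
	if len(servers) == 0 {
		servers = []string{"8.8.8.8", "8.8.4.4"} // last-resort defaults
	}
	return servers
}

func main() {
	// A stub-mode host resolv.conf only lists a loopback resolver,
	// so the container copy falls back to the defaults.
	host := "nameserver 127.0.0.53\noptions edns0 trust-ad\n"
	fmt.Println(filterLocalhostNameservers(host))
}
```

This is exactly why a real daemon implementation instead reads systemd-resolved’s upstream servers from /run/systemd/resolve/resolv.conf before filtering, rather than falling straight through to the defaults.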

This flow originated from the very early beginnings of Docker, and was designed with the assumption that the daemon would run in a server environment (no dynamic IPs and/or networks), and before systemd-resolved existed (having a localhost / 127.0.0.x resolver was the “exception”, not the “rule”);

It may be clear that a lot of complexity is involved here, and quite a few places where things can go wrong (such as looking up systemd-resolved’s upstreams). Dynamically updating the resolv.conf for each container could be an option, but I guess the challenge would be deciding what should trigger this; alternatively, maybe we can do this on a reload (systemctl reload docker.service to trigger re-generating resolv.conf).

The better solution would probably be to remove the legacy code-path, and always use the embedded DNS; I opened a ticket for that once;

The reason the legacy code-path still exists was (IIRC) for a few reasons, but I think most of those should no longer be a concern (and I’d love to get rid of the two distinct implementations);

  • The legacy links (including sharing of environment variables) were still used by quite a few users, so we didn’t want to break those on “day 1”
  • At the time, some tools implemented filewatchers on some of docker’s internals (including the resolv.conf and /etc/hosts), and making the default bridge use the embedded DNS could break those tools. I’m not sure if that’s something we should be really concerned about, as anything inside /var/lib/docker is considered to be exclusively accessed by the daemon, so any tool making other assumptions is depending on “undocumented” behavior.
  • Kubernetes; at the time, kubernetes had parts in place to “disable” all managed networking in Docker (ISTR, they bind-mounted files over the files that Docker generated, but I forgot the details); defaulting to the embedded DNS would break some scenarios there. A lot has changed though, since that time, and most k8s setups would now be using containerd instead, and for those that still use the Docker Engine, perhaps this is not an issue anymore as well (but would need verifying).