moby: Restarting dockerd breaks the embedded DNS server
Description
Ubuntu Unattended-upgrades upgraded docker-ce and friends. As part of the upgrade, dockerd was restarted. Subsequently, running containers could not resolve the name of peers on the same host (Docker bridge networking in use).
Log has many pairs of these two messages
Resolver Start failed for container .... setting up IP table rules failed: (iptables failed: iptables --wait -t nat -N DOCKER_OUTPUT: iptables: Chain already exists
failed to populate fields for osl sandbox
Reproduce
Create a clean VM, updata all packages, install docker, add ubuntu user to the docker group then reboot
multipass -c 2 -m 4G -d 20G jam jammy
multipass shell jam
sudo apt update
sudo apt upgrade -y
sudo apt install ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg
echo "deb [arch="$(dpkg --print-architecture)" signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
"$(. /etc/os-release && echo "$VERSION_CODENAME")" stable" | sudo tee /etc/apt/sources.list.d/docker.list
sudo apt install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo nano /etc/docker/daemon.json
sudo usermod -a -G docker ubuntu
sudo reboot
daemon.json contents, note “live-restore”: true
{
"icc": false,
"log-driver": "syslog",
"log-opts": {
"syslog-facility": "local1",
"tag": "{{.Name}}"
},
"live-restore": true,
"userland-proxy": false,
"userns-remap": "default",
"default-address-pools":
[
{"base":"192.168.255.0/24","size":29}
],
"bip": "192.168.255.253/30"
}
Following the reboot, start two containers using Docker compose.
docker pull bash:latest
nano dc.yml
docker compose -f dc.yml -p t up -d
Contents of dc.yml
services:
cona:
entrypoint: ["/bin/sleep","1d"]
hostname: hosta
image: bash:latest
restart: unless-stopped
networks:
- neta
- netb
security_opt:
- no-new-privileges:true
conb:
entrypoint: ["/bin/sleep","1d"]
hostname: hostb
image: bash:latest
restart: unless-stopped
networks:
- neta
- netb
security_opt:
- no-new-privileges:true
networks:
neta:
netb:
Prove that the embedded DNS server works
ubuntu@jam:~$ docker exec t-cona-1 ping conb
PING conb (192.168.255.26): 56 data bytes
64 bytes from 192.168.255.26: seq=0 ttl=64 time=0.224 ms
64 bytes from 192.168.255.26: seq=1 ttl=64 time=0.204 ms
^C
ubuntu@jam:~$ docker exec t-conb-1 ping cona
PING cona (192.168.255.27): 56 data bytes
64 bytes from 192.168.255.27: seq=0 ttl=64 time=0.118 ms
64 bytes from 192.168.255.27: seq=1 ttl=64 time=0.159 ms
^C
ubuntu@jam:~$
Restart dockerd and observe the logs
ubuntu@jam:~$ sudo systemctl restart docker
ubuntu@jam:~$ sudo systemctl status docker
● docker.service - Docker Application Container Engine
Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2023-05-30 09:18:16 AEST; 5s ago
TriggeredBy: ● docker.socket
Docs: https://docs.docker.com
Main PID: 1479 (dockerd)
Tasks: 9
Memory: 26.3M
CPU: 267ms
CGroup: /system.slice/docker.service
└─1479 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
May 30 09:18:16 jam dockerd[1479]: time="2023-05-30T09:18:16.369133373+10:00" level=error msg="Resolver Start failed for container b5ceaf056e1a2603abde826490cf3ac2d18d8af6095c64a6716e5e7b544bae86, \"setting up I>
May 30 09:18:16 jam dockerd[1479]: time="2023-05-30T09:18:16.369164615+10:00" level=error msg="failed to populate fields for osl sandbox 09d0460ebab447eb650884ff0e4371b86125185b8776db0ef8ba4903e915258a"
May 30 09:18:16 jam dockerd[1479]: time="2023-05-30T09:18:16.371111533+10:00" level=error msg="Resolver Start failed for container 11881e3c01dfb4f8244342ec0cc241d7c18250d1d9956d69725a9b4e1d868f0d, \"setting up I>
May 30 09:18:16 jam dockerd[1479]: time="2023-05-30T09:18:16.371129838+10:00" level=error msg="failed to populate fields for osl sandbox 0ef5358411a1bbe4d96d4d318b3a31243109436e47a1c6324ec54322be9a63da"
May 30 09:18:16 jam dockerd[1479]: time="2023-05-30T09:18:16.371269276+10:00" level=info msg="there are running containers, updated network configuration will not take affect"
May 30 09:18:16 jam dockerd[1479]: time="2023-05-30T09:18:16.371445983+10:00" level=info msg="Loading containers: done."
May 30 09:18:16 jam dockerd[1479]: time="2023-05-30T09:18:16.385157565+10:00" level=info msg="Docker daemon" commit=659604f graphdriver=overlay2 version=24.0.2
May 30 09:18:16 jam dockerd[1479]: time="2023-05-30T09:18:16.385214548+10:00" level=info msg="Daemon has completed initialization"
May 30 09:18:16 jam dockerd[1479]: time="2023-05-30T09:18:16.402196640+10:00" level=info msg="API listen on /run/docker.sock"
May 30 09:18:16 jam systemd[1]: Started Docker Application Container Engine.
ubuntu@jam:~$
Observe that the embedded DNS server is broken
ubuntu@jam:~$ docker exec t-cona-1 ping conb
ping: bad address 'conb'
ubuntu@jam:~$ docker exec t-conb-1 ping cona
ping: bad address 'cona'
ubuntu@jam:~$
Expected behavior
With live-restore=true
in play, the embedded DNS server should have operated correctly after the docker daemon restart
docker version
Client: Docker Engine - Community
Version: 24.0.2
API version: 1.43
Go version: go1.20.4
Git commit: cb74dfc
Built: Thu May 25 21:51:00 2023
OS/Arch: linux/amd64
Context: default
Server: Docker Engine - Community
Engine:
Version: 24.0.2
API version: 1.43 (minimum version 1.12)
Go version: go1.20.4
Git commit: 659604f
Built: Thu May 25 21:51:00 2023
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.6.21
GitCommit: 3dce8eb055cbb6872793272b4f20ed16117344f8
runc:
Version: 1.1.7
GitCommit: v1.1.7-0-g860f061
docker-init:
Version: 0.19.0
GitCommit: de40ad0
docker info
Client: Docker Engine - Community
Version: 24.0.2
Context: default
Debug Mode: false
Plugins:
buildx: Docker Buildx (Docker Inc.)
Version: v0.10.5
Path: /usr/libexec/docker/cli-plugins/docker-buildx
compose: Docker Compose (Docker Inc.)
Version: v2.18.1
Path: /usr/libexec/docker/cli-plugins/docker-compose
Server:
Containers: 2
Running: 2
Paused: 0
Stopped: 0
Images: 1
Server Version: 24.0.2
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Using metacopy: false
Native Overlay Diff: true
userxattr: false
Logging Driver: syslog
Cgroup Driver: systemd
Cgroup Version: 2
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 3dce8eb055cbb6872793272b4f20ed16117344f8
runc version: v1.1.7-0-g860f061
init version: de40ad0
Security Options:
apparmor
seccomp
Profile: builtin
userns
cgroupns
Kernel Version: 5.15.0-72-generic
Operating System: Ubuntu 22.04.2 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 3.828GiB
Name: jam
ID: 68a7b3d7-8a6d-4204-921e-ef786a0e7ac3
Docker Root Dir: /var/lib/docker/165536.165536
Debug Mode: false
Experimental: false
Insecure Registries:
corunastylis:5000
127.0.0.0/8
Live Restore Enabled: true
Default Address Pools:
Base: 192.168.255.0/24, Size: 29
Additional Info
We have placed an APT hold
on docker-ce docker-ce-cli docker-ce-rootless-extras docker-compose-plugin docker-buildx-plugin
until this bug is resolved.
About this issue
- Original URL
- State: closed
- Created a year ago
- Comments: 15 (9 by maintainers)
The fixes have been merged to the 24.0 branch so will be included in the next patch, v24.0.3
I was able to reproduce the
Resolver Start failed for container
logs on an Ubuntu Jammy multipass instance. The detailed reproduction steps were invaluable; big thank you to @bruceesmith! It was in fact directly related to a second root cause. I have confirmed that the combination of #45654 and #45657 fixes DNS on live-restored containers on Ubuntu Jammy.🤦 Right… had to be a literal
+
in between 🙈