moby: docker in qemu in docker: dockerd hangs entire system
Steps to reproduce:
- Prepare an Alpine Linux qcow2 or use this one
- Build this Dockerfile to get my qemu build (
docker build -f Dockerfile -t qemu .
) - Ensure you are in the kvm group (or have write access to /dev/kvm) on the host
- Run the following Docker command:
docker run \
-it \
--device /dev/kvm \
--mount type=tmpfs,destination=/var/tmp \
-v $(pwd):/base:rw \
-p 127.0.0.1:8022:8022 qemu \
qemu-system-x86_64 \
-m 2048 \
-net nic,model=virtio \
-net user,hostfwd=tcp::8022-:22 \
-cpu host \
-enable-kvm \
-drive file=/base/alpine.img.qcow2,media=disk,if=virtio \
-nographic
This will boot up the Alpine Linux image with qemu, you should see the kernel logs in your docker window. This image is set up so you can SSH in from the outside:
ssh -p 8022 build@localhost
There’s no password for this user and it has sudo access. Run the following commands to reproduce the dockerd issue:
sudo apk add docker
sudo service docker start
Wait a moment and you should see the whole system hang. You will have to docker kill
this container from the outside.
I’ve done this while tailing dmesg
and docker.log
and found nothing useful. It just hangs.
I have also had similar issues when running docker on an Arch Linux guest.
Output of docker version
:
Client:
Version: 18.03.1-ce
API version: 1.37
Go version: go1.10.1
Git commit: 20527e6d83
Built: Sun May 20 19:20:18 2018
OS/Arch: linux/amd64
Experimental: false
Orchestrator: swarm
About this issue
- Original URL
- State: closed
- Created 6 years ago
- Reactions: 5
- Comments: 16
I was curious, so I poked around a little bit.
Are you sure qemu is hanging, and it’s not just the network? If I follow your instructions, my ssh session does indeed hang, but I can still access the qemu monitor with
Ctrl-A C
, and run thequit
command rather thandocker kill
.Instead, If I run
to enable a console login, then login and run
from the console, the rest of the system (apart from ssh) appears to be fine. Since the log mentioned the
docker0
bridge, I tried deleting it on the guest withand then ssh starts to work again.
Maybe there is some conflict between the host docker networking and the guest docker networking? If I run the guest docker as
sudo dockerd --bip 172.18.0.1/16
(the host one was 172.17.0.1/16 on my system), that seems to fix it and I can successfully start containers in the guest.So I think you could resolve this for sr.ht by running the host dockerd with a non-default bridge address, and then the guest dockerd should work with the default configuration.
That was it 😄
https://builds.sr.ht/~sircmpwn/job/47477
https://builds.sr.ht/api/jobs/47477/manifest
Now I just need to get this fix packaged nicely to minimize annoyance to builds.sr.ht users. Thanks for your help!
Progress
QEMU emulator version 2.8.1 without docker appears to work with >20 minutes of uptime QEMU emulator version 3.0.0 and 3.1.0 crashes when run inside docker
QEMU + gdb remote
To build QEMU with symbols add
--disable-strip
to./configure
in the Dockerfilegdb remote to qemu in docker requires port forwarding
Building
dockerd
with symbolsIn
APKBUILD
changeoptions="!check"
tooptions="!check !strip"
gdb remote into QEMU
Even when qemu looks hung, gdb does not stop until a ctrl-c is entered. The backtrace seems to be the same as when the system is idle and is consistent across runs.
The symbols in gdb don’t appear to be in /proc/kallsyms, qemu or dockerd
Other notes
/var/log/messages
shows no errors after the crash/hangstarting
dockerd
with--iptables=false
doesn’t seem to help Once it is hung, additional ssh connections can not be made.The
stats
command seems to indicate that the VM is still runningTrying to get a copy of dmesg off the crashed container failed
todo 😴
run QEMU 3.1.0 outside of docker and/or run QEMU 2.8.1 inside docker
Progress
After container hangs running
docker container stop …
then relaunching with the same qcow filedate && sudo service docker start
followed bywatch -n 1 date
Seems to “work” for a bit. Sometimes it ran for as short as 00:00 and other times it ran up to 02:43 before crashing.It doesn’t seem to crash if docker isn’t running. Also, in one test, it stayed crashed for at least 42 minutes.
top, vmstat, etc seem to indicate that it isn’t memory. /proc/sys/kernel/random/entropy_avail as high as 2149 and as low as 90 seem to make no difference.
I guess
gdb
is next 😞QEMU Console Log
QEMU terminates upon execution of
docker container stop …