moby: Many parallel calls to `docker exec` make Docker unresponsive
Description
As of today, our CI system can no longer run our tests properly.
After some investigation, we found a way to resolve it, but not the root cause. This happens on two independent servers, one running Ubuntu 22.04, the other 18.04. Both are running Docker 24.0.1.
Essentially, we’re launching an image of SQL Server using a tmpfs volume for the data files. Before the tests, we create a database with seed data. Then, for every test run, we run `docker exec` to copy the data (and log) files and attach them as a new database. This runs in parallel without any concurrency limit, and has worked before.
We have now found that, when doing this, all Docker commands become unresponsive after a short time. For example, `docker stats` no longer refreshes correctly and shows only dashes (`--`) instead of the actual information.
Other Docker commands, e.g. `docker stop`, take extremely long (if they finish at all). The tests usually run in 10 minutes; now they take several hours, with most tests failing due to a timeout exception when calling Docker commands from within the test suite.
Reproduce
- First, I start SQL Server:
docker run -e ACCEPT_EULA=Y -e "SA_PASSWORD=yourStrong(#)Password" \
--mount type=tmpfs,destination=/var/opt/mssql/data \
--mount type=tmpfs,destination=/var/opt/mssql/log \
--mount type=tmpfs,destination=/var/opt/mssql/secrets \
dangl/mssql-tmpfs:latest
- Then I create a quick script that runs some `cp` operations via `docker exec` in parallel:
touch concurrent_exec.sh
for i in {1..50}
do
echo "docker exec <CONTAINER_ID> sh -c \"cp /var/opt/mssql/data/master.mdf /var/opt/mssql/data/$i.mdf\" &" >> concurrent_exec.sh
done
echo "docker exec <CONTAINER_ID> sh -c \"cp /var/opt/mssql/data/master.mdf /var/opt/mssql/data/last.mdf\"" >> concurrent_exec.sh
- Now I execute `sh concurrent_exec.sh` and watch the output of `docker stats` in another session. `docker stats` will freeze for a time (but eventually recover). It takes longer the more parallel operations I run.
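To put rough numbers on the freeze, something like the following could be run in the second session while `concurrent_exec.sh` is executing. This is only a sketch: `probe_docker` is a hypothetical helper name, and the timing has one-second resolution, so it is best compared against a baseline taken before the script starts.

```shell
# Sketch: time repeated Docker CLI calls while concurrent_exec.sh runs
# in another session. probe_docker is a hypothetical helper, not part
# of the original report.
probe_docker() {
    samples=$1   # number of probes to take
    n=1
    while [ "$n" -le "$samples" ]; do
        start=$(date +%s)
        # --no-stream prints a single snapshot and exits
        docker stats --no-stream > /dev/null 2>&1
        end=$(date +%s)
        echo "probe $n: docker stats --no-stream took $((end - start))s"
        n=$((n + 1))
    done
}
```

Calling `probe_docker 10` once on an idle daemon and once during the parallel run would show how much longer each call takes under load.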
The actual behavior depends on how many parallel calls I’m making. E.g. with 5, there’s no (noticeable) interruption; with 50, it already blocks for around a minute.
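Since the effect scales with the number of parallel calls, capping concurrency is one possible stopgap while the root cause is unknown. The sketch below runs the same `cp` commands as the reproduction in batches; the function name `run_copies_throttled` and the batch-then-wait approach are assumptions for illustration, not something taken from our CI setup.

```shell
# Sketch: run the reproduction's copy commands with a cap on how many
# docker exec calls are in flight. run_copies_throttled is a
# hypothetical helper name.
run_copies_throttled() {
    container_id=$1   # container to exec into
    total=$2          # total number of copies to make
    max_parallel=$3   # upper bound on concurrent docker exec calls

    i=1
    running=0
    while [ "$i" -le "$total" ]; do
        docker exec "$container_id" sh -c \
            "cp /var/opt/mssql/data/master.mdf /var/opt/mssql/data/$i.mdf" &
        running=$((running + 1))
        # Once a full batch has been started, wait for it to finish
        if [ "$running" -ge "$max_parallel" ]; then
            wait
            running=0
        fi
        i=$((i + 1))
    done
    wait   # wait for the final, possibly partial, batch
}
```

For example, `run_copies_throttled <CONTAINER_ID> 50 5` would make the same 50 copies with at most 5 in flight, the level at which no noticeable interruption was observed above.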
Expected behavior
No response
docker version
Client: Docker Engine - Community
 Version:           24.0.1
 API version:       1.43
 Go version:        go1.20.4
 Git commit:        6802122
 Built:             Fri May 19 18:06:21 2023
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          24.0.1
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.4
  Git commit:       463850e
  Built:            Fri May 19 18:06:21 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.21
  GitCommit:        3dce8eb055cbb6872793272b4f20ed16117344f8
 runc:
  Version:          1.1.7
  GitCommit:        v1.1.7-0-g860f061
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
docker info
Client: Docker Engine - Community
 Version:    24.0.1
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.10.4
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.18.1
    Path:     /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 15
 Server Version: 24.0.1
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 3dce8eb055cbb6872793272b4f20ed16117344f8
 runc version: v1.1.7-0-g860f061
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 5.15.0-72-generic
 Operating System: Ubuntu 22.04.2 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 32
 Total Memory: 125.7GiB
 Name: Ubuntu-2204-jammy-amd64-base
 ID: 74911907-4761-4663-906c-a79557c4089a
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
Additional Info
I’ve tried it on a fresh server and saw the same behavior there. I’d be happy to assist with any further details, but I would need some guidance on what is relevant and how to collect it.
About this issue
- Original URL
- State: closed
- Created a year ago
- Comments: 17 (11 by maintainers)
Thank you guys for the quick resolution😀