moby: Many parallel calls to `docker exec` make Docker unresponsive

Description

As of today, our CI system can no longer run our tests properly.

After a bit of investigation, we found out how to resolve it, but not the root cause. This happens on two independent servers, one running Ubuntu 22.04, the other 18.04. Both are running Docker 24.0.1.

Essentially, we’re launching a SQL Server image using tmpfs volumes for the data files. Before the tests, we create a database with seed data. Then, for every test run, we run `docker exec` to copy the data (and log) files and attach them as a new database. This runs in parallel without any concurrency limit and has worked before.
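The report states that the copies run without any concurrency limit. To check whether the hang depends on load, a cap could be added with `xargs -P`. This is only a sketch, not the setup from the report: `copy_parallel` and the `DOCKER` override variable are hypothetical names, and the paths are borrowed from the reproduction steps rather than from the real test suite.

```shell
# copy_parallel LIMIT CONTAINER
# Hypothetical helper: reads one target name per line from stdin and copies
# master.mdf to <name>.mdf inside the container, with at most LIMIT copies
# running at the same time.
# DOCKER is an assumed override (e.g. DOCKER=echo for a dry run); it is not
# a variable that Docker itself reads.
copy_parallel() {
  limit="$1"
  container="$2"
  # -P caps the number of simultaneous processes; -I{} starts one command
  # per input line. The name is passed as $1 to the inner sh, so it is not
  # spliced into the command string.
  xargs -P "$limit" -I{} \
    ${DOCKER:-docker} exec "$container" \
    sh -c 'cp /var/opt/mssql/data/master.mdf "/var/opt/mssql/data/$1.mdf"' sh {}
}

# Example (requires the running SQL Server container):
#   seq 1 50 | copy_parallel 5 <CONTAINER_ID>
```

With a limit of 5 this corresponds to the case reported as having no noticeable interruption, so raising the limit step by step is one way to look for the point where Docker starts to stall.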

We now find that when doing this, all Docker commands become unresponsive after a short time. E.g. `docker stats` no longer refreshes correctly and shows only dashes (`--`) instead of the actual information.

Other Docker commands, e.g. `docker stop`, take extremely long (if they finish at all). The tests usually run in 10 minutes; now they take several hours, with most tests failing due to a timeout exception when calling Docker commands from within the test suite.
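To put a number on "extremely long", the slowdown could be measured by timing a single Docker command while the parallel `docker exec` calls are running. A minimal sketch, assuming a POSIX shell; `time_cmd` is a hypothetical helper name, and `docker stats --no-stream` is used because it prints one snapshot and exits:

```shell
# time_cmd COMMAND [ARGS...]
# Hypothetical helper: runs the given command, discards its output, and
# prints the elapsed wall-clock time in whole seconds.
time_cmd() {
  start=$(date +%s)
  "$@" > /dev/null 2>&1
  end=$(date +%s)
  echo $((end - start))
}

# Example (requires a running Docker daemon):
#   time_cmd docker stats --no-stream
```

Running this once on an idle daemon and once during the parallel copies gives two values that can be compared directly in a bug report.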

Reproduce

  1. First, I start SQL Server:
docker run -e ACCEPT_EULA=Y -e "SA_PASSWORD=yourStrong(#)Password" \
--mount type=tmpfs,destination=/var/opt/mssql/data \
--mount type=tmpfs,destination=/var/opt/mssql/log \
--mount type=tmpfs,destination=/var/opt/mssql/secrets \
dangl/mssql-tmpfs:latest
  2. Then I create a quick script that runs some cp operations via `docker exec` in parallel:
touch concurrent_exec.sh
for i in {1..50}
do
  echo "docker exec <CONTAINER_ID> sh -c \"cp /var/opt/mssql/data/master.mdf /var/opt/mssql/data/$i.mdf\" &" >> concurrent_exec.sh
done
echo "docker exec <CONTAINER_ID> sh -c \"cp /var/opt/mssql/data/master.mdf /var/opt/mssql/data/last.mdf\"" >> concurrent_exec.sh
  3. Now I execute `sh concurrent_exec.sh` and watch the output of `docker stats` in another session. `docker stats` freezes for a while (but eventually recovers). The more parallel operations I run, the longer the freeze lasts.

The actual behavior depends on how many parallel calls I’m making. E.g. with 5 there’s no noticeable interruption, while with 50 it already blocks for around a minute.
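Since the effect depends on the number of parallel calls, it may help to generate the reproduction script for an arbitrary count instead of the hard-coded 50. The following is a sketch of that idea: `gen_concurrent_exec` is a hypothetical helper name, and it truncates the output file first so that repeated runs do not append to an old script.

```shell
# gen_concurrent_exec COUNT CONTAINER [OUTFILE]
# Hypothetical helper: writes COUNT backgrounded `docker exec ... cp` lines
# plus one final foreground line, mirroring the reproduction script above.
gen_concurrent_exec() {
  count="$1"
  container="$2"
  out="${3:-concurrent_exec.sh}"
  : > "$out"   # start from an empty file on every run
  i=1
  while [ "$i" -le "$count" ]; do
    printf 'docker exec %s sh -c "cp /var/opt/mssql/data/master.mdf /var/opt/mssql/data/%s.mdf" &\n' \
      "$container" "$i" >> "$out"
    i=$((i + 1))
  done
  printf 'docker exec %s sh -c "cp /var/opt/mssql/data/master.mdf /var/opt/mssql/data/last.mdf"\n' \
    "$container" >> "$out"
}

# Example: generate scripts for several counts and run them one at a time.
#   gen_concurrent_exec 5  <CONTAINER_ID> exec_5.sh
#   gen_concurrent_exec 50 <CONTAINER_ID> exec_50.sh
```

Generating the script only writes a file, so it works without a Docker daemon; running the result with `sh` needs the container from step 1.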

Expected behavior

No response

docker version

Client: Docker Engine - Community
 Version:           24.0.1
 API version:       1.43
 Go version:        go1.20.4
 Git commit:        6802122
 Built:             Fri May 19 18:06:21 2023
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          24.0.1
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.4
  Git commit:       463850e
  Built:            Fri May 19 18:06:21 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.21
  GitCommit:        3dce8eb055cbb6872793272b4f20ed16117344f8
 runc:
  Version:          1.1.7
  GitCommit:        v1.1.7-0-g860f061
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

docker info

Client: Docker Engine - Community
 Version:    24.0.1
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.10.4
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.18.1
    Path:     /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 15
 Server Version: 24.0.1
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 3dce8eb055cbb6872793272b4f20ed16117344f8
 runc version: v1.1.7-0-g860f061
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 5.15.0-72-generic
 Operating System: Ubuntu 22.04.2 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 32
 Total Memory: 125.7GiB
 Name: Ubuntu-2204-jammy-amd64-base
 ID: 74911907-4761-4663-906c-a79557c4089a
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

Additional Info

I’ve tried it out on a fresh server, had the same behavior there. I’d be happy to assist with any further details, but I would need some guidance on what is relevant and how to get that.

About this issue

  • State: closed
  • Created a year ago
  • Comments: 17 (11 by maintainers)

Most upvoted comments

Thank you guys for the quick resolution😀