moby: `systemctl start docker` gets stuck in cloud-init
Description
With Docker 20.10 on CentOS 7, starting the daemon from a cloud-init script hangs.
Steps to reproduce the issue:
- Create a cloud-init script that installs and starts docker
- Launch a CentOS 7 VM with this cloud-init
- cloud-init script is stuck on starting docker
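For reference, a user-data script along these lines should reproduce the setup. This is only a minimal sketch, not the reporter's actual script (which is not shown in this issue): the helper name `write_user_data` and the output path are made up for illustration, and the install commands are assumed to follow Docker's published CentOS instructions.

```shell
# Hypothetical helper: write a cloud-init user-data script to the given path.
# Assumption: the install steps mirror Docker's CentOS instructions; the
# reporter's real script is not part of this issue.
write_user_data() {
    cat > "$1" <<'EOF'
#!/bin/bash
yum install -y yum-utils
yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
yum install -y docker-ce docker-ce-cli containerd.io
# With Docker 20.10.0 this is the call reported to hang.
systemctl start docker
echo "docker started"
EOF
}
```

Calling `write_user_data user-data.sh` produces a file that can be supplied as user data when launching the VM.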
Describe the results you received:
`systemctl start docker` hangs forever.
Describe the results you expected:
Docker daemon starts and cloud-init script can continue.
Additional information you deem important (e.g. issue happens only occasionally):
Looks related to #41297.
If I remove `multi-user.target` from `After` and run `systemctl daemon-reload`, the `systemctl start docker` gets unstuck.
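The workaround above could be scripted roughly as follows. This is a sketch under assumptions: the function name `strip_multi_user_after` is hypothetical, and the unit path in the usage comment is a guess that should be confirmed on the affected host.

```shell
# Hypothetical helper: read a unit file on stdin and print it with
# multi-user.target removed from every After= line. All other lines,
# including WantedBy=, pass through unchanged.
strip_multi_user_after() {
    sed -E '/^After=/ s/[[:space:]]*multi-user\.target//g'
}

# Assumed usage (run as root; confirm the path first with
# `systemctl show -p FragmentPath docker.service`):
#   unit=/usr/lib/systemd/system/docker.service
#   strip_multi_user_after < "$unit" > /tmp/docker.service
#   cp /tmp/docker.service "$unit"
#   systemctl daemon-reload
```

Note that a direct edit to the packaged unit file is likely to be overwritten by the next package update, so this is a stopgap rather than a fix.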
Output of `docker version`:
Client: Docker Engine - Community
Version: 20.10.0
API version: 1.41
Go version: go1.13.15
Git commit: 7287ab3
Built: Tue Dec 8 18:57:35 2020
OS/Arch: linux/amd64
Context: default
Experimental: true
Server: Docker Engine - Community
Engine:
Version: 20.10.0
API version: 1.41 (minimum version 1.12)
Go version: go1.13.15
Git commit: eeddea2
Built: Tue Dec 8 18:56:55 2020
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.4.3
GitCommit: 269548fa27e0089a8b8278fc4fc781d7f65a939b
runc:
Version: 1.0.0-rc92
GitCommit: ff819c7e9184c13b7c2607fe6c30ae19403a7aff
docker-init:
Version: 0.19.0
GitCommit: de40ad0
Output of `docker info`:
Client:
Context: default
Debug Mode: false
Plugins:
app: Docker App (Docker Inc., v0.9.1-beta3)
buildx: Build with BuildKit (Docker Inc., v0.4.2-docker)
Server:
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 0
Server Version: 20.10.0
Storage Driver: overlay2
Backing Filesystem: xfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 1
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc io.containerd.runc.v2 io.containerd.runtime.v1.linux
Default Runtime: runc
Init Binary: docker-init
containerd version: 269548fa27e0089a8b8278fc4fc781d7f65a939b
runc version: ff819c7e9184c13b7c2607fe6c30ae19403a7aff
init version: de40ad0
Security Options:
seccomp
Profile: default
Kernel Version: 3.10.0-693.11.6.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 3.701GiB
Name: ip-172-31-21-73.eu-central-1.compute.internal
ID: 2IU4:K3UK:R5X3:2FVQ:JILR:F5US:GLDG:5JC4:KG5F:RGAI:UVBS:3ZIZ
Docker Root Dir: /var/lib/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Additional environment details (AWS, VirtualBox, physical, etc.):
AWS VM, CentOS 7 (AMI: ami-337be65c on eu-central-1, I think)
About this issue
- State: closed
- Created 4 years ago
- Reactions: 3
- Comments: 29 (11 by maintainers)
I’ve come across something similar which is related to #41297. This is on Ubuntu 20.04 with docker-ce 20.10.
If a systemd unit relies on docker and is set to install on multi-user.target, it will never start.
Sample systemd file
The problem is that docker is set to start after `multi-user.target`.
The docker systemd unit has:
WantedBy=multi-user.target
Which tells systemd that when multi-user.target starts, docker.service should start.
It also has:
After=network-online.target firewalld.service containerd.service multi-user.target
Which tells systemd that docker should start after multi-user.target has fully started.
These conflict: we’re saying “multi-user.target depends on docker, but docker must run after multi-user.target”.
In the case of the sample unit, it won’t start because it starts when multi-user.target starts, but it can’t, because it requires docker to be running, and docker won’t start until after multi-user.target has started.
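To spot this pattern in a unit file, a quick text check along these lines may help. It is only a sketch: the function name `find_after_wantedby_overlap` is hypothetical, and it compares the literal `After=` and `WantedBy=` lines of a single file without asking systemd about the resolved dependency graph.

```shell
# Hypothetical check: read a unit file on stdin and print every unit that
# is named in both After= and WantedBy=.
find_after_wantedby_overlap() {
    awk -F= '
        $1 == "After"    { n = split($2, a, " "); for (i = 1; i <= n; i++) after[a[i]] = 1 }
        $1 == "WantedBy" { n = split($2, w, " "); for (i = 1; i <= n; i++) wanted[w[i]] = 1 }
        END { for (u in wanted) if (u in after) print u }
    '
}

# Assumed usage: systemctl cat docker.service | find_after_wantedby_overlap
```

Any name it prints is a unit that both pulls this service in and is expected to finish starting before it, which is the conflict described above.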
I suggest reverting 0ca7456e5284d4aa9f3e37e69c7c93eff4420d3d. I can’t think of a better solution at the moment, and if you search the internet for `start docker container with systemd` you’ll find a ton of unit files which look like the sample, so this has the potential to break things for a lot of people.

The docker 20.10.1 packages are now available on download.docker.com; make sure to switch back repositories to the main (download.docker.com) package repo if you temporarily switched to the staging repository.
I’m closing this issue, as this should be resolved, but feel free to continue the conversation.
20.10 changed the following line in docker.service.
The last entry was added in 20.10.
After I removed `multi-user.target` from the `After=...`, both docker.service and the docker container defined in another service unit started without problems. See docker/for-linux#1162.

@zigarn or anyone hitting this issue, would you mind running:
And confirm you no longer see the issue?
@tiborvass I confirm that the issue is gone with the staging package.
One I wrote; the one I supplied is the simplest form. You’ll find lots of similar ones on the internet as a guide.
The wants/wanted by/after etc. relationship is complicated.
The gist is that `After` determines the order, whereas `Wants` determines the dependencies. This is why it’s suggested to use both.
So for example if you want a service to only start once docker is ready you use
`Requires` means start docker when the service starts. `After` means start the service once docker has fully started.
`Wants` is just a less strict version of `Requires` (it doesn’t cause failure).
So listing the same unit in both `WantedBy` and `After` doesn’t make a lot of sense.
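As an illustration of the `Requires` plus `After` pairing, a unit that runs a container might look like the one written by the helper below. This is a sketch, not the sample from this thread: the function name `write_sample_unit`, the description, the container name `example-app`, and the `nginx` image are all placeholders.

```shell
# Hypothetical helper: write a unit that depends on docker to the given path.
# Requires= pulls docker.service in; After= delays this unit until docker
# has started. Names and image are placeholders.
write_sample_unit() {
    cat > "$1" <<'EOF'
[Unit]
Description=Example container (placeholder)
Requires=docker.service
After=docker.service

[Service]
ExecStart=/usr/bin/docker run --rm --name example-app nginx
ExecStop=/usr/bin/docker stop example-app

[Install]
WantedBy=multi-user.target
EOF
}
```

A unit like this is wanted by multi-user.target, so it is exactly the kind of file that stalls when docker.service itself is ordered after multi-user.target.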