moby: `systemctl start docker` gets stuck in cloud-init
Description
With Docker 20.10 on CentOS 7, starting the daemon from a cloud-init script hangs.
Steps to reproduce the issue:
- Create a cloud-init script that installs and starts docker
- Launch a CentOS 7 VM with this cloud-init script
- The cloud-init script gets stuck starting docker
Describe the results you received:
`systemctl start docker` hangs forever.
Describe the results you expected:
Docker daemon starts and cloud-init script can continue.
Additional information you deem important (e.g. issue happens only occasionally):
Looks related to #41297.
If I remove `multi-user.target` from the `After=` line of docker.service and run `systemctl daemon-reload`, the pending `systemctl start docker` gets unstuck.
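The workaround above amounts to a one-line edit of docker.service. As a sketch, the hypothetical helper `drop_after_entry` below (not part of any Docker or systemd tooling) removes one entry from every `After=` line of a unit file's text and leaves everything else, including `WantedBy=multi-user.target`, untouched. The systemd.unit documentation states that drop-ins can add dependencies but cannot remove them, so the edited text is assumed to go into a full copy of the unit, here `/etc/systemd/system/docker.service` (which takes precedence over the packaged unit under `/usr/lib/systemd/system/`), followed by `systemctl daemon-reload`.

```python
def drop_after_entry(unit_text: str, entry: str) -> str:
    """Return unit_text with `entry` removed from every After= line.

    Hypothetical helper: it only rewrites After= lines and keeps all other
    lines (including WantedBy=) exactly as they were.
    """
    out = []
    for line in unit_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "After":
            # Keep every listed unit except the one being dropped.
            kept = [unit for unit in value.split() if unit != entry]
            line = "After=" + " ".join(kept)
        out.append(line)
    # Preserve a trailing newline if the input had one.
    return "\n".join(out) + ("\n" if unit_text.endswith("\n") else "")
```

For example, read the packaged unit, pass its contents through `drop_after_entry(text, "multi-user.target")`, write the result to the override path, then reload systemd.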
Output of docker version:
Client: Docker Engine - Community
 Version:           20.10.0
 API version:       1.41
 Go version:        go1.13.15
 Git commit:        7287ab3
 Built:             Tue Dec  8 18:57:35 2020
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true
Server: Docker Engine - Community
 Engine:
  Version:          20.10.0
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       eeddea2
  Built:            Tue Dec  8 18:56:55 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.3
  GitCommit:        269548fa27e0089a8b8278fc4fc781d7f65a939b
 runc:
  Version:          1.0.0-rc92
  GitCommit:        ff819c7e9184c13b7c2607fe6c30ae19403a7aff
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
Output of docker info:
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Build with BuildKit (Docker Inc., v0.4.2-docker)
Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 0
 Server Version: 20.10.0
 Storage Driver: overlay2
  Backing Filesystem: xfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc io.containerd.runc.v2 io.containerd.runtime.v1.linux
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 269548fa27e0089a8b8278fc4fc781d7f65a939b
 runc version: ff819c7e9184c13b7c2607fe6c30ae19403a7aff
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 3.10.0-693.11.6.el7.x86_64
 Operating System: CentOS Linux 7 (Core)
 OSType: linux
 Architecture: x86_64
 CPUs: 2
 Total Memory: 3.701GiB
 Name: ip-172-31-21-73.eu-central-1.compute.internal
 ID: 2IU4:K3UK:R5X3:2FVQ:JILR:F5US:GLDG:5JC4:KG5F:RGAI:UVBS:3ZIZ
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
Additional environment details (AWS, VirtualBox, physical, etc.):
AWS VM, CentOS 7 (AMI: ami-337be65c in eu-central-1, I think)
About this issue
- State: closed
- Created 4 years ago
- Reactions: 3
- Comments: 29 (11 by maintainers)
I’ve come across something similar, related to #41297. This is on Ubuntu 20.04 with docker-ce 20.10.
If a systemd unit relies on docker and is installed into multi-user.target (`WantedBy=multi-user.target`), it will never start.
Sample systemd file
The problem is that docker is set to start after `multi-user.target`. The docker systemd unit has:
`WantedBy=multi-user.target`
which tells systemd that when multi-user.target starts, docker.service should start.
It also has:
`After=network-online.target firewalld.service containerd.service multi-user.target`
which tells systemd that docker should start only after multi-user.target has fully started.
These conflict: we’re saying “multi-user.target depends on docker, but docker must run after multi-user.target”.
In the case of the sample unit, it won’t start: it is started when multi-user.target starts, but it can’t run because it requires docker, and docker won’t start until multi-user.target has started.
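The conflict can be checked with a small toy model (not systemd's actual algorithm). It assumes, per the systemd.target documentation, that a target unit implicitly gets `After=` on every unit it wants unless `DefaultDependencies=no` is set, so an enabled `WantedBy=multi-user.target` also orders multi-user.target after that unit. `myapp.service` is a hypothetical stand-in for the sample unit, and Python's `graphlib` (3.9+) computes the ordering; a `CycleError` means no valid start order exists.

```python
from graphlib import TopologicalSorter, CycleError


def build_units(docker_after):
    """Hypothetical, simplified units; 'wanted_by' stands for an enabled [Install] WantedBy=."""
    return {
        "docker.service": {"after": list(docker_after), "wanted_by": ["multi-user.target"]},
        # Stand-in for the sample unit: needs docker, installed on multi-user.target.
        "myapp.service": {"after": ["docker.service"], "wanted_by": ["multi-user.target"]},
        "multi-user.target": {"after": [], "wanted_by": []},
    }


def ordering_graph(units):
    """Map each unit to the set of units that must finish starting before it."""
    graph = {name: set(spec["after"]) for name, spec in units.items()}
    for name, spec in units.items():
        for target in spec["wanted_by"]:
            # Assumed implicit dependency: a target is ordered After= the units it wants.
            graph.setdefault(target, set()).add(name)
    return graph


def start_order(units):
    """Return a valid start order, or a description of an ordering cycle."""
    try:
        return list(TopologicalSorter(ordering_graph(units)).static_order())
    except CycleError as err:
        # err.args[1] lists the nodes of one cycle (first and last node are the same).
        return "cycle: " + " -> ".join(err.args[1])


# The 20.10 After= line as quoted above, and the same line without multi-user.target.
DOCKER_AFTER_20_10 = ["network-online.target", "firewalld.service",
                      "containerd.service", "multi-user.target"]
DOCKER_AFTER_FIXED = DOCKER_AFTER_20_10[:-1]

print(start_order(build_units(DOCKER_AFTER_20_10)))
print(start_order(build_units(DOCKER_AFTER_FIXED)))
```

With the 20.10 line, the model reports a cycle through docker.service and multi-user.target; without `multi-user.target`, docker.service comes before the sample unit, which comes before multi-user.target. The model only shows the ordering conflict; it does not reproduce how systemd actually resolves or stalls on it.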
I suggest reverting 0ca7456e5284d4aa9f3e37e69c7c93eff4420d3d. I can’t think of a better solution at the moment, and if you search the internet for “start docker container with systemd” you’ll find a ton of unit files that look like the sample, so this has the potential to break things for a lot of people.
The docker 20.10.1 packages are now available on download.docker.com; make sure to switch your repositories back to the main (download.docker.com) package repo if you temporarily switched to the staging repository.
I’m closing this issue, as this should be resolved, but feel free to continue the conversation.
20.10 changed the `After=` line in docker.service.
Its last entry, `multi-user.target`, was added in 20.10.
After I removed `multi-user.target` from the `After=...` line, both docker.service and the docker container defined in another service unit started without problems. See docker/for-linux#1162.
@zigarn or anyone hitting this issue, would you mind running:
and confirming that you no longer see the issue?
@tiborvass I confirm that the issue is gone with staging package.
It’s one I wrote; the one I supplied is the simplest form. You’ll find lots of similar ones on the internet to use as a guide.
The relationship between `Wants=`, `WantedBy=`, `After=`, etc. is complicated.
The gist is that `After=` determines the order, whereas `Wants=`/`Requires=` determine the dependencies. This is why it’s suggested to use both.
So, for example, if you want a service to start only once docker is ready, you use both `Requires=docker.service` and `After=docker.service`.
`Requires=` means docker is started whenever the service is started. `After=` means the service is started only once docker has fully started.
`Wants=` is just a less strict version of `Requires=`: if the wanted unit fails to start, the wanting unit is still started.
So combining `WantedBy=` and `After=` on the same target in one service doesn’t make a lot of sense.
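To make the split between pulling units in and ordering them concrete, here is another toy model under stated assumptions: only `Requires=`, `Wants=` and `After=` are modelled, failure propagation (the real difference between `Requires=` and `Wants=`) is not, and `myapp.service` and its variants are hypothetical. Starting a unit pulls in everything reachable through `Requires=`/`Wants=`; `After=` only orders units that are already part of that set.

```python
from graphlib import TopologicalSorter


def transaction(root, units):
    """Units pulled in by starting `root`: Requires= and Wants= add units, After= never does."""
    seen, stack = set(), [root]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        spec = units.get(name, {})
        stack.extend(spec.get("requires", []) + spec.get("wants", []))
    return seen


def start_order(root, units):
    """Order the pulled-in units; After= entries outside the transaction are ignored."""
    names = transaction(root, units)
    graph = {n: {d for d in units.get(n, {}).get("after", []) if d in names} for n in names}
    return list(TopologicalSorter(graph).static_order())


# Hypothetical variants of a unit that runs a container.
BOTH = {"myapp.service": {"requires": ["docker.service"], "after": ["docker.service"]}}
AFTER_ONLY = {"myapp.service": {"after": ["docker.service"]}}
REQUIRES_ONLY = {"myapp.service": {"requires": ["docker.service"]}}

for variant in (BOTH, AFTER_ONLY, REQUIRES_ONLY):
    print(start_order("myapp.service", variant))
```

In `AFTER_ONLY`, starting the unit never pulls docker in at all; in `REQUIRES_ONLY`, both are started but in no particular order. Only `BOTH` guarantees docker is up first, which is why the pair is recommended.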