moby: `systemctl start docker` gets stuck in cloud-init

Description

With Docker 20.10 on CentOS 7, starting the daemon from a cloud-init script hangs.

Steps to reproduce the issue:

  1. Create a cloud-init script that installs and starts docker
  2. Launch a CentOS 7 VM with this cloud-init
  3. The cloud-init script gets stuck on starting docker

Describe the results you received:

systemctl start docker hangs forever.

Describe the results you expected:

Docker daemon starts and cloud-init script can continue.

Additional information you deem important (e.g. issue happens only occasionally):

Looks related to #41297. If I remove multi-user.target from the After= line of docker.service and run systemctl daemon-reload, the stuck systemctl start docker completes.

Output of docker version:

Client: Docker Engine - Community
 Version:           20.10.0
 API version:       1.41
 Go version:        go1.13.15
 Git commit:        7287ab3
 Built:             Tue Dec  8 18:57:35 2020
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.0
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       eeddea2
  Built:            Tue Dec  8 18:56:55 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.3
  GitCommit:        269548fa27e0089a8b8278fc4fc781d7f65a939b
 runc:
  Version:          1.0.0-rc92
  GitCommit:        ff819c7e9184c13b7c2607fe6c30ae19403a7aff
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Output of docker info:

Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Build with BuildKit (Docker Inc., v0.4.2-docker)

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 0
 Server Version: 20.10.0
 Storage Driver: overlay2
  Backing Filesystem: xfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc io.containerd.runc.v2 io.containerd.runtime.v1.linux
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 269548fa27e0089a8b8278fc4fc781d7f65a939b
 runc version: ff819c7e9184c13b7c2607fe6c30ae19403a7aff
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 3.10.0-693.11.6.el7.x86_64
 Operating System: CentOS Linux 7 (Core)
 OSType: linux
 Architecture: x86_64
 CPUs: 2
 Total Memory: 3.701GiB
 Name: ip-172-31-21-73.eu-central-1.compute.internal
 ID: 2IU4:K3UK:R5X3:2FVQ:JILR:F5US:GLDG:5JC4:KG5F:RGAI:UVBS:3ZIZ
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

Additional environment details (AWS, VirtualBox, physical, etc.):

AWS VM, CentOS 7 (AMI: ami-337be65c on eu-central-1, I think)

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 3
  • Comments: 29 (11 by maintainers)

Most upvoted comments

I’ve come across something similar, which is related to #41297. This is on Ubuntu 20.04 with docker-ce 20.10.

If a systemd unit relies on docker and is set to install on multi-user.target, it will never start.

Sample systemd file

[Unit]
Requires=docker.service
After=docker.service

[Service]
ExecStart=/usr/bin/docker run nginx

[Install]
WantedBy=multi-user.target

The problem is that docker is set to start after multi-user.target.

The docker systemd unit has:

WantedBy=multi-user.target

Which tells systemd that when multi-user.target starts, docker.service should start.

It also has: After=network-online.target firewalld.service containerd.service multi-user.target

Which tells systemd that docker should start after multi-user.target has fully started.

These conflict: we’re saying “multi-user.target depends on docker, but docker must run after multi-user.target”.

The sample unit won’t start because it is started as part of multi-user.target, but it requires docker to be running, and docker won’t start until after multi-user.target has started.

I suggest reverting 0ca7456e5284d4aa9f3e37e69c7c93eff4420d3d. I can’t think of a better solution at the moment, and if you search the internet for “start docker container with systemd” you’ll find a ton of unit files that look like the sample, so this has the potential to break things for a lot of people.
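The conflict described in this comment (one unit file naming the same target in both WantedBy= and After=) can be checked mechanically. Below is a minimal sketch in POSIX shell; the function name conflicting_targets is hypothetical, and it only does a plain-text scan of a single unit file, so it ignores drop-ins and anything systemd adds implicitly.

```shell
# Hypothetical helper: print every unit name that one unit file lists
# in both WantedBy= and After=. Plain-text scan only; drop-ins and
# implicit dependencies are not taken into account.
conflicting_targets() {
    unit_file=$1
    # Collect the values of all WantedBy= and After= lines.
    wanted=$(sed -n 's/^WantedBy=//p' "$unit_file")
    after=$(sed -n 's/^After=//p' "$unit_file")
    # Unquoted expansion is intentional: split the lists on whitespace.
    for w in $wanted; do
        for a in $after; do
            if [ "$w" = "$a" ]; then
                printf '%s\n' "$w"
            fi
        done
    done
    return 0
}

# Example call (the path is an assumption; it differs between distributions):
#   conflicting_targets /usr/lib/systemd/system/docker.service
```

For the docker.service lines quoted above, this should report multi-user.target; for the sample unit it reports nothing, because the sample only orders itself after docker.service.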

The docker 20.10.1 packages are now available on download.docker.com; make sure to switch back repositories to the main (download.docker.com) package repo if you temporarily switched to the staging repository.

I’m closing this issue, as this should be resolved, but feel free to continue the conversation.

20.10 changed the following line in docker.service.

After=network-online.target firewalld.service containerd.service multi-user.target

The last entry was added in 20.10.

After I removed multi-user.target from the After=... line, both docker.service and the docker container defined in another service unit started without problems. See docker/for-linux#1162.
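For anyone still on 20.10.0, the manual edit described here could be scripted roughly as follows. This is a sketch, not an official fix: the function name strip_multi_user_after and the unit file paths in the comments are assumptions, and the 20.10.1 packages mentioned earlier in the thread make it unnecessary.

```shell
# Hypothetical helper: print a copy of a unit file with multi-user.target
# removed from its After= lines. Other lines (including
# WantedBy=multi-user.target) are left untouched.
strip_multi_user_after() {
    sed '/^After=/ s/[[:space:]]*multi-user\.target//g' "$1"
}

# Possible usage as root (paths are assumptions and vary by distribution):
#   strip_multi_user_after /usr/lib/systemd/system/docker.service \
#       > /etc/systemd/system/docker.service
#   systemctl daemon-reload
```

Writing a full copy under /etc/systemd/system is deliberate: as far as I know, a drop-in can only add ordering dependencies, not remove an entry from After=, so the whole unit has to be overridden. Remember to delete the copy after upgrading, or it will keep shadowing the packaged unit.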

@zigarn or anyone hitting this issue, would you mind running:

curl -fsSL https://get.docker.com/ | DOWNLOAD_URL=https://download-stage.docker.com REPO_FILE=docker-ce-staging.repo sh

And confirm you no longer see the issue?

@tiborvass I confirm that the issue is gone with the staging package.

Is this a unit you defined yourself, or is it part of some package? (Curious which package the conflict is in.)

It’s one I wrote; the one I supplied is the simplest form. You’ll find lots of similar ones on the internet, used as guides.

The Wants/WantedBy/After etc. relationship is complicated.

The gist is that After= determines the ordering, whereas Wants= determines the dependencies. This is why it’s suggested to use both.

So, for example, if you want a service to start only once docker is ready, you use:

Requires=docker.service
After=docker.service

Requires= means: start docker when this service starts. After= means: start this service only once docker has fully started.

Wants= is just a less strict version of Requires= (the service is not failed if the wanted unit fails to start).

So declaring both WantedBy= and After= against the same unit doesn’t make a lot of sense.