moby: Docker Daemon Stuck activating

Description

On an Ubuntu 16.04 LTS machine running kernel 5.2.21-050221-generic and Docker 18.09.9, the dockerd systemd service gets stuck in the activating or deactivating state indefinitely.

Steps to reproduce the issue:

  1. Take an Ubuntu 16.04 machine and upgrade the kernel to 5.2.21-050221-generic.
  2. Install docker-ce 18.09.9.
  3. Start the Docker daemon and initialize the Kubernetes control plane with kubeadm init.
  4. After a while, when the daemon restarts for any reason, it gets stuck in the activating state forever.
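
The stuck condition described in these steps can be checked for from a script. The following is a minimal sketch, not part of the original report: the function names `is_transitional_state` and `unit_state` are hypothetical, and it assumes `systemctl show -p ActiveState` prints a single `ActiveState=<value>` line for the unit.

```shell
#!/bin/sh
# Return success if the given systemd ActiveState value is transitional.
# "activating" and "deactivating" are expected to be short-lived; a unit that
# stays in one of them matches the symptom in this report.
is_transitional_state() {
    case "$1" in
        activating|deactivating) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: print the ActiveState of a unit (default: docker.service).
unit_state() {
    out=$(systemctl show -p ActiveState "${1:-docker.service}") || return 1
    # Strip the "ActiveState=" prefix from the property line.
    printf '%s\n' "${out#ActiveState=}"
}

# Only query systemd when the script is invoked with the argument "check".
if [ "${1:-}" = "check" ]; then
    state=$(unit_state docker.service)
    if is_transitional_state "$state"; then
        echo "docker.service is in a transitional state: $state"
    else
        echo "docker.service state: $state"
    fi
fi
```

Running the script with `check` at intervals and seeing the same transitional state each time would correspond to the behaviour reported here.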

Describe the results you received:

* docker.service - Docker Application Container Engine
   Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/docker.service.d
           `-start.conf
   Active: deactivating (stop-sigterm) since Sat 2019-11-23 19:40:02 UTC; 1 day 16h ago
     Docs: https://docs.docker.com
 Main PID: 6901 (dockerd)
   CGroup: /system.slice/docker.service
           `-6901 /usr/bin/dockerd -g /data/docker --max-concurrent-uploads 100 --icc=false --log-level=info --iptables=true --userland-proxy=false -H fd:// --containerd=/run/containerd/containerd.sock --bip=169.254.0.1/24 --log-opt=max-size=50m --log-opt=max-file=3 --liv

Nov 23 19:40:02 tardis-master-2.2.2.2 dockerd[6901]: time="2019-11-23T19:40:02.888086999Z" level=warning msg="Your kernel does not support cgroup rt runtime"
Nov 23 19:40:02 tardis-master-2.2.2.2 dockerd[6901]: time="2019-11-23T19:40:02.888104110Z" level=warning msg="Your kernel does not support cgroup blkio weight"
Nov 23 19:40:02 tardis-master-2.2.2.2 dockerd[6901]: time="2019-11-23T19:40:02.888120558Z" level=warning msg="Your kernel does not support cgroup blkio weight_device"
Nov 23 19:40:02 tardis-master-2.2.2.2 dockerd[6901]: time="2019-11-23T19:40:02.889104651Z" level=info msg="Loading containers: start."
Nov 23 20:09:15 tardis-master-2.2.2.2 dockerd[6901]: time="2019-11-23T20:09:15.450226652Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Nov 23 20:09:15 tardis-master-2.2.2.2 dockerd[6901]: time="2019-11-23T20:09:15.450724572Z" level=warning msg="2124a35a7413184db147ddb6501df278ef7c327f4ceff338abc1ac44211ca3fd cleanup: failed to unmount IPC: umount /data/docker/containers/2124a35a7413184db147ddb6501d

Describe the results you expected:

The Docker daemon restarts, stops, and starts successfully.

Additional information you deem important (e.g. issue happens only occasionally):

  1. Linux Kernel Details
5.2.21-050221-generic #201910111731 SMP Fri Oct 11 17:34:34 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Output of docker version:

Client:
 Version:           18.09.9
 API version:       1.39
 Go version:        go1.11.13
 Git commit:        039a7df9ba
 Built:             Wed Sep  4 17:24:10 2019
 OS/Arch:           linux/amd64
 Experimental:      false
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

Output of docker info:

Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

Additional environment details (AWS, VirtualBox, physical, etc.):

  1. VM details: 32-core VM running on ESXi
processor       : 31
vendor_id       : GenuineIntel
cpu family      : 6
model           : 85
model name      : Intel(R) Xeon(R) Platinum 8176 CPU @ 2.10GHz
stepping        : 4
microcode       : 0x2000043
cpu MHz         : 2100.000
cache size      : 39424 KB
physical id     : 62
siblings        : 1
core id         : 0
cpu cores       : 1
apicid          : 62
initial apicid  : 62
fpu             : yes
fpu_exception   : yes
cpuid level     : 22
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology tsc_reliable nonstop_tsc cpuid pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx hypervisor lahf_lm 3dnowprefetch pti arat
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs
bogomips        : 4200.00
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual

containerd

* containerd.service - containerd container runtime
   Loaded: loaded (/lib/systemd/system/containerd.service; enabled; vendor preset: enabled)
   Active: active (running) since Sat 2019-11-23 19:27:20 UTC; 1 day 16h ago
     Docs: https://containerd.io
 Main PID: 1319 (containerd)
   CGroup: /system.slice/containerd.service
           |- 1319 /usr/bin/containerd
           |- 3408 containerd-shim -namespace moby -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/moby/dd7473a706891ae88a06b00ac6162ae103a3bf765471d0c0aacb2a5c81fecd5d -address /run/containerd/containerd.sock -containerd-binary /usr/bin/containerd -runtime-root /var/run/docker/runtime-runc
           |- 3660 containerd-shim -namespace moby -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/moby/eafb6d6963ee8d676aa7954afd4e3b41d0d22f1bcf9603a8e1031cf298eec664 -address /run/containerd/containerd.sock -containerd-binary /usr/bin/containerd -runtime-root /var/run/docker/runtime-runc
           |- 3773 containerd-shim -namespace moby -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/moby/a3d86ebd138317a59133e2293f5412d872d5072fec6ba1476dbdb5540873d2ac -address /run/containerd/containerd.sock -containerd-binary /usr/bin/containerd -runtime-root /var/run/docker/runtime-runc

Stack dump: https://gist.github.com/harshanarayana/e31486f35144ae08c7861e0bea612bc4

Systemd service definition

# /lib/systemd/system/docker.service
[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
BindsTo=containerd.service
After=network-online.target firewalld.service containerd.service
Wants=network-online.target
Requires=docker.socket

[Service]
Type=notify
ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
ExecReload=/bin/kill -s HUP $MAINPID
TimeoutSec=0
RestartSec=2
Restart=always

StartLimitBurst=3

StartLimitInterval=60s

LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity

TasksMax=infinity

Delegate=yes

KillMode=process

[Install]
WantedBy=multi-user.target

# /etc/systemd/system/docker.service.d/start.conf
[Service]
EnvironmentFile=/etc/tardis/docker.env
ExecStart=
ExecStart=/usr/bin/dockerd -g ${DOCKER_DATA_DIR} --max-concurrent-uploads 100 --icc=false --log-level=info --iptables=true --userland-proxy=false -H fd:// --containerd=/run/containerd/containerd.sock --bip=169.254.0.1/24 ${DOCKER_MAX_LOG_SIZE} ${DOCKER_MAX_LOG_FILES} ${DOCKER_OPTS}
RestartSec=10

# Contents of /etc/tardis/docker.env
DOCKER_DATA_DIR=/data/docker
DOCKER_MAX_LOG_SIZE="--log-opt=max-size=50m"
DOCKER_MAX_LOG_FILES="--log-opt=max-file=3"
DOCKER_OPTS="--live-restore"
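
To see which command line the drop-in and environment file are expected to produce, the variables can be expanded by hand. This is a rough sketch only: it uses shell expansion as an approximation of systemd's own `${VAR}` substitution, and the function name `effective_cmd` is hypothetical. The values are copied from the environment file quoted in this report.

```shell
#!/bin/sh
# Values copied from /etc/tardis/docker.env as quoted in the report.
DOCKER_DATA_DIR=/data/docker
DOCKER_MAX_LOG_SIZE="--log-opt=max-size=50m"
DOCKER_MAX_LOG_FILES="--log-opt=max-file=3"
DOCKER_OPTS="--live-restore"

# Print the dockerd command line from the drop-in's ExecStart with the
# variables substituted. Each variable holds a single token here, so plain
# shell expansion gives the same words systemd would pass to dockerd.
effective_cmd() {
    printf '%s\n' "/usr/bin/dockerd -g ${DOCKER_DATA_DIR} --max-concurrent-uploads 100 --icc=false --log-level=info --iptables=true --userland-proxy=false -H fd:// --containerd=/run/containerd/containerd.sock --bip=169.254.0.1/24 ${DOCKER_MAX_LOG_SIZE} ${DOCKER_MAX_LOG_FILES} ${DOCKER_OPTS}"
}

effective_cmd
```

The printed line can be compared against the (truncated) dockerd command shown in the CGroup section of the systemctl status output in this report.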

containerd version

containerd -v
containerd containerd.io 1.2.6 894b81a4b802e4eb2a91d1ce216b8817763c29fb

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 2
  • Comments: 17 (5 by maintainers)

Most upvoted comments

Having the same issue with Ubuntu 18.04.4 and Docker 20.10.1