containerd: Containers with curl-based health checks becoming unresponsive
Description
After updating containerd to 1.4.0, running a Docker container with a curl-based health check (specifically the bitwardenrs/server and plex images) results in an ‘unhealthy’ status after about 15 minutes. The container then cannot be stopped without killing the corresponding containerd-shim process.
Steps to reproduce the issue:
- docker run -d --rm --name bitwarden -v ~/.local/tmp/data:/data/ -e ROCKET_PORT=8080 -p 8080:8080 --init --name bitwarden_run bitwardenrs/server:alpine
- Wait
- docker ps
CONTAINER ID   IMAGE                       COMMAND       CREATED       STATUS                   PORTS                                      NAMES
17f4b68347ad   bitwardenrs/server:alpine   "/start.sh"   1 hours ago   Up 1 hours (unhealthy)   80/tcp, 3012/tcp, 0.0.0.0:8080->8080/tcp   bitwarden_run
Describe the results you received:
- docker stop bitwarden_run -> very long pause (several minutes) before command completes
- docker ps -> still shows bitwarden_run up but unhealthy
- ps aux|grep 17f4b -> containerd-shim process is still running:
root 672481 4.3 0.2 712696 17812 ? Sl Aug27 56:46 containerd-shim -namespace moby -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/moby/17f4b68347adbe978d8917d575979d72fdd6b9852268506b75014cac7cafafa8 -address /run/containerd/containerd.sock -containerd-binary /usr/bin/containerd -runtime-root /var/run/docker/runtime-runc
- sudo kill -9 672481 -> container shows as stopped.
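The manual lookup above (ps aux, grep for the container ID, then kill the shim) could be wrapped in a small helper. This is only a sketch: shim_pids is a hypothetical name, and it assumes the `ps aux` column layout and the `/moby/<container id>` workdir path shown in the output above.

```shell
#!/bin/sh
# Hypothetical helper (not part of docker or containerd): print the PID of
# every containerd-shim process whose command line references the given
# container ID prefix. Reads `ps aux`-style text on stdin, so it can also be
# tried on saved output.
shim_pids() {
    id_prefix="$1"
    awk -v id="$id_prefix" '
        # In `ps aux` output, field 2 is the PID and field 11 is the command.
        # Assumption: the shim workdir path contains "/moby/<container id>".
        $11 ~ /containerd-shim$/ && index($0, "/moby/" id) > 0 { print $2 }
    '
}

# Usage (not executed here):
#   ps aux | shim_pids 17f4b
# Killing the shim is a last resort, as described in the report:
#   ps aux | shim_pids 17f4b | xargs sudo kill -9
```

Matching on the command field rather than the whole line keeps the grep process itself out of the result.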
Describe the results you expected:
docker stop bitwarden_run
-> container shows as stopped and all container processes ended
Output of containerd --version:
containerd --version
containerd github.com/containerd/containerd v1.4.0.m 09814d48d50816305a8e6c1a4ae3e2bcc4ba725a.m
Any other relevant information:
Downgrading containerd to 1.3.4 fixes the issue, as does running bitwarden_rs and Plex with the --no-healthcheck option.
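For anyone wanting to try the --no-healthcheck workaround, here is a sketch based on the reproduction command. run_without_healthcheck and the DOCKER override variable are hypothetical names added for illustration. The volume path, port, and image are copied from the report.

```shell
#!/bin/sh
# Sketch of the workaround: start the same container with the image's
# HEALTHCHECK disabled via `docker run --no-healthcheck`.
# DOCKER is a hypothetical override used only to preview the command
# (for example DOCKER=echo); it defaults to the real docker CLI.
run_without_healthcheck() {
    ${DOCKER:-docker} run -d --rm --no-healthcheck \
        --name bitwarden_run \
        -v "$HOME/.local/tmp/data:/data/" \
        -e ROCKET_PORT=8080 -p 8080:8080 --init \
        bitwardenrs/server:alpine
}

# Usage (not executed here):
#   DOCKER=echo run_without_healthcheck   # print the command only
#   run_without_healthcheck               # actually start the container
```

This only avoids the trigger; it does not fix the underlying hang, and the container will no longer report a health status.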
OS: Archlinux (other people on the Arch forums have reported this same issue)
uname -a
Linux scotty 5.8.3-arch1-1 #1 SMP PREEMPT Fri, 21 Aug 2020 16:54:16 +0000 x86_64 GNU/Linux
journalctl -u docker -r |grep warning|head
Aug 28 11:38:07 scotty dockerd[775]: time="2020-08-28T11:38:07.287399050-07:00" level=warning msg="Health check for container 17f4b68347adbe978d8917d575979d72fdd6b9852268506b75014cac7cafafa8 error: context deadline exceeded"
journalctl -u containerd -r|head
Aug 28 11:38:19 scotty containerd[524]: time="2020-08-28T11:38:19.384986550-07:00" level=warning msg="cleaning up after shim dead" id=17f4b68347adbe978d8917d575979d72fdd6b9852268506b75014cac7cafafa8 namespace=moby
Aug 28 11:38:19 scotty containerd[524]: time="2020-08-28T11:38:19.384862508-07:00" level=info msg="shim reaped" id=17f4b68347adbe978d8917d575979d72fdd6b9852268506b75014cac7cafafa8
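To see which containers are hitting the health check timeout, the dockerd warnings can be counted per container ID. This is a sketch that assumes the exact message wording shown in the log line above. health_timeouts is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read dockerd log lines on stdin and print
# "<count> <container id>" for every container whose health check failed
# with "context deadline exceeded".
# Assumption: the message format matches the warning quoted in this report.
health_timeouts() {
    sed -n 's/.*Health check for container \([0-9a-f]*\) error: context deadline exceeded.*/\1/p' |
        sort | uniq -c |
        awk '{ print $1, $2 }'   # normalize the padding that uniq -c adds
}

# Usage (not executed here):
#   journalctl -u docker | health_timeouts
```

A container that shows up repeatedly here and then refuses to stop is likely affected by the same hang.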
Thanks!
About this issue
- State: closed
- Created 4 years ago
- Reactions: 27
- Comments: 58 (17 by maintainers)
Here’s the fix https://github.com/containerd/containerd/pull/4546
Docker hardcodes shim v1. This will be fixed in Docker 20.x.
— EDIT fix typo
If there are still Arch users following this, please try this package:
https://pkgbuild.com/~foxboron/repos/containerd/containerd-1.4.0-2.5-x86_64.pkg.tar.zst
This package has the following patches applied
https://github.com/containerd/containerd/pull/4519 https://github.com/containerd/containerd/pull/4546
EDIT: Publishing package when the patch has been merged upstream.
I’ve bisected v1.4.0 to this commit https://github.com/containerd/containerd/commit/e3ab8bda604dbafb6c69f4c42979e92ead445138. On my setup, the hang goes away after reverting this commit on v1.4.0.
Following git blame to a previous change https://github.com/containerd/containerd/commit/38d7d59e8ab35bd627d4e34ffe2198ac71c451d3. The commit message says allProcesses is meant to avoid a deadlock. That was removed in the offending commit. I’m not familiar with the containerd source at all, or Go for that matter, so take this for what it’s worth.
Arch containerd maintainer here:
The smoking gun is Go 1.15, not containerd 1.4.0. As noted above this issue doesn’t affect Ubuntu, and I’m fairly sure they have not pushed Go 1.15 yet.
I think we can keep Arch build issues on our tracker. I don’t see the need to bother upstream with that 😃
Generally I think you can have a note that any PIE or RELRO (-buildmode=pie along with -ldflags=-linkmode=external) is unsupported and might introduce race conditions.
The downstream bug report for this issue is: https://bugs.archlinux.org/task/67773?dev=25983
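To check how a local containerd binary was built, two standard tools help: `go version <file>` prints the Go toolchain version recorded in a Go binary, and `readelf -h <file>` prints the ELF type. The sketch below classifies the readelf output. elf_kind is a hypothetical helper, and treating an ELF type of DYN as "PIE" is an assumption that holds for executables but not for shared libraries.

```shell
#!/bin/sh
# Hypothetical helper: read `readelf -h` output on stdin and report whether
# the ELF type suggests a position-independent executable.
# Assumption: for an executable, Type DYN means PIE and Type EXEC means non-PIE.
elf_kind() {
    awk '
        $1 == "Type:" {
            if ($2 == "DYN") kind = "pie"
            else if ($2 == "EXEC") kind = "non-pie"
            else kind = "other"
        }
        END { print (kind == "" ? "unknown" : kind) }
    '
}

# Usage (not executed here; /usr/bin/containerd is the path from the report):
#   go version /usr/bin/containerd
#   readelf -h /usr/bin/containerd | elf_kind
```

Per the comments above, a Go 1.15 build reported as "pie" would match the configuration suspected of triggering the hang.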
BTW, I don’t think this has anything to do with curl-based healthchecks, just healthchecks in general. One of ddev’s affected images does not have a curl in the healthcheck.
Hey y’all, if you’re having the same issue, just hit the thumbs up on the initial post unless you have more logs to provide. We all have about the same symptoms, and it looks like the issue may be fixed in the next Docker release, if I’m reading that right (see @cpuguy83 comments above).