containerd: Containers with curl-based health checks becoming unresponsive

Description

After updating containerd to 1.4.0, running a Docker container whose health check uses curl (specifically the bitwardenrs/server and plex images) results in an ‘unhealthy’ status after about 15 minutes. The container cannot be stopped without killing its containerd-shim process.

Steps to reproduce the issue:

  1. docker run -d --rm -v ~/.local/tmp/data:/data/ -e ROCKET_PORT=8080 -p 8080:8080 --init --name bitwarden_run bitwardenrs/server:alpine

  2. Wait

  3. docker ps

     CONTAINER ID        IMAGE                       COMMAND                  CREATED             STATUS                    PORTS                                         NAMES
     17f4b68347ad        bitwardenrs/server:alpine   "/start.sh"              1 hours ago         Up 1 hours (unhealthy)    80/tcp, 3012/tcp, 0.0.0.0:8080->8080/tcp      bitwarden_run

Describe the results you received:

  1. docker stop bitwarden_run -> very long pause (several minutes) before command completes

  2. docker ps still shows bitwarden_run up but unhealthy

  3. ps aux|grep 17f4b -> containerd-shim process is still running

     root      672481  4.3  0.2 712696 17812 ?        Sl   Aug27  56:46 containerd-shim -namespace moby -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/moby/17f4b68347adbe978d8917d575979d72fdd6b9852268506b75014cac7cafafa8 -address /run/containerd/containerd.sock -containerd-binary /usr/bin/containerd -runtime-root /var/run/docker/runtime-runc
    
  4. sudo kill -9 672481 -> container shows as stopped.
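The manual lookup in steps 3 and 4 can be sketched as a small filter. This is only an illustration: `shim_pid_from_ps` is a hypothetical helper name, and it assumes `ps aux` column layout (PID in column 2, command in column 11) as in the output shown above.

```shell
#!/bin/sh
# Print the PID of the containerd-shim whose command line mentions the given
# container ID prefix. Reads `ps aux`-style text on stdin, so it can also be
# tried on saved output. Matching on column 11 (the command) keeps the awk
# process itself from being reported.
shim_pid_from_ps() {
    awk -v id="$1" '$11 ~ /containerd-shim/ && index($0, id) { print $2 }'
}

# Usage (left commented out so nothing is killed by accident):
#   pid=$(ps aux | shim_pid_from_ps 17f4b)
#   [ -n "$pid" ] && sudo kill -9 "$pid"
```

Killing the shim is the same last-resort workaround described in step 4, not a fix.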

Describe the results you expected:

  1. docker stop bitwarden_run -> container shows as stopped and all container processes ended

Output of containerd --version:

containerd --version
containerd github.com/containerd/containerd v1.4.0.m 09814d48d50816305a8e6c1a4ae3e2bcc4ba725a.m

Any other relevant information:

Downgrading containerd to 1.3.4 fixes the issue, as does running bitwarden_rs and Plex with the --no-healthcheck option.
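For anyone trying the --no-healthcheck workaround, here is a minimal sketch based on the reproduction command from this report. The function names are hypothetical. The `docker inspect` template should print nothing when Docker has recorded no health state for the container.

```shell
#!/bin/sh
# Start the container from the report with the image-defined health check disabled.
# $1 = container name, $2 = image
run_without_healthcheck() {
    docker run -d --rm --no-healthcheck \
        -v ~/.local/tmp/data:/data/ -e ROCKET_PORT=8080 -p 8080:8080 \
        --init --name "$1" "$2"
}

# Print the health status Docker has recorded for a container
# (for example "healthy" or "unhealthy"); empty if there is no health state.
health_status() {
    docker inspect --format '{{if .State.Health}}{{.State.Health.Status}}{{end}}' "$1"
}

# Usage:
#   run_without_healthcheck bitwarden_run bitwardenrs/server:alpine
#   health_status bitwarden_run
```

This only avoids the trigger. The container runs without any health monitoring until the underlying problem is fixed.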

OS: Archlinux (other people on the Arch forums have reported this same issue)

uname -a
Linux scotty 5.8.3-arch1-1 #1 SMP PREEMPT Fri, 21 Aug 2020 16:54:16 +0000 x86_64 GNU/Linux

journalctl -u docker -r |grep warning|head

Aug 28 11:38:07 scotty dockerd[775]: time="2020-08-28T11:38:07.287399050-07:00" level=warning msg="Health check for container 17f4b68347adbe978d8917d575979d72fdd6b9852268506b75014cac7cafafa8 error: context deadline exceeded"

journalctl -u containerd -r|head

Aug 28 11:38:19 scotty containerd[524]: time="2020-08-28T11:38:19.384986550-07:00" level=warning msg="cleaning up after shim dead" id=17f4b68347adbe978d8917d575979d72fdd6b9852268506b75014cac7cafafa8 namespace=moby
Aug 28 11:38:19 scotty containerd[524]: time="2020-08-28T11:38:19.384862508-07:00" level=info msg="shim reaped" id=17f4b68347adbe978d8917d575979d72fdd6b9852268506b75014cac7cafafa8
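To check whether a given container shows the same symptom, the dockerd log can be narrowed to the health-check timeouts for that container. This is a rough sketch: `health_timeouts` is a hypothetical helper, and it assumes the message wording matches the warning quoted above.

```shell
#!/bin/sh
# Keep only the health-check timeout warnings for one container ID (or ID prefix).
# Reads log lines on stdin and uses fixed-string matching, so no regex escaping is needed.
health_timeouts() {
    grep -F "Health check for container $1" | grep -F "context deadline exceeded"
}

# Usage:
#   journalctl -u docker -r | health_timeouts 17f4b68347ad | head
```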

Thanks!

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 27
  • Comments: 58 (17 by maintainers)

Most upvoted comments

Docker hardcodes shim v1. This will be fixed in Docker 20.x.

— EDIT fix typo

If there are still Arch users following this, please try this package:

https://pkgbuild.com/~foxboron/repos/containerd/containerd-1.4.0-2.5-x86_64.pkg.tar.zst

This package has the following patches applied

https://github.com/containerd/containerd/pull/4519 https://github.com/containerd/containerd/pull/4546

EDIT: Publishing package when the patch has been merged upstream.

I’ve bisected v1.4.0 to this commit: https://github.com/containerd/containerd/commit/e3ab8bda604dbafb6c69f4c42979e92ead445138. On my setup, the hang goes away after reverting this commit on v1.4.0.

Following git blame leads to a previous change, https://github.com/containerd/containerd/commit/38d7d59e8ab35bd627d4e34ffe2198ac71c451d3. Its commit message says allProcesses is meant to avoid a deadlock, and that was removed in the offending commit. I’m not familiar with the containerd source at all, or Go for that matter, so take this for what it’s worth.

Arch containerd maintainer here:

The smoking gun is Go 1.15, not containerd 1.4.0. As noted above this issue doesn’t affect Ubuntu, and I’m fairly sure they have not pushed Go 1.15 yet.

I think we can keep Arch build issues on our tracker. I don’t see the need to bother upstream with that 😃

Generally I think you can have a note that any PIE or RELRO build (-buildmode=pie along with -ldflags=-linkmode=external) is unsupported and might introduce race conditions.
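To see whether a locally installed containerd was built this way, one option is to look at the ELF header type and the embedded Go version. This is a sketch under assumptions: `elf_kind` is a hypothetical helper, /usr/bin/containerd is the path from the shim command line earlier in this report, and the check assumes binutils `readelf` is available.

```shell
#!/bin/sh
# Classify `readelf -h` output (read from stdin) as PIE or non-PIE.
# A PIE executable reports an ELF Type of DYN; a fixed-address build reports EXEC.
elf_kind() {
    awk '$1 == "Type:" { print ($2 == "DYN" ? "pie" : "non-pie"); exit }'
}

# Usage:
#   readelf -h /usr/bin/containerd | elf_kind
#   go version /usr/bin/containerd   # prints the Go version the binary was built with
```

A "pie" result together with go1.15 would match the Arch build configuration described above. It does not prove on its own that a system is affected.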

The downstream bug report for this issue is: https://bugs.archlinux.org/task/67773?dev=25983

BTW, I don’t think this has anything to do with curl-based healthchecks, just healthchecks in general. One of ddev’s affected images does not have a curl in the healthcheck.

Hey y’all, if you’re having the same issue, just hit the thumbs up on the initial post unless you have more logs to provide. We all have about the same symptoms and it looks like the issue may be fixed in the next Docker release (if I’m reading that right) (see @cpuguy83 comments above).