containerd: SIGTERM doesn't kill containerd-shims

If you run containerd as a systemd service and try to restart the service while containers are running, systemctl restart containerd blocks. containerd (the daemon) exits when it gets a SIGTERM, but it doesn’t attempt to kill any of the containerd-shims, so systemd waits and then sends them SIGKILL after 10 seconds (or whatever the timeout is).

About this issue

  • State: closed
  • Created 8 years ago
  • Reactions: 5
  • Comments: 16 (6 by maintainers)

Most upvoted comments

Instead of directly editing the systemd service files in /lib (which might not be writable depending on the Linux distribution), use systemctl edit docker and add:

[Unit]
After=containerd.service
Wants=containerd.service

(Using Wants instead of Requires per https://github.com/moby/moby/commit/a985655ac4eb6c5b60b5eab8d8d09a487e353e1d)

This creates /etc/systemd/system/docker.service.d/override.conf, which you’ll also see listed under “Drop-In” in the output of systemctl status docker.

I’ve done some tests with the KillMode= setting of systemd (https://www.freedesktop.org/software/systemd/man/systemd.kill.html).

If we set KillMode=process in the [Service] section of containerd.service, running systemctl stop containerd stops the containerd process but not the containerd-shim processes. The good thing is that systemd does not block.

If we set KillMode=mixed instead, it stops both the containerd process and the containerd-shim processes. As before, systemd does not block.

Thus, I think the solution is to set KillMode to either process or mixed in the systemd unit file, depending on whether you want the containerd-shim processes to keep running or not.
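As a minimal sketch of that suggestion, a drop-in for containerd.service (created with sudo systemctl edit containerd) could look like the following. It assumes your packaged containerd.service does not already pin KillMode= to the value you want; keep one of the two settings and leave the other commented out.

```ini
# /etc/systemd/system/containerd.service.d/override.conf
# (created by: sudo systemctl edit containerd)
[Service]
# mixed: SIGTERM goes only to the main containerd process; the
# subsequent SIGKILL goes to every process still left in the unit's
# control group, which includes the containerd-shims.
KillMode=mixed

# process: only the main containerd process is signalled; the shims
# (and their containers) keep running after containerd stops.
#KillMode=process
```

With mixed, stopping or restarting containerd also takes the running containers down; with process, they survive the restart.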

I’m on Manjaro (Arch-based). I think these steps are similar on Ubuntu, but I’m not 100% sure; check the systemd and/or systemctl pages on their wiki.

Initially, I edited docker.service like so (substitute nano with your editor of choice: pico, vim, etc.):

sudo nano /lib/systemd/system/docker.service

Then I added containerd.socket and containerd.service to the After= and Requires= lines, as suggested by @jordimassaguerpla. (I left the existing values in place and appended the new ones to the end of each line. In nano, press Ctrl+O and then Enter to write the file, and Ctrl+X to exit.)

Then I restarted the service with sudo systemctl restart docker.service (a reboot should do it too, I think). Note that systemd only picks up changes to a unit file after sudo systemctl daemon-reload (or a reboot).

BUT, an edit under /lib may be overwritten when the upstream package is updated, in which case you might want to follow one of the options here:

https://serverfault.com/a/840999

ETA: you may not have a containerd.socket unit (check with ls /lib/systemd/system). I had one the first time I went through this, but just now, after uninstalling and reinstalling all my Docker packages, it was gone, and restarting docker.service gave me an error saying containerd.socket was not found. So I added only containerd.service, and that still seems to fix the slow shutdowns and reboots.
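Following the drop-in approach linked above, the same change can live outside /lib so that package updates don’t overwrite it. A minimal sketch, assuming (as in the ETA) that containerd.socket does not exist on your system and that you want the Requires= behaviour from this comment rather than Wants=; create it with sudo systemctl edit docker.

```ini
# /etc/systemd/system/docker.service.d/override.conf
# (created by: sudo systemctl edit docker)
[Unit]
# Dependency settings in a drop-in are added to the ones already in
# /lib/systemd/system/docker.service, so existing values such as
# network.target do not need to be repeated here.
After=containerd.service
Requires=containerd.service
# Only list containerd.socket as well if
#   systemctl list-unit-files containerd.socket
# shows that the unit exists; otherwise restarting docker.service fails.
```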

Also, the problem is that this causes containerd not to shut down either. So there are two options:

  1. Run each container in a separate systemd service (which, as someone who works on runC, I know is not going to work).

  2. Define some signal (SIGUSR1, for example) that can be used to tell containerd to kill every container, so that it can be set in the systemd configuration for distributions that intend to ship things that way. In particular, if your system is shutting down, this issue makes shutdown take much longer than expected.
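For illustration only, option 2 would presumably be wired up through systemd’s KillSignal= setting. This is a hypothetical sketch: it assumes a containerd that kills every container and then exits on SIGUSR1, which is only a proposal in this thread. With a containerd that does not implement it, systemd would just wait for the stop timeout and then SIGKILL the main process.

```ini
# HYPOTHETICAL drop-in for containerd.service. It assumes containerd
# reacts to SIGUSR1 by killing every container and then exiting; that
# behaviour is only proposed here, not confirmed.
[Service]
# Send the proposed "kill all containers" signal instead of SIGTERM.
KillSignal=SIGUSR1
# Signal only the main containerd process, so the shims are left for
# containerd itself to clean up.
KillMode=process
```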

When I make changes, I usually prefer to create my own copy of the service in /etc/systemd/system/docker.service, but that’s a neat solution nonetheless.

The disadvantage of copying the whole service file to /etc is that you then won’t pick up any changes your Linux distribution ships in its updates.

If you use a drop-in file instead, you often don’t need to maintain the file at all (or at least it’s easier to maintain) when you upgrade to a newer version 😃

Another way to fix it is to make the docker service require the containerd service, as in:

[Unit]
…
After=network.target containerd.socket containerd.service
Requires=containerd.socket containerd.service

SIGTERM should make containerd exit, always. So I’m a bit surprised you have a case where it doesn’t.

I have no objection to adding a special case for USR1 (cc @crosbymichael ?)

docker should be able to handle the containers disappearing from under it.