containerd: SIGTERM doesn't kill containerd-shims
If you run containerd as a systemd service and try to restart the service while containers are running, systemctl restart containerd will block: containerd (the daemon) exits when it gets a SIGTERM, but it doesn’t attempt to kill any of the containerd-shims, which then causes systemd to SIGKILL them after 10 seconds (or whatever the timeout is).
About this issue
- State: closed
- Created 8 years ago
- Reactions: 5
- Comments: 16 (6 by maintainers)
Instead of directly editing the systemd service files in /lib (which might not be writable depending on the Linux distribution), use
systemctl edit docker
and add:
(Using Wants instead of Requires per https://github.com/moby/moby/commit/a985655ac4eb6c5b60b5eab8d8d09a487e353e1d)
This will create /etc/systemd/system/docker.service.d/override.conf, which you’ll also see listed in systemctl status docker.
I’ve done some tests with the KillMode in systemd: https://www.freedesktop.org/software/systemd/man/systemd.kill.html
If we set KillMode=process in containerd.service, under the [Service] section, running systemctl stop containerd will stop the containerd process but not the containerd-shim processes. The good thing is that systemd will not get blocked.
If, instead, we set KillMode=mixed, it will stop the containerd process and also the containerd-shim processes. As before, systemd will not get blocked.
Thus, I think the solution is to use either process or mixed as the value for KillMode in the systemd unit file, depending on whether you want to leave the containerd-shim processes running or not.
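
A minimal sketch of what that KillMode setting could look like as a drop-in rather than an edit to the packaged unit file. The drop-in path is an assumption (it is where systemctl edit containerd would normally write), and the choice of mixed over process is only illustrative. Per the systemd.kill documentation, process signals only the main process, while mixed sends SIGTERM to the main process and SIGKILL to the remaining processes in the unit's control group.

```ini
# Assumed location: /etc/systemd/system/containerd.service.d/override.conf
[Service]
# mixed: SIGTERM goes to the main containerd process, then SIGKILL goes to
# whatever is left in the unit's control group (here, the containerd-shims).
# Use KillMode=process instead to signal only containerd and leave the
# shims running.
KillMode=mixed
```

If the file is created by hand instead of through systemctl edit, systemctl daemon-reload is needed before the change takes effect.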
I’m on Manjaro (Arch based). I think these steps will be similar on Ubuntu, but I’m not 100% sure. Check out the systemd and/or systemctl pages on their wiki.
Initially, what I did was edit docker.service like so (substitute nano with your editor of choice: pico, vim, etc.):
sudo nano /lib/systemd/system/docker.service
Then I just added
containerd.socket containerd.service
to the After and Requires lines, as suggested by @jordimassaguerpla. (I left what was already there in place and appended the new values to the end of each line; in nano, press CTRL+O, then Enter to confirm the write, and CTRL+X to exit.)
Then I restarted the service using:
sudo systemctl restart docker.service
(but a restart should do it too, I think.)
BUT, that may be susceptible to overwrites during updates from the upstream package, in which case you might want to follow one of the options here:
https://serverfault.com/a/840999
ETA: you may not have a containerd.socket (check under /lib/systemd/system using the ls command). I did the first time I went through this, but just now, after uninstalling and reinstalling all my docker stuff, it wasn’t there any longer and restarting docker.service was giving me the error containerd.socket not found, so I only added containerd.service, and that still seems to work fine at addressing slow shutdowns and reboots.
Also, the problem is that this is causing containerd to not shut down either. So there’s one of two options:
- Run each container in a separate systemd service (which, as someone who works on runC, I know is not going to work).
- Define some signal (SIGUSR1 for example) that can be used to tell containerd to kill every container, so that it can be set in the systemd configuration for distributions that intend to ship things that way.
In particular, if your system is shutting down, this issue will cause system shutdown to take much longer than expected.
The disadvantage of copying the whole service file to /etc is that you then won’t pick up any changes with your Linux distribution updates.
If you use a drop-in replacement instead, you often don’t need to maintain the file (or at least it’s easier to maintain the file) when you upgrade to a newer version 😃
Another way to fix it is to make the docker service require the containerd service, as in:
[Unit]
…
After=network.target containerd.socket containerd.service
Requires=containerd.socket containerd.service
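
For comparison, here is a hedged sketch of the same dependency expressed as a drop-in, which is the approach other comments in this thread recommend over editing files under /lib. It assumes the override.conf path mentioned in the thread, uses Wants rather than Requires as suggested there, and leaves out containerd.socket because that unit may not exist on every install. After= and Wants= in a drop-in add to the values in the packaged unit rather than replacing them.

```ini
# Assumed location: /etc/systemd/system/docker.service.d/override.conf
# (the file that systemctl edit docker creates)
[Unit]
# Order docker after containerd, so containerd is stopped only after docker.
After=containerd.service
# Pull containerd in when docker starts, without the hard failure
# propagation that Requires= would add.
Wants=containerd.service
```

Whether Wants or Requires is the better fit depends on how tightly the two services should be coupled on a given distribution.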
SIGTERM should make containerd exit, always. So I’m a bit surprised you have a case where it doesn’t.
I have no objection to adding a special case for USR1 (cc @crosbymichael ?)
docker should be able to handle the containers disappearing from under it.