containerd: SIGTERM doesn't kill containerd-shims
If you run containerd as a systemd service and try to restart it while containers are running, `systemctl restart containerd` will block. containerd (the daemon) exits when it gets a SIGTERM, but it doesn’t attempt to kill any of the containerd-shims, so systemd waits for them and then SIGKILLs them once the stop timeout expires (90 seconds by default, or whatever `TimeoutStopSec` is set to).
About this issue
- State: closed
- Created 8 years ago
- Reactions: 5
- Comments: 16 (6 by maintainers)
Instead of directly editing the systemd service files in /lib (which might not be writable depending on the Linux distribution), use `systemctl edit docker` and add the containerd dependency there (using `Wants` instead of `Requires`, per https://github.com/moby/moby/commit/a985655ac4eb6c5b60b5eab8d8d09a487e353e1d). This will create `/etc/systemd/system/docker.service.d/override.conf`, which you’ll also see listed in `systemctl status docker`.
I’ve done some tests with the KillMode setting in systemd: https://www.freedesktop.org/software/systemd/man/systemd.kill.html
If we set `KillMode=process` under the [Service] section of containerd.service, running `systemctl stop containerd` stops the containerd process but not the containerd-shim processes. The good thing is that systemd does not get blocked.
If, instead, we set `KillMode=mixed`, it stops the containerd process and also the containerd-shim processes. As before, systemd does not get blocked.
Thus, I think the solution is to use either `process` or `mixed` as the value for KillMode in the systemd unit file, depending on whether you want to leave the containerd-shim processes running or not.
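As a minimal sketch of the setting discussed above, a drop-in for containerd could look like the following. The drop-in path is an assumption based on how `systemctl edit containerd` normally behaves; the comment itself only says the setting goes under the Service section of containerd.service.

```ini
# Assumed location: /etc/systemd/system/containerd.service.d/override.conf
[Service]
# process: only the main containerd process is signalled; the shims keep running.
# mixed:   SIGTERM goes to containerd, then SIGKILL to whatever remains in the unit's cgroup.
KillMode=process
```

If the file is written by hand rather than through `systemctl edit`, run `systemctl daemon-reload` so that systemd picks it up.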
I’m on Manjaro (Arch-based). I think these steps will be similar on Ubuntu, but I’m not 100% sure. Check out systemd and/or systemctl on their wiki.
Initially, what I did was edit docker.service like so (substitute nano with your editor of choice: pico, vim, etc.):
    sudo nano /lib/systemd/system/docker.service
Then I just added `containerd.socket containerd.service` to the After and Requires lines, as suggested by @jordimassaguerpla (I left what was already there in place and appended the new values to the end of the line; in nano, CTRL+O then Enter to write out and CTRL+X to exit).
Then I restarted the service using:
    sudo systemctl restart docker.service
(but a restart should do it too, I think.)
BUT, that may be susceptible to overwrites during updates from the upstream package. In which case you might want to follow one of the options here:
https://serverfault.com/a/840999
ETA: you may not have a containerd.socket (check under /lib/systemd/system using the `ls` command). I did the first time I went through this but, just now, after uninstalling and reinstalling all my docker stuff, it wasn’t there any longer, and restarting docker.service gave me the error "containerd.socket not found". So I only added containerd.service, and that seems to still work fine at addressing slow shutdowns and reboots.
Also, the problem is that this is causing `containerd` to not shut down either. So there’s one of two options:
- Run each container in a separate systemd service (which, as someone who works on runC, I know is not going to work).
- Define some signal (`SIGUSR1` for example) that can be used to tell containerd to kill every container, so that it can be set in the systemd configuration for distributions that intend to ship things that way.
In particular, if your system is shutting down, this issue will cause system shutdown to take much longer than expected.
The disadvantage of copying the whole service file to `/etc` is that you then won’t pick up any changes with your Linux distribution updates. If you use a drop-in replacement instead, you often don’t need to maintain the file (or at least it’s easier to maintain the file) when you upgrade to a newer version 😃
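A drop-in of the kind mentioned in this thread might look like the sketch below. It is not the exact snippet from any comment: the directive values are assumptions pieced together from the `Wants`/`After` discussion, and `containerd.socket` is left out because it may not exist on every install.

```ini
# Assumed location: /etc/systemd/system/docker.service.d/override.conf
# (the file that `systemctl edit docker` creates)
[Unit]
# Order docker after containerd on start, and before it on stop.
After=containerd.service
# Weaker than Requires=: docker is not stopped when containerd is
# explicitly stopped or restarted.
Wants=containerd.service
```

Because the packaged docker.service under /lib stays untouched, distribution updates to it still apply and only this small file needs maintaining.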
Another way to fix it is to make the docker service require the containerd service, as in:
    [Unit]
    …
    After=network.target containerd.socket containerd.service
    Requires=containerd.socket containerd.service
`SIGTERM` should make `containerd` exit, always. So I’m a bit surprised you have a case where it doesn’t.
I have no objection to adding a special case for `USR1` (cc @crosbymichael ?).
docker should be able to handle the containers disappearing from under it.