containerd: shim v2 hangs on reboot/shutdown
Description
containerd ships with KillMode=process in its systemd unit, so that shims are not killed when containerd stops.
systemd broadcasts SIGTERM during the final stage of shutdown; this is the only chance for a shim to terminate gracefully.
It seems that shim v2 does not handle SIGTERM/SIGINT at all, so it holds up the machine's reboot/shutdown for 90s until it is finally killed.
Shim v1 handles SIGINT/SIGTERM https://github.com/containerd/containerd/blob/c7e4747cfb5cf15eef68af71b0a5526f2343f635/cmd/containerd-shim/main_unix.go#L248-L261
Shim v2 registers and ignores SIGINT/SIGTERM https://github.com/containerd/containerd/blob/c7e4747cfb5cf15eef68af71b0a5526f2343f635/runtime/v2/shim/shim_unix.go#L81-L87
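The effect of "registers and ignores" can be illustrated with a small standalone Go program. This is a minimal sketch, not the shim's actual code; `ignoreTermSignals` and `survivesSIGTERM` are hypothetical names. It relies on the documented behaviour of Go's `os/signal` package: once `signal.Notify` is called for SIGTERM, Go no longer applies the default action (terminate the process), so if nothing acts on the delivered signal, the process just keeps running.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// ignoreTermSignals (hypothetical) registers SIGTERM/SIGINT with signal.Notify,
// which disables Go's default "terminate" action, and then drops every signal
// it receives: no cleanup and no exit.
func ignoreTermSignals() chan os.Signal {
	ch := make(chan os.Signal, 32)
	signal.Notify(ch, syscall.SIGTERM, syscall.SIGINT)
	go func() {
		for range ch {
			// Deliberately ignore the signal.
		}
	}()
	return ch
}

// survivesSIGTERM (hypothetical) sends SIGTERM to the current process and
// reports whether the process is still alive shortly afterwards.
func survivesSIGTERM() bool {
	ignoreTermSignals()
	if err := syscall.Kill(os.Getpid(), syscall.SIGTERM); err != nil {
		return false
	}
	time.Sleep(100 * time.Millisecond)
	return true // only reached if SIGTERM did not terminate the process
}

func main() {
	fmt.Println("still running after SIGTERM:", survivesSIGTERM())
}
```

Without the `signal.Notify` call, the same `syscall.Kill` would terminate the program immediately; this is the situation systemd-shutdown runs into, after which it waits out its timeout.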
See also:
- https://github.com/moby/moby/issues/41831
- https://github.com/k3s-io/k3s/issues/2400
- https://github.com/containerd/containerd/issues/386
Steps to reproduce the issue:
- Install docker-ce 20.10.x and enable live-restore.
- docker run -d k8s.gcr.io/pause
- sudo reboot
Describe the results you received: the shutdown/reboot process hangs for 90s because of containerd-shim:
[ OK ] Reached target Shutdown.
[ OK ] Reached target Final Step.
[ OK ] Finished Reboot.
[ OK ] Reached target Reboot.
[ 214.337805] systemd-shutdown[1]: Waiting for process: containerd-shim
Describe the results you expected: containerd-shim should not interfere with shutdown/reboot.
What version of containerd are you using:
$ containerd --version
containerd containerd.io 1.4.4 05f951a3781f4f2c1911b05e61c160e9c30eaa8e
About this issue
- State: closed
- Created 3 years ago
To work around this problem, I ended up creating a systemd service. kill-all-containers.sh:
kill-all-containers.service
Or, if you are using NixOS, put this in kill-all-docker-containers.nix and import it in your configuration.nix file:
I can work on this.
/assign
I have also been waiting a long time for the fix to reach the standard Debian packages.
Still a problem: I have to stop all my containers before a shutdown/restart to avoid the hang.
Thinking about the different options here, it would probably be best to just shut down on SIGTERM/SIGINT. I'm not sure whether ignoring the signals was intentional.
Will this be released as a patch (bug-fix) release, or will it only go public in 1.5?
PTAL https://github.com/containerd/containerd/pull/5828
It does look like the handling that ignores SIGINT/SIGTERM comes from the original v2 shim code.
My machine hangs because of this; a fix would be sweet.
Sorry for misunderstanding the issue. I thought /run was not a tmpfs, so containerd was taking a long time to reload the dead shim.
I am fine with the proposal to handle SIGTERM in runc-shim-v2.