containerd: shim v2 hangs on reboot/shutdown

Description

containerd ships with KillMode=process in its systemd unit, so shims are not killed when containerd stops. systemd broadcasts SIGTERM during the final stage of shutdown; this is the shim's only chance to terminate gracefully. It seems that shim v2 does not handle SIGTERM/SIGINT at all, so it stalls machine reboot/shutdown for 90s until it is finally killed.

Shim v1 handles SIGINT/SIGTERM https://github.com/containerd/containerd/blob/c7e4747cfb5cf15eef68af71b0a5526f2343f635/cmd/containerd-shim/main_unix.go#L248-L261

Shim v2 registers and ignores SIGINT/SIGTERM https://github.com/containerd/containerd/blob/c7e4747cfb5cf15eef68af71b0a5526f2343f635/runtime/v2/shim/shim_unix.go#L81-L87

Steps to reproduce the issue:

  1. install docker-ce 20.10.x, enable live-restore
  2. docker run -d k8s.gcr.io/pause
  3. sudo reboot

Describe the results you received: The shutdown/reboot process is stuck for 90s because of containerd-shim.

[  OK  ] Reached target Shutdown.
[  OK  ] Reached target Final Step.
[  OK  ] Finished Reboot.
[  OK  ] Reached target Reboot.
[  214.337805] systemd-shutdown[1]: Waiting for process: containerd-shim

Describe the results you expected: containerd-shim should not interfere with shutdown/reboot.

What version of containerd are you using:

$ containerd --version
containerd containerd.io 1.4.4 05f951a3781f4f2c1911b05e61c160e9c30eaa8e

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 22
  • Comments: 16 (7 by maintainers)

Most upvoted comments

To work around this problem, I ended up creating a systemd service that runs a small script. kill-all-containers.sh:

#!/usr/bin/env sh
# Kill every running container; -r (GNU xargs) skips "docker kill" when none are running.
docker ps --format '{{.ID}}' | xargs -r docker kill

kill-all-containers.service

[Unit]
Before=shutdown.target reboot.target halt.target final.target
DefaultDependencies=false
Description=Kill all docker containers to prevent shutdown lag
RequiresMountsFor=/

[Service]
ExecStart=/path/to/kill-all-containers.sh
Type=oneshot
RemainAfterExit=true

[Install]
WantedBy=shutdown.target reboot.target halt.target final.target

Or, if you are using NixOS, put this in kill-all-docker-containers.nix and import it in your configuration.nix file:

{ config, pkgs, ... }:
{
  systemd.services = {
    kill-all-docker-containers = {
      description = "Kill all docker containers to prevent shutdown lag";
      enable = true;
      unitConfig = {
        DefaultDependencies = false;
        RequiresMountsFor = "/";
      };
      before = [ "shutdown.target" "reboot.target" "halt.target" "final.target" ];
      wantedBy = [ "shutdown.target" "reboot.target" "halt.target" "final.target" ];
      serviceConfig = {
        Type = "oneshot";
        RemainAfterExit = true;
        ExecStart = pkgs.writeScript "docker-kill-all" ''
          #! ${pkgs.runtimeShell} -e
          ${pkgs.docker}/bin/docker ps --format '{{.ID}}' | xargs -r ${pkgs.docker}/bin/docker kill
        '';
      };
    };
  };
}

I can work on this.

/assign

I have also been waiting a long time for the fix to land in the standard Debian packages.

Still a problem… I need to stop all my containers before a shutdown/restart to avoid the hang.

Thinking of different things we could do for this… it would probably be best to just shut down on SIGTERM/SIGINT. I’m not sure if ignoring the signals was intentional.

Will that be released as a minor fix, or does it have to go out with 1.5?

PTAL https://github.com/containerd/containerd/pull/5828

It does indeed look like the handling that ignores SIGINT/SIGTERM comes from the original v2 shim code.

My machine hangs because of this; a fix would be sweet.

Sorry for misunderstanding the issue. I thought the /run folder was not on tmpfs, so that containerd was taking a long time to reload the dead shim.

I am fine with the proposal to handle SIGTERM in runc-shim-v2.