k3s: systemd-shutdown hangs on containerd-shim when k3s-agent running

Environmental Info: K3s Version: k3s version v1.18.6+k3s1 (6f56fa1d)

Node(s) CPU architecture, OS, and Version: x86_64 Ubuntu 20.04.1 Linux nuc-linux3 5.4.0-48-generic #52-Ubuntu SMP Thu Sep 10 10:58:49 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration: 1 master 2 workers

Describe the bug: When shutting down or rebooting the node, the shutdown hangs for approximately 90 seconds. The console message is:

systemd-shutdown: waiting for process: containerd-shim

When researching the problem I landed on this issue: https://github.com/drud/ddev/issues/2538#issuecomment-705079079 where they said when they uninstalled k3s the problem went away. I disabled and stopped k3s-agent.service and rebooted and the problem also went away for me.

I also tried re-enabling and starting k3s-agent.service and removing the docker.io package and running apt autoremove to remove containerd, runc, etc. but it still hangs on reboot at the same place.

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 21
  • Comments: 26 (5 by maintainers)

Most upvoted comments

bump same issue

I faced this issue yesterday and ended up with the following solution.

/etc/systemd/system/cgroup-kill-on-shutdown@.service :

[Unit]
Description=Kill cgroup procs on shutdown for %i
DefaultDependencies=false
Before=shutdown.target umount.target
[Service]
# Instanced units are not part of system.slice for some reason
# without this, the service isn't started at shutdown
Slice=system.slice
ExecStart=/bin/bash -c 'pids=$(cat /sys/fs/cgroup/unified/system.slice/%i/cgroup.procs); echo $pids | xargs -r kill;'
ExecStart=/bin/sleep 5
ExecStart=/bin/bash -c 'pids=$(cat /sys/fs/cgroup/unified/system.slice/%i/cgroup.procs); echo $pids | xargs -r kill -9;'
Type=oneshot
[Install]
WantedBy=shutdown.target

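Enable the “service” for k3s-agent.service (this also works for k3s on the master). The instance name is the full unit name, because %i is used as the cgroup directory name, so the doubled .service suffix is intentional: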

sudo systemctl enable cgroup-kill-on-shutdown@k3s-agent.service.service

# or, on the master:  sudo systemctl enable cgroup-kill-on-shutdown@k3s.service.service

I’ve written a long-winded explanation here, but in brief: because KillMode=process is used, all the container processes stay alive when k3s is brought down, which is a good thing ™️

However, during shutdown, systemd signals all remaining processes and waits up to DefaultTimeoutStopSec for them to die. With systemd v245, this wait is always 90s during the last shutdown phase, regardless of the configured value.
This is a bug in the systemd v245 shipped with Ubuntu 20.04; it was fixed upstream in September 2020.

I used to set DefaultTimeoutStopSec=5s in /etc/systemd/system.conf, which worked fine, but it does not work on Ubuntu 20.04.

Since there’s little chance this fix will be backported to 20.04, the above “service” performs a round of SIGTERM, waits 5s, then proceeds with SIGKILL to finish k3s’s process cleanup during shutdown. The sleep can be tweaked to suit your services’ needs (something matching the pods’ terminationGracePeriodSeconds, perhaps).

Hope it helps.
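The two-pass cleanup described in this comment can also be sketched as a standalone shell function, which may be easier to adapt or try out than the inline ExecStart lines. This is only a sketch under assumptions: the function name term_then_kill and its arguments are hypothetical, and the example path in the comment assumes the same hybrid cgroup layout as the unit file above.

```shell
#!/usr/bin/env bash
# Sketch only: term_then_kill is a hypothetical helper, not part of k3s or systemd.
# Sends SIGTERM to every PID listed in a cgroup.procs-style file, waits,
# then sends SIGKILL to whatever the file still lists.
term_then_kill() {
    local procs_file=$1   # e.g. /sys/fs/cgroup/unified/system.slice/k3s-agent.service/cgroup.procs
    local grace=${2:-5}   # seconds to wait between SIGTERM and SIGKILL
    local pids

    # Nothing to do if the cgroup (or file) does not exist.
    [ -r "$procs_file" ] || return 0

    # First pass: ask the processes to exit.
    pids=$(cat "$procs_file")
    if [ -n "$pids" ]; then
        # Unquoted on purpose: one argument per PID.
        kill $pids 2>/dev/null
    fi

    sleep "$grace"

    # Second pass: a real cgroup.procs now lists only the survivors.
    pids=$(cat "$procs_file" 2>/dev/null)
    if [ -n "$pids" ]; then
        kill -9 $pids 2>/dev/null
    fi
    return 0
}
```

Called as, for example, term_then_kill /sys/fs/cgroup/unified/system.slice/k3s-agent.service/cgroup.procs 30, the grace period could be raised to match the longest pod termination grace period on the node.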

Bump still relevant

Following https://github.com/containerd/containerd/issues/386#issuecomment-304837687 I changed the service configuration for k3s.service and k3s-agent.service to KillMode=mixed and that fixed the problem. This is the setting used in the standard Docker configuration.

However, I also found https://github.com/rancher/k3s/issues/1965 where it looks like this behavior is as intended. Is there a way to allow for upgrading k3s without disrupting workloads but at the same time not hang shutdowns/reboots for 90s?
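For anyone who wants to try the KillMode change from this comment without editing the installed unit file, a drop-in override is one way to do it. The following is a minimal sketch, not a recommendation: a maintainer comment in this thread says changing KillMode is not tested or supported, and it gives up the workload-preserving restarts discussed above. The helper name write_killmode_dropin, the file name killmode.conf, and the optional second argument are all hypothetical.

```shell
# Sketch only: write_killmode_dropin is a hypothetical helper.
# Writes a systemd drop-in that overrides KillMode for the given unit.
write_killmode_dropin() {
    local unit=$1                         # e.g. k3s-agent.service
    local root=${2:-/etc/systemd/system}  # overridable so the helper can be tried in a scratch dir
    local dir="$root/$unit.d"

    mkdir -p "$dir" || return 1
    # A drop-in only needs the section and the setting being overridden.
    printf '[Service]\nKillMode=mixed\n' > "$dir/killmode.conf"
}

# Usage (as root), followed by a reload so systemd picks up the drop-in:
#   write_killmode_dropin k3s-agent.service && systemctl daemon-reload
```

Removing the killmode.conf file and reloading systemd again restores the unit's original behaviour.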

That might be a good thing to add to the documentation, for folks that want it?

Here is a k3s version of https://github.com/k3s-io/k3s/issues/2400#issuecomment-1041165341:

[Unit]
Description=Kill containerd-shims on shutdown
DefaultDependencies=false
Before=shutdown.target umount.target

[Service]
ExecStart=/usr/local/bin/k3s-killall.sh
Type=oneshot

[Install]
WantedBy=shutdown.target

Save the file as /etc/systemd/system/shutdown-k3s.service, then enable the service with:

systemctl enable shutdown-k3s.service

Also note that the service name (here shutdown-k3s) must not start with k3s-; otherwise the k3s-killall.sh script would try to stop it and cause problems.
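If the unit is renamed, the naming caveat above can be checked before installing it. This is a small sketch, assuming only what the comment states (that names beginning with k3s- are the ones at risk); the function name unit_name_is_safe is hypothetical.

```shell
# Sketch only: unit_name_is_safe is a hypothetical helper.
# Succeeds (exit 0) if the unit name does not start with the k3s- prefix.
unit_name_is_safe() {
    case "$1" in
        k3s-*) return 1 ;;  # would be stopped by the killall script, per the comment above
        *)     return 0 ;;
    esac
}

# Usage:
#   unit_name_is_safe shutdown-k3s.service && echo "name is fine"
```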

Probably something related to the containerd version change? I’m not sure, since changing the KillMode isn’t something we test or support. I would recommend adding another unit that runs on shutdown, as described above.

Would you accept a feature request to add a systemd unit like https://github.com/k3s-io/k3s/issues/2400#issuecomment-1018472343 that only triggers on shutdown? This would both allow the intended behaviour of k3s/rke2 (seamless updates/restarts) and allow for a shutdown/reboot that’s even quicker than RKE1.

Here’s my non-instanced version of that (for rke2):

[Unit]
Description=Kill containerd-shims on shutdown
DefaultDependencies=false
Before=shutdown.target umount.target

[Service]
ExecStart=/bin/bash -c "/usr/local/bin/rke2-killall.sh"
Type=oneshot

[Install]
WantedBy=shutdown.target