k3s: systemd-shutdown hangs on containerd-shim when k3s-agent running
Environmental Info: K3s Version: k3s version v1.18.6+k3s1 (6f56fa1d)
Node(s) CPU architecture, OS, and Version: x86_64 Ubuntu 20.04.1 Linux nuc-linux3 5.4.0-48-generic #52-Ubuntu SMP Thu Sep 10 10:58:49 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Cluster Configuration: 1 master 2 workers
Describe the bug: When shutting down or rebooting the node, the shutdown hangs for approximately 90 seconds. The console message is
systemd-shutdown: waiting for process: containerd-shim
When researching the problem I landed on this issue: https://github.com/drud/ddev/issues/2538#issuecomment-705079079 where it was reported that uninstalling k3s made the problem go away. I disabled and stopped k3s-agent.service and rebooted, and the problem went away for me as well.
I also tried re-enabling and starting k3s-agent.service, removing the docker.io package, and running apt autoremove to remove containerd, runc, etc., but it still hangs on reboot at the same place.
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Reactions: 21
- Comments: 26 (5 by maintainers)
Bump, same issue
I faced this issue yesterday and ended up with the following solution.
Create `/etc/systemd/system/cgroup-kill-on-shutdown@.service`, then enable the "service" for `k3s-agent.service` (it will also work for `k3s` on the master).

I've written a long, winding explanation here, but in brief: since `KillMode=process` is used, all the container processes stay alive when k3s is brought down. Which is a good thing ™️

However, during shutdown, systemd will signal all remaining processes and wait `DefaultTimeoutStopSec` for them to die. This is always 90s during the last shutdown phase with systemd v245. It is a bug in the systemd v245 shipped with Ubuntu 20.04 and was fixed upstream in September 2020.

What I used to do was set `DefaultTimeoutStopSec=5s` in `/etc/systemd/system.conf`, and that worked fine, but on Ubuntu 20.04 it doesn't. Since there's little chance the fix will make it back into 20.04, the above "service" performs a round of SIGTERM, waits 5s, then proceeds with SIGKILL to finish k3s's process cleanup during shutdown. The sleep can be tweaked to suit your services' needs (something matching `terminationGracePeriod`, perhaps).

Hope it helps.
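A minimal sketch of what such a templated unit could look like, based on the description above. The cgroup v1 path, the 5-second grace period, and the exact kill commands are assumptions for illustration, not the commenter's original file:

```ini
# /etc/systemd/system/cgroup-kill-on-shutdown@.service (sketch)
[Unit]
Description=Kill leftover processes in the %i.service cgroup on shutdown
DefaultDependencies=no
Before=shutdown.target

[Service]
Type=oneshot
RemainAfterExit=yes
# The unit does nothing at boot; all the work happens in ExecStop,
# which systemd runs on the way down. "$$" is systemd's escape for a
# literal "$" passed through to the shell.
ExecStart=/bin/true
TimeoutStopSec=10
ExecStop=/bin/sh -c 'kill -TERM $$(cat /sys/fs/cgroup/systemd/system.slice/%i.service/cgroup.procs) 2>/dev/null; sleep 5; kill -KILL $$(cat /sys/fs/cgroup/systemd/system.slice/%i.service/cgroup.procs) 2>/dev/null; true'

[Install]
WantedBy=multi-user.target
```

It would then be enabled per-service, e.g. `systemctl enable cgroup-kill-on-shutdown@k3s-agent.service` on an agent (or `@k3s.service` on the master).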
Bump still relevant
Following https://github.com/containerd/containerd/issues/386#issuecomment-304837687, I changed the service configuration for `k3s.agent` and `k3s-agent.service` to `KillMode=mixed` and that fixed the problem. This is in the standard Docker configuration.

However, I also found https://github.com/rancher/k3s/issues/1965, where it looks like this behavior is as intended. Is there a way to allow upgrading k3s without disrupting workloads, but at the same time not hang shutdowns/reboots for 90s?
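For reference, one way to make that change without editing the packaged unit file would be a systemd drop-in; this is a sketch following the usual drop-in convention, not a configuration shipped by k3s:

```ini
# /etc/systemd/system/k3s-agent.service.d/override.conf (hypothetical)
# Apply with: systemctl daemon-reload && systemctl restart k3s-agent
[Service]
KillMode=mixed
```

Note the trade-off: with `KillMode=mixed`, SIGTERM goes only to the main k3s process, but the final SIGKILL hits every remaining process in the unit's cgroup, so containers no longer survive a service stop or upgrade the way they do with `KillMode=process`.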
That might be a good thing to add to the documentation, for folks that want it?
Here is a k3s version of https://github.com/k3s-io/k3s/issues/2400#issuecomment-1041165341:
Put the file at `/etc/systemd/system/shutdown-k3s.service` and then enable the service.

Also note that this service name, `shutdown-k3s`, must not start with `k3s-`, otherwise the `k3s-killall.sh` script would try to stop it and cause problems.

Probably something related to the containerd version change? I'm not sure, since changing the KillMode isn't something we test or support. I would recommend adding another unit that runs on shutdown, as described above.
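A plausible shape for that unit, assuming `k3s-killall.sh` is installed at its default location of `/usr/local/bin` (the exact file from the linked comment is not reproduced here):

```ini
# /etc/systemd/system/shutdown-k3s.service (sketch)
[Unit]
Description=Kill containerd-shims on shutdown
DefaultDependencies=no
Before=shutdown.target umount.target

[Service]
Type=oneshot
# k3s-killall.sh terminates all k3s-spawned processes itself,
# so systemd-shutdown has nothing left to wait on.
ExecStart=/usr/local/bin/k3s-killall.sh

[Install]
WantedBy=shutdown.target
```

Enabled with `systemctl enable shutdown-k3s.service`, it is pulled in when `shutdown.target` is reached and, via `Before=`, completes before the final shutdown phase.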
would you accept a feature request to add a systemd unit like https://github.com/k3s-io/k3s/issues/2400#issuecomment-1018472343 which only triggers on shutdown? This would both allow the intended behaviour of k3s/rke2 (seamless updates/restarts) and allow for a shutdown/reboot that’s even quicker than RKE1.
Here's my non-instanced version of that (for rke2):
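A hedged sketch of what such a non-instanced unit for rke2 could look like; the `/usr/local/bin` path for `rke2-killall.sh` is an assumption (rke2 places it elsewhere for some install methods):

```ini
# /etc/systemd/system/shutdown-rke2.service (sketch)
[Unit]
Description=Run rke2-killall.sh on shutdown
DefaultDependencies=no
Before=shutdown.target umount.target

[Service]
Type=oneshot
ExecStart=/usr/local/bin/rke2-killall.sh

[Install]
WantedBy=shutdown.target
```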