k3s: Too many open files
Version: k3s version v0.10.2 (8833bfd9)
Describe the bug
After running for roughly 3 days, the host becomes unavailable with "Too many open files". Now, after running for only 1 day, I already have 1250597 open inodes.
$ lsof
...
container 5747 25760 container root *040u a_inode 0,13 0 9259 [eventfd]
container 5747 25760 container root *041u a_inode 0,13 0 9259 [eventfd]
container 5747 25760 container root *042u a_inode 0,13 0 9259 [eventfd]
container 5747 25760 container root *043u a_inode 0,13 0 9259 [eventfd]
container 5747 25760 container root *044u a_inode 0,13 0 9259 [eventfd]
container 5747 25760 container root *045u a_inode 0,13 0 9259 [eventfd]
container 5747 25760 container root *046u a_inode 0,13 0 9259 [eventfd]
container 5747 25760 container root *047u a_inode 0,13 0 9259 [eventfd]
container 5747 25760 container root *048u a_inode 0,13 0 9259 [eventfd]
container 5747 25760 container root *049u a_inode 0,13 0 9259 [eventfd]
container 5747 25760 container root *050u a_inode 0,13 0 9259 [eventfd]
container 5747 25760 container root *051u a_inode 0,13 0 9259 [eventfd]
container 5747 25760 container root *052u a_inode 0,13 0 9259 [eventfd]
container 5747 25760 container root *053u a_inode 0,13 0 9259 [eventfd]
container 5747 25760 container root *054u a_inode 0,13 0 9259 [eventfd]
container 5747 25760 container root *055u a_inode 0,13 0 9259 [eventfd]
container 5747 25760 container root *056u a_inode 0,13 0 9259 [eventfd]
container 5747 25760 container root *057u a_inode 0,13 0 9259 [eventfd]
container 5747 25760 container root *058u a_inode 0,13 0 9259 [eventfd]
container 5747 25760 container root *059u a_inode 0,13 0 9259 [eventfd]
container 5747 25760 container root *060u a_inode 0,13 0 9259 [eventfd]
container 5747 25760 container root *061u a_inode 0,13 0 9259 [eventfd]
container 5747 25760 container root *062u a_inode 0,13 0 9259 [eventfd]
container 5747 25760 container root *063u a_inode 0,13 0 9259 [eventfd]
container 5747 25760 container root *064u a_inode 0,13 0 9259 [eventfd]
container 5747 25760 container root *065u a_inode 0,13 0 9259 [eventfd]
container 5747 25760 container root *066u a_inode 0,13 0 9259 [eventfd]
...
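To see which process holds the leaked descriptors, the lsof listing above can be summarized per PID. The following is a minimal sketch, not part of the original report: the function name `count_eventfds` is hypothetical, and it assumes the lsof column layout shown above (PID in the second column, FD sixth from the end, NAME last). Because lsof appears to print one row per task here (note the TID column), the sketch counts each (PID, FD) pair only once, so the result may be far lower than the raw line count.

```shell
# Hypothetical helper: reads lsof output on stdin and prints "PID COUNT"
# for every process that holds at least one [eventfd] descriptor.
count_eventfds() {
    awk '
        # Only rows whose NAME column (last field) is [eventfd].
        $NF == "[eventfd]" {
            # FD is the sixth field from the end (FD TYPE DEVICE SIZE NODE NAME).
            key = $2 SUBSEP $(NF - 5)
            # Threads share one fd table, so count each (PID, FD) pair once.
            if (!(key in seen)) {
                seen[key] = 1
                n[$2]++
            }
        }
        END { for (pid in n) print pid, n[pid] }
    ' | sort -k2,2nr   # largest holder first
}
```

Usage would be along the lines of `lsof | count_eventfds | head`. A single PID with a count that keeps growing is a likely candidate for the leak.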
To Reproduce
Run pods. I am not sure how to reproduce it reliably. It started to appear after upgrading from version 0.9 to 0.10.
Expected behavior
Actual behavior
The host runs out of inodes.
About this issue
- State: closed
- Created 5 years ago
- Comments: 24 (9 by maintainers)
Looks like this is being tracked with https://github.com/containerd/containerd/issues/3949 and there is a PR to fix at https://github.com/containerd/containerd/pull/3956
This is a blocker. The issue needs labeling and prioritizing, please.
This is not related to Longhorn. I’m not running it, and I have the exact same problem.
The culprit is obviously liveness/readiness probes that use exec. At first, I had to reboot my servers about once every three days. Then I took the liveness probes out of my own software, and now I get about a week between reboots. Calico, ingress-nginx, Flux CD, etc. still have their own exec checks, which keep eating up the fds.
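One way to check whether exec probes really drive the growth is to sample a process's descriptor count over time, with and without the probes enabled. This is only a sketch under stated assumptions: `count_fds` is a hypothetical helper, and `PROC_ROOT` is an invented override (defaulting to `/proc`) that exists purely so the function can be exercised against a fake directory tree.

```shell
# Hypothetical helper: prints the number of open descriptors for a PID
# by counting the entries in its fd directory.
count_fds() {
    pid="$1"
    # PROC_ROOT is an assumption for testing; on a real host it stays /proc.
    dir="${PROC_ROOT:-/proc}/$pid/fd"
    [ -d "$dir" ] || { echo "no such fd directory: $dir" >&2; return 1; }
    # Each entry in the directory is one open descriptor.
    ls -1 "$dir" | wc -l | tr -d ' '
}
```

For example, `while sleep 60; do echo "$(date +%s) $(count_fds 5747)"; done` (using the PID from the lsof output above, run as root) would log one sample per minute. If the count climbs steadily while exec probes are active and stays flat once they are removed, that supports the diagnosis.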
Will do, thanks for the help so far!
Thanks for the info, I’m curious if this might be related to some version of systemd. If it continues to happen after the restart please share the output of: