k3s: Too many open files

Version: k3s version v0.10.2 (8833bfd9)

Describe the bug

After running for roughly 3 days the host becomes unavailable: too many open files. After running for only 1 day I already have 1,250,597 open file handles.

$ lsof
...
container  5747 25760 container             root *040u  a_inode               0,13            0       9259 [eventfd]
container  5747 25760 container             root *041u  a_inode               0,13            0       9259 [eventfd]
container  5747 25760 container             root *042u  a_inode               0,13            0       9259 [eventfd]
container  5747 25760 container             root *043u  a_inode               0,13            0       9259 [eventfd]
container  5747 25760 container             root *044u  a_inode               0,13            0       9259 [eventfd]
container  5747 25760 container             root *045u  a_inode               0,13            0       9259 [eventfd]
container  5747 25760 container             root *046u  a_inode               0,13            0       9259 [eventfd]
container  5747 25760 container             root *047u  a_inode               0,13            0       9259 [eventfd]
container  5747 25760 container             root *048u  a_inode               0,13            0       9259 [eventfd]
container  5747 25760 container             root *049u  a_inode               0,13            0       9259 [eventfd]
container  5747 25760 container             root *050u  a_inode               0,13            0       9259 [eventfd]
container  5747 25760 container             root *051u  a_inode               0,13            0       9259 [eventfd]
container  5747 25760 container             root *052u  a_inode               0,13            0       9259 [eventfd]
container  5747 25760 container             root *053u  a_inode               0,13            0       9259 [eventfd]
container  5747 25760 container             root *054u  a_inode               0,13            0       9259 [eventfd]
container  5747 25760 container             root *055u  a_inode               0,13            0       9259 [eventfd]
container  5747 25760 container             root *056u  a_inode               0,13            0       9259 [eventfd]
container  5747 25760 container             root *057u  a_inode               0,13            0       9259 [eventfd]
container  5747 25760 container             root *058u  a_inode               0,13            0       9259 [eventfd]
container  5747 25760 container             root *059u  a_inode               0,13            0       9259 [eventfd]
container  5747 25760 container             root *060u  a_inode               0,13            0       9259 [eventfd]
container  5747 25760 container             root *061u  a_inode               0,13            0       9259 [eventfd]
container  5747 25760 container             root *062u  a_inode               0,13            0       9259 [eventfd]
container  5747 25760 container             root *063u  a_inode               0,13            0       9259 [eventfd]
container  5747 25760 container             root *064u  a_inode               0,13            0       9259 [eventfd]
container  5747 25760 container             root *065u  a_inode               0,13            0       9259 [eventfd]
container  5747 25760 container             root *066u  a_inode               0,13            0       9259 [eventfd]
...
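(To see which process is holding the leaked descriptors, a rough one-liner such as the following groups the [eventfd] entries by command and PID. This is a sketch for illustration, not part of the original report, and assumes lsof is run with root privileges.)

sudo lsof 2>/dev/null | grep -F '[eventfd]' | awk '{print $1, $2}' | sort | uniq -c | sort -rn | head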

To Reproduce

Run pods; I am not sure how to reproduce it exactly. It started to appear after upgrading from v0.9 to v0.10.

Expected behavior

File descriptors are released after use and the host stays available.

Actual behavior

The host runs out of file handles.
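(For context, and not part of the original report, the system-wide file-handle usage and limits on a Linux host can be checked with:)

cat /proc/sys/fs/file-nr   # allocated, unused, and maximum file handles system-wide
ulimit -n                  # per-process open-file limit for the current shell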

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Comments: 24 (9 by maintainers)

Most upvoted comments

It’s a blocker problem; the issue needs labeling and prioritizing, please.

This is not related to Longhorn. I’m not running it, and I have the exact same problem.

And the culprit is obviously liveness/readiness probes with exec. At first, I had to reboot my servers roughly once every three days. Then I took the liveness probes out of my own software; now I get about a week before a reboot. Still, Calico, ingress-nginx, Flux CD, etc. have their own checks, which eat up the fds.
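(A rough way to confirm this, assuming the container shim processes are named containerd-shim as in a default k3s install, is to list per-process fd counts from /proc. This is a sketch added for illustration, not something posted in the thread.)

for pid in $(pgrep -f containerd-shim); do
  printf '%s %s\n' "$pid" "$(sudo ls /proc/$pid/fd 2>/dev/null | wc -l)"
done | sort -k2 -rn | head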

Will do, thanks for the help so far!

Thanks for the info. I’m curious whether this might be related to a particular version of systemd. If it continues to happen after the restart, please share the output of:

lsof | sed 's/^.* //' | sort | uniq -c | sort -r -n | head
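(The pipeline keeps only the last lsof column, the NAME field, counts duplicates, and prints the ten most common values; with a leak like the one above, [eventfd] would dominate the output.)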