argo-events: too many open files - Error

Describe the bug

When a Sensor starts, it always throws the following error and then exits with exit code 1:

{"level":"info","ts":1648741781.4416764,"logger":"argo-events.sensor","caller":"cmd/start.go:73","msg":"starting sensor server","sensorName":"kafka","version":"v1.6.0"}                                          
{"level":"info","ts":1648741781.4422603,"logger":"argo-events.sensor","caller":"metrics/metrics.go:172","msg":"starting metrics server","sensorName":"kafka"}                                                     
2022/03/31 15:49:41 too many open files   

Unfortunately there is no additional information. I already checked the nodes it runs on for file descriptor exhaustion, but everything looks fine there.

On a fresh node, though, the Sensor works fine. But we can't always spin up fresh nodes, and the affected nodes look fine in terms of overall resource utilization.
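As a hypothetical diagnostic (not from the original report), one way to confirm whether a node is genuinely out of file descriptors:

```shell
# System-wide fd usage: allocated, free (usually 0), and the maximum
cat /proc/sys/fs/file-nr

# Count fds held by the current process as a sanity check
ls /proc/self/fd | wc -l
```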

To Reproduce

Steps to reproduce the behavior:

  1. Start any Sensor with a Kafka EventSource (I did not test whether it also happens with other sources)

Expected behavior

It starts up normally.

Environment (please complete the following information):

  • Kubernetes: [e.g. v1.19.15-eks-9c63c4]
  • Argo: v3.2.9
  • Argo Events: 1.6.0

Message from the maintainers:

If you wish to see this enhancement implemented please add a 👍 reaction to this issue! We often sort issues this way to know what to prioritize.

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Reactions: 7
  • Comments: 23 (9 by maintainers)

Most upvoted comments

The issue still exists for us. We're running argo-events 1.8.0, argo-cd 2.6.7, and the Promtail Helm chart 6.11.3 (app version 2.8.2). The nodes look healthy, and the issue appears at random in Sensors and/or EventSources.

{"level":"info","ts":1686750556.961388,"logger":"argo-events.eventsource","caller":"cmd/start.go:63","msg":"starting eventsource server","eventSourceName":"web-apps","version":"v1.8.0"}
{"level":"info","ts":1686750556.9620998,"logger":"argo-events.eventsource","caller":"metrics/metrics.go:175","msg":"starting metrics server","eventSourceName":"web-apps"}
2023/06/14 13:49:16 too many open files

From the node I get:

/ # cat /proc/sys/fs/file-max
9223372036854775807
/ # lsof |wc -l
5410
/ # lsof |wc -l
5411
/ # lsof |wc -l
5412
/ # lsof |wc -l
5410

The issue appeared a few weeks ago out of the blue and now comes and goes at random.

We found the culprit. The privileged promtail pods (<= v3.0.3) are setting fs.inotify.max_user_instances to 128. We'll upgrade to at least v3.0.4, after which the error should be gone. Closing for now.
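If upgrading isn't immediately possible, raising the limit on the affected nodes should work around it. A sketch; the sysctl key is real, but the chosen value is illustrative:

```shell
# Raise the per-user inotify instance limit (128 is the kernel default
# on many distros; 1024 here is an illustrative value, requires root)
sysctl -w fs.inotify.max_user_instances=1024

# Persist the setting across reboots
echo 'fs.inotify.max_user_instances = 1024' > /etc/sysctl.d/99-inotify.conf
```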

Thank you for your patience with us 😃