kubernetes: `pull-kubernetes-kubemark-e2e-gce-big` always fails

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:

The `pull-kubernetes-kubemark-e2e-gce-big` job always fails.

Example failing run, with the full error output: https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/pr-logs/pull/63901/pull-kubernetes-kubemark-e2e-gce-big/8390/

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 24 (24 by maintainers)


Most upvoted comments

Are the kubelets run on the same node? fsnotify needs to keep at least one file open for each path/file it watches, so if the 500-node test runs multiple kubelets on the same machine, it seems reasonable that we are hitting that limit much earlier than before.

Yes, we run multiple “hollow-kubelets” on the same node.

A simple fix would be to raise that limit or have fewer kubelets per node.

Fewer kubelets per node is not an option: we deliberately run many of them per node to reduce costs. Raising the limit might be a good fix.

Yeah, the hollow nodes look to be throwing errors (https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/batch/pull-kubernetes-kubemark-e2e-gce-big/9164/artifacts/e2e-batch-9164-ac87c-minion-group-b2hg/kubelet-hollow-node-26mfd.log):

E0530 17:05:21.878485      11 kubelet_network.go:102] Failed to ensure that nat chain KUBE-MARK-DROP exists: error creating chain "KUBE-MARK-DROP": executable file not found in $PATH: 
E0530 17:05:21.902981      11 eviction_manager.go:271] eviction manager: failed to get get summary stats: failed to get root cgroup stats: failed to get cgroup stats for "/": unexpected number of containers: 0
E0530 17:05:31.903141      11 eviction_manager.go:271] eviction manager: failed to get get summary stats: failed to get root cgroup stats: failed to get cgroup stats for "/": unexpected number of containers: 0
E0530 17:05:41.903292      11 eviction_manager.go:271] eviction manager: failed to get get summary stats: failed to get root cgroup stats: failed to get cgroup stats for "/": unexpected number of containers: 0
E0530 17:05:51.903467      11 eviction_manager.go:271] eviction manager: failed to get get summary stats: failed to get root cgroup stats: failed to get cgroup stats for "/": unexpected number of containers: 0
E0530 17:06:01.903629      11 eviction_manager.go:271] eviction manager: failed to get get summary stats: failed to get root cgroup stats: failed to get cgroup stats for "/": unexpected number of containers: 0
E0530 17:06:11.903796      11 eviction_manager.go:271] eviction manager: failed to get get summary stats: failed to get root cgroup stats: failed to get cgroup stats for "/": unexpected number of containers: 0

cc @shyamjvs

Timeout waiting for all hollow-nodes to become Running.