kubernetes: 1000+ services in the same namespace will degrade pod start time & eventually prevent pods from starting

From: https://github.com/kubernetes/kubernetes/issues/92226

What happened:

As the number of services in a namespace grows, pod start time degrades, and past a certain number of services pods fail to start.

What you expected to happen:

The number of services in a namespace should not prevent a pod from starting or degrade its start time.

How to reproduce it (as minimally and precisely as possible):

Example here: https://github.com/knative/serving/issues/8498

Anything else we need to know?:

The culprit is the large list of env vars injected into every pod by service links (which are on by default).
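
For illustration, the variables that service links add for one service can be sketched in shell. This assumes a Service with a single unnamed TCP port; the service name, cluster IP, and port are made-up examples, and service_link_vars is a hypothetical helper, not part of any Kubernetes tooling.

```shell
#!/usr/bin/env bash
# Sketch: print the env vars that service links add for one Service with a
# single unnamed TCP port. All values passed in below are made-up examples.
service_link_vars() {
  local name=$1 ip=$2 port=$3
  local p
  # The prefix is the service name upper-cased, with dashes turned into underscores.
  p=$(printf '%s' "$name" | tr 'a-z-' 'A-Z_')
  printf '%s\n' \
    "${p}_SERVICE_HOST=${ip}" \
    "${p}_SERVICE_PORT=${port}" \
    "${p}_PORT=tcp://${ip}:${port}" \
    "${p}_PORT_${port}_TCP=tcp://${ip}:${port}" \
    "${p}_PORT_${port}_TCP_PROTO=tcp" \
    "${p}_PORT_${port}_TCP_PORT=${port}" \
    "${p}_PORT_${port}_TCP_ADDR=${ip}"
}

service_link_vars redis-primary 10.0.0.11 6379
```

At seven variables per single-port service, 1000 services put roughly 7000 extra variables into every container in the namespace; services with more ports add more.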

Environment:

  • Kubernetes version (use kubectl version): 1.16
  • Cloud provider or hardware configuration: GKE

About this issue

  • State: open
  • Created 4 years ago
  • Reactions: 3
  • Comments: 26 (17 by maintainers)

Most upvoted comments

I had this issue recently on a customer cluster. There were too many services in a single namespace, which caused a large number of env vars to appear on every pod. The problem might not be immediately apparent, but try running a timed workload (e.g. a bash script that performs sqrt()) in that namespace and the same workload in a different namespace. For me, the difference was astronomical. The reason might be that every child process has to inherit all these env vars, so any workload that creates many child processes will suffer. My fix was to add spec.template.spec.enableServiceLinks: false to the deployment, since the pods did not depend on these environment variables to function properly.
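
As a minimal sketch of where that field sits in a Deployment manifest (the name, labels, and image are placeholders, not taken from the issue):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app                # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      enableServiceLinks: false    # do not inject per-service env vars into the pod
      containers:
        - name: app
          image: example.com/app:latest   # placeholder image
```

The field belongs to the pod spec, so in a Deployment it goes under spec.template.spec rather than at the top level.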

@dprotaso can you please check if enableServiceLinks: false helps.

We had a lot of services in the same namespace. The MariaDB startup script was printing about one line of text every 3 seconds. apt-get install -y php was extremely slow and crashed the container. The simplest way to reproduce the problem in the container was: time bash -c 'echo | cat'. This command took somewhere between 4 and 14 seconds. We found that the test command had to be run in bash and contain a pipe | (forking a subshell?). Single commands were OK. Other shells were OK too.
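
The one-line test above can also be tried outside a cluster by faking a service-link-sized environment. The following is a rough sketch, assuming 1000 single-port services; the SVC_<n> names and addresses are invented, run_timed is a hypothetical helper, and the slowdown may or may not show up depending on the bash version and kernel.

```shell
#!/usr/bin/env bash
# Sketch: time the pipe test with and without a large, service-link-like
# environment. Variable names (SVC_<n>) and addresses are made up.
set -e

SERVICES="${SERVICES:-1000}"   # simulated number of services (assumption)

# Seven variables per simulated service, as for a Service with one TCP port.
vars=()
for ((i = 0; i < SERVICES; i++)); do
  p="SVC_${i}"
  vars+=(
    "${p}_SERVICE_HOST=10.0.0.1"
    "${p}_SERVICE_PORT=80"
    "${p}_PORT=tcp://10.0.0.1:80"
    "${p}_PORT_80_TCP=tcp://10.0.0.1:80"
    "${p}_PORT_80_TCP_PROTO=tcp"
    "${p}_PORT_80_TCP_PORT=80"
    "${p}_PORT_80_TCP_ADDR=10.0.0.1"
  )
done

TIMEFORMAT='%R s'   # bash's `time` keyword reports elapsed seconds on stderr

# Run the pipe test in a fresh environment plus any extra variables given.
run_timed() {
  local label=$1
  shift
  printf '%-30s' "$label" >&2
  time env -i PATH="$PATH" "$@" bash -c 'echo | cat' >/dev/null
}

run_timed "clean environment:"
run_timed "${#vars[@]} extra variables:" "${vars[@]}"
```

Setting SERVICES to a larger value (for example SERVICES=5000) increases the environment size; at some point the exec itself fails because the environment exceeds the operating system's argument size limit.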

Setting enableServiceLinks: false helped. With it set to false, pods no longer get the per-service env vars, so a cluster DNS service such as CoreDNS or kube-dns needs to be installed if services have to be resolved by name.