kubernetes: kubelet doesn't share transport by default after #95427

What happened:

In #95427, client-go stopped sharing a transport when c.Dial is not nil (https://github.com/kubernetes/kubernetes/blob/b0abe89ae259d5e891887414cb0e5f81c969c697/staging/src/k8s.io/client-go/transport/cache.go#L136), but kubelet sets a custom Dial by default (https://github.com/kubernetes/kubernetes/blob/b0abe89ae259d5e891887414cb0e5f81c969c697/cmd/kubelet/app/server.go#L929).

After that PR was integrated, the number of connections between kubelet and kube-apiserver in our 4000-node cluster increased fivefold.

What you expected to happen:

kubelet shares a transport by default, so each kubelet keeps only one connection to kube-apiserver. @liggitt

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Start kubelet with rotate-certificates=false.

Environment:

  • Kubernetes version (use kubectl version): v1.19.4
  • Cloud provider or hardware configuration:
  • OS (e.g: cat /etc/os-release): EulerOS 2.9
  • Kernel (e.g. uname -a):
  • Install tools:
  • Network plugin and version (if this is a network-related bug):
  • Others:

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 17 (17 by maintainers)

Most upvoted comments

@liggitt

Some increase was expected, since multiple REST clients constructed from a config with a custom dialer cannot safely share a transport, and client-go constructs a REST client for each API group/version accessed. Something like #97821 would be required to rework how client-go constructs clients to start sharing REST clients above the transport level between API groups/versions.

I think this is working as intended until client-go client construction is reworked.

Hi @liggitt, when rotate-certificates=false, kubelet customizes the dialer so that it can provide closeAllConns, which closes connections that are dead but have not yet been closed.

In #78016, there is a discussion of another solution:

As discussed in kubernetes/client-go#374, another way to fix this is to use HTTP/2 ping frames to keep the connection alive and identify failed connections, but it seems to be a long-term solution.

After #96778, client-go has an HTTP/2 health check, so we no longer need that change.

So we can solve the current problem by reverting #78016.

I have submitted #103149 to revert #78016; this change requires your final confirmation.

Thank you.

@gjkim42 I will work on this issue