kubernetes: Unable to detect if a watch is active

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:

We are unable to detect whether a watch is still active, especially in the HTTP/2 case.

If we have 3 kube-apiservers behind a load balancer, kube-proxy and kube-controller-manager currently multiplex all of their requests over a single HTTP/2 connection, connected to one apiserver. When that apiserver gets stuck, there will be logs indicating errors such as GET timeouts, but kube-proxy and kube-controller-manager will not reconnect to another apiserver.
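For illustration, here is a minimal sketch (plain net/http, hypothetical apiserver address, auth and TLS config omitted) of why the hang goes unnoticed: a watch is a long-lived streaming GET, so once the server stops sending frames the read simply blocks; the TCP connection is still "up", no error is ever returned, and nothing triggers a redial.

```go
package main

import (
	"bufio"
	"fmt"
	"log"
	"net/http"
)

func main() {
	// Hypothetical apiserver address; real clients also send credentials.
	const apiserverURL = "https://10.0.0.1:6443"

	// No overall client timeout: a watch is an arbitrarily long stream,
	// so a fixed Timeout would kill healthy watches as well.
	client := &http.Client{}

	resp, err := client.Get(apiserverURL + "/api/v1/pods?watch=true")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	// If the apiserver is SIGSTOPped, its sockets stay open, Read never
	// returns, and this loop hangs forever: there is no error on which
	// to reconnect to another apiserver behind the LB.
	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		fmt.Println(scanner.Text())
	}
}
```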

What you expected to happen:

If one apiserver gets stuck, the client should be able to detect this and reconnect to another apiserver. At worst, the client should log an obvious error and exit, then wait for an external supervisor to restart it.
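For context, the usual client-go mitigation is to bound every watch with a timeout so it is periodically re-established; a sketch below, with an illustrative timeout value. Note why this does not meet the expectation above: TimeoutSeconds is enforced server-side, so a SIGSTOPped apiserver never closes the stream and the client still hangs.

```go
package main

import (
	"context"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	cs, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	// Ask the apiserver to end the watch after 300s; the client then
	// re-establishes it. This caps the lifetime of a watch on a healthy
	// apiserver, but a fully stuck apiserver never enforces it.
	timeout := int64(300)
	w, err := cs.CoreV1().Pods("").Watch(context.TODO(), metav1.ListOptions{
		TimeoutSeconds: &timeout,
	})
	if err != nil {
		log.Fatal(err)
	}
	for ev := range w.ResultChan() {
		log.Printf("event: %s", ev.Type)
	}
}
```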

How to reproduce it (as minimally and precisely as possible):

Send a SIGSTOP signal to one apiserver, then watch the related kubelet logs.
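Using SIGSTOP (rather than SIGKILL) matters here: the kernel keeps the paused process's sockets open, so clients see a connection that is up but silent. A tiny hypothetical helper, equivalent to `kill -STOP <pid>`:

```go
package main

import (
	"log"
	"os"
	"strconv"
	"syscall"
)

// Usage: go run stop.go <apiserver-pid>
func main() {
	pid, err := strconv.Atoi(os.Args[1])
	if err != nil {
		log.Fatal(err)
	}
	// SIGSTOP pauses the process; its TCP connections remain established.
	if err := syscall.Kill(pid, syscall.SIGSTOP); err != nil {
		log.Fatal(err)
	}
	// Resume later with SIGCONT (kill -CONT <pid>) to un-stick the server.
}
```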

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:

About this issue

  • State: closed
  • Created 6 years ago
  • Reactions: 3
  • Comments: 51 (37 by maintainers)

Most upvoted comments

We’re waiting for Go 1.15. @caesarxuchao made a fix for this in the upstream Go standard library. Unfortunately, it’s basically impossible to fix this without that change.
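For reference, that upstream change adds ping-based health checking to golang.org/x/net/http2 (which the standard library vendors). A minimal sketch of enabling it on a plain client; the ReadIdleTimeout/PingTimeout fields are the real x/net ones, while the timeout values here are illustrative rather than what client-go ends up using:

```go
package main

import (
	"log"
	"net/http"
	"time"

	"golang.org/x/net/http2"
)

// buildClient returns an HTTP client whose HTTP/2 connections are
// health-checked with PING frames, so a stuck server is detected
// instead of the watch hanging forever.
func buildClient() (*http.Client, error) {
	t1 := &http.Transport{}
	t2, err := http2.ConfigureTransports(t1)
	if err != nil {
		return nil, err
	}
	// If no frame arrives for 30s, send a PING; if the PING is not
	// answered within 15s, close the connection so the caller redials
	// (through the LB, hopefully to a healthy apiserver).
	t2.ReadIdleTimeout = 30 * time.Second
	t2.PingTimeout = 15 * time.Second
	return &http.Client{Transport: t1}, nil
}

func main() {
	client, err := buildClient()
	if err != nil {
		log.Fatal(err)
	}
	_ = client // use this client for long-lived watch requests
}
```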

This is already tracked here: https://github.com/kubernetes/client-go/issues/374#issuecomment-632457187

Apparently we’ve gone 2 years without de-duping these two issues, oops!