kubernetes: kubectl does not retry after TLS handshake timeout

What happened:

One of our three control plane IPs is unresponsive. On my local machine, kubectl sporadically lags for about 10 seconds but otherwise works fine. This is because the Go standard library divides the 30-second dial timeout across the 3 IPs, and when the first attempt times out it falls back to the next one.
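
The arithmetic behind the 10-second lag can be modeled roughly as follows. This is a simplified sketch of how the `net` package appears to share a dial deadline among the remaining addresses, not the library's actual code: `perAddressTimeout` is a hypothetical name, and the 2-second floor is an assumption based on my reading of the standard library source.

```go
package main

import (
	"fmt"
	"time"
)

// perAddressTimeout is a hypothetical helper that models how a total dial
// budget is shared: each remaining address gets an equal slice of the time
// that is left, but never less than a small floor (assumed to be 2s here).
func perAddressTimeout(remaining time.Duration, addrsRemaining int) time.Duration {
	const floor = 2 * time.Second // assumption, see lead-in
	if remaining <= 0 || addrsRemaining <= 0 {
		return 0
	}
	t := remaining / time.Duration(addrsRemaining)
	if t < floor {
		if remaining < floor {
			t = remaining
		} else {
			t = floor
		}
	}
	return t
}

func main() {
	remaining := 30 * time.Second // total dial timeout from the report
	const addrs = 3               // three control plane IPs
	for left := addrs; left > 0; left-- {
		t := perAddressTimeout(remaining, left)
		fmt.Printf("address %d of %d may use up to %v\n", addrs-left+1, addrs, t)
		remaining -= t // worst case: the attempt used its whole share
	}
}
```

Under this model a dead first IP costs one full share of the budget before the dialer moves on, which is consistent with the lag described above.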

Further testing shows that if the entire TCP dial times out, then kubectl itself will retry.

However, our build server is behind a firewall. There, the TCP dial succeeds but the TLS handshake times out after 10 seconds. When this happens, kubectl treats the error as fatal and does not retry.

What you expected to happen:

kubectl should retry if the TLS handshake times out. (It should start over with a fresh TCP dial.)
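
To illustrate the requested behavior, here is a minimal sketch in plain Go. It is not kubectl or client-go code: `dialTLSWithRetry`, `isTimeout`, and `tryStalledEndpoint` are hypothetical names, and the attempt count and per-attempt timeout are arbitrary. Each attempt uses `tls.Dialer.DialContext`, so one deadline covers both the TCP dial and the handshake, and a timeout leads to a completely fresh connection. The stalled endpoint is simulated with a local listener that never accepts, which lets TCP connect while nothing answers the handshake.

```go
package main

import (
	"context"
	"crypto/tls"
	"errors"
	"fmt"
	"net"
	"time"
)

// isTimeout reports whether err looks like a dial or handshake timeout.
func isTimeout(err error) bool {
	if errors.Is(err, context.DeadlineExceeded) {
		return true
	}
	var ne net.Error
	return errors.As(err, &ne) && ne.Timeout()
}

// dialTLSWithRetry (hypothetical helper) dials addr and performs the TLS
// handshake. If an attempt times out, it starts over with a new TCP dial.
// Errors that are not timeouts are returned immediately.
func dialTLSWithRetry(addr string, cfg *tls.Config, perAttempt time.Duration, attempts int) (*tls.Conn, error) {
	var lastErr error
	for i := 0; i < attempts; i++ {
		ctx, cancel := context.WithTimeout(context.Background(), perAttempt)
		d := &tls.Dialer{Config: cfg}
		conn, err := d.DialContext(ctx, "tcp", addr)
		cancel() // only releases the timer; an established connection is unaffected
		if err == nil {
			return conn.(*tls.Conn), nil
		}
		if !isTimeout(err) {
			return nil, err
		}
		lastErr = err
	}
	return nil, fmt.Errorf("gave up after %d attempts: %w", attempts, lastErr)
}

// tryStalledEndpoint simulates the firewalled case: the listener never calls
// Accept, so connections complete at the TCP level (they sit in the kernel
// backlog) but the TLS handshake never gets a reply.
func tryStalledEndpoint(perAttempt time.Duration, attempts int) error {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return err
	}
	defer ln.Close()

	// InsecureSkipVerify is for this local demo only.
	cfg := &tls.Config{InsecureSkipVerify: true}
	conn, err := dialTLSWithRetry(ln.Addr().String(), cfg, perAttempt, attempts)
	if err == nil {
		conn.Close()
	}
	return err
}

func main() {
	fmt.Println(tryStalledEndpoint(500*time.Millisecond, 3))
}
```

Against a hostname that resolves to several IPs, each fresh dial gives the resolver and dialer another chance to reach a healthy address, which is the point of starting over rather than retrying the handshake on the same connection.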

How to reproduce it (as minimally and precisely as possible):

I don’t know how to force this issue to reproduce.

Anything else we need to know?:

Environment:

Kubernetes client and server versions (use kubectl version): v1.21.13 (client), v1.22.12 (server)
Cloud provider or hardware configuration: AWS EKS
OS (e.g: cat /etc/os-release): macOS 12.5.1

About this issue

State: open
Created 2 years ago
Reactions: 4
Comments: 22 (13 by maintainers)

Commits related to this issue

Skip non-networking tests for AKS Windows jobs These tests flake due to issue https://github.com/kubernetes/kubectl/issues/1270. Skip the tests until the upstream issue is fixed. Signed-off-by: Ionu... — committed to ionutbalutoiu/k8s-e2e-runner by ionutbalutoiu a year ago
Skip non-networking tests for AKS Windows jobs These tests flake due to: * https://github.com/kubernetes/kubernetes/issues/114934 * https://github.com/kubernetes/kubectl/issues/1270 Skip the tests u... — committed to ionutbalutoiu/k8s-e2e-runner by ionutbalutoiu a year ago

Most upvoted comments

@brianpursley: Those labels are not set on the issue: triage/(so, triage/it, triage/can, triage/be, triage/re-triaged, triage/by, triage/api, triage/machinery)

In response to this:

/remove-sig cli
/remove-triage accepted (so it can be re-triaged by API Machinery)

k8s-ci-robot on Jul 19, 2023

Would be a very good feature. I’m getting frequent TLS timeouts from my k8s operator.

magnus-longva-bouvet on Dec 22, 2022