cluster-api: Provide ability to customize the timeout for KCP when talking to a workload etcd cluster

What steps did you take and what happened:

https://github.com/kubernetes-sigs/cluster-api/blob/7e42829e0ccd15d0960ad52cf2a9616cb7c6a72d/controlplane/kubeadm/internal/etcd/etcd.go#L33

I tried to bootstrap a control plane in Singapore from my local workstation in the US, where ping latency is about 250ms. KCP repeatedly fails to create a functional etcd client: the internal code linked above uses a hardcoded timeout of 2s, with no way to override it. Bumping this timeout in a forked version of the code resolved the problem.

What did you expect to happen:

I expected cluster spin-up to succeed over a high-latency network.

Anything else you would like to add:

Environment:

  • Cluster-api version: 0.3.19 (forked)
  • Minikube/KIND version:
  • Kubernetes version: (use kubectl version):
  • OS (e.g. from /etc/os-release):

/kind bug

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 18 (18 by maintainers)

Most upvoted comments

FWIW, in our fork we went with keeping the default of 2s unless an environment variable is defined to override it.

+1 from my side. @timoreimann, let me know if I can help get this addressed (here are some hints about point 2 of Vince’s suggestion).

I could take a stab at a PR if we have consensus and nobody is on it already.

@timoreimann Let’s do it!

If that’s ok with everyone, I’d suggest we:

  • Raise the current default timeout to at least 10 seconds (or more?), given that Cluster API most often runs in a remote location and potentially connects to workload clusters across the globe.
  • Provide a flag similar to the bootstrap token flag in CABPK.

cc @randomvariable @timothysc