helm: Helm Installs Don't Honor Timeout

I have tried to install Spinnaker via Helm many times. Most attempts fail, many of them with this error:

$ helm install stable/spinnaker --namespace spinnaker --wait --timeout 1500
E0224 13:38:51.978584   19536 portforward.go:175] lost connection to pod
Error: transport is closing

Note that they fail after just a couple of minutes, not the 25 minutes specified. Shouldn't they honor the specified timeout?

About this issue

  • State: closed
  • Created 7 years ago
  • Reactions: 12
  • Comments: 41 (17 by maintainers)

Most upvoted comments

I think it’s inevitable to hit this kind of problem with long-running HTTP requests. It’s normal for load balancers to kill idle connections. For example, if kube-apiserver is sitting behind an ELB, consider increasing its idle_timeout.
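
For a classic ELB, raising the idle timeout would look roughly like the following. This is a sketch only; the load balancer name my-apiserver-elb and the 1800-second value are placeholders for your environment:

# Assumed classic ELB fronting kube-apiserver; adjust name and value to your setup
$ aws elb modify-load-balancer-attributes \
    --load-balancer-name my-apiserver-elb \
    --load-balancer-attributes '{"ConnectionSettings":{"IdleTimeout":1800}}'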

I think a proper solution for this problem is to switch to polling, i.e. wait a few minutes for resources to be ready, close the connection, and set up a timer to poll tiller periodically. The disadvantage of this approach is that helm has to establish a new connection to tiller every few seconds, but that’s certainly better than timing out.
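
A client-side approximation of that idea (not a fix inside Helm itself) is to skip --wait and poll the cluster for readiness in short, separate requests, so no single connection has to stay open for the whole install. A rough sketch, assuming a release named spinnaker and that pod readiness is close enough to what --wait checks:

# Install without --wait so the connection to tiller returns quickly
$ helm install stable/spinnaker --name spinnaker --namespace spinnaker
# Poll readiness from the client instead of holding one long connection open
$ kubectl wait --for=condition=Ready pod --all --namespace spinnaker --timeout=1500s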

@bacongobbler Yes, checked that. We have one node where helm is installed, and tiller runs on our k8s cluster nodes (3 nodes). We do all helm installs from this server, which talks to the k8s cluster. Not sure if our proxy is blocking the traffic or timing out. Let me check on that. Thanks.

any progress?

I’m having all kinds of timeouts, both on tiller and on install (transport is closing as well), using helm 2.9.0. It’s really annoying. There’s an Azure LB involved, but I already set its idle timeout to 10 minutes.

Even using --wait --timeout 600 and --tiller-connection-timeout 600 doesn’t seem to fix the problem.
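
For reference, the combination being described here is roughly the following, with chart and namespace as placeholders; in Helm 2 both --timeout and --tiller-connection-timeout take values in seconds:

$ helm install stable/spinnaker --namespace spinnaker \
    --wait --timeout 600 --tiller-connection-timeout 600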

try with helm 3.0 😉

From what I can tell, the timeout only fails to be honored when attempting to do things remotely. If I log into one of my Kubernetes nodes and issue the commands from there, helm will sit there until the operation completes fully, whereas if I try to do it remotely it times out fairly quickly… this lends credence to the theory that the issue lies with whatever load balancer is in front of Kubernetes.
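
One way to test that theory is to run the same install from both places and compare. A sketch only; the node name, user, and chart are placeholders, and it assumes helm is available on the node as described above:

# From a remote workstation: traffic passes through the load balancer in front of kube-apiserver
$ helm install stable/spinnaker --namespace spinnaker --wait --timeout 1500
# From a cluster node, bypassing the external load balancer
$ ssh admin@k8s-node-1 'helm install stable/spinnaker --namespace spinnaker --wait --timeout 1500'
# If only the remote run dies early with "transport is closing", the load balancer is the likely culprit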

As far as I understand, nobody in the community is currently looking into this particular issue. If you determine what the underlying cause is, we’d appreciate a patch!