rancher: Timeout issues with helm after upgrading to 2.2.4-rc2

What kind of request is this (question/bug/enhancement/feature request): bug

Steps to reproduce (least amount of steps as possible):

  • Upgrade to Rancher 2.2.4-rc2
  • Try to install or upgrade a helm chart

Result:

Failed to install app cluster-monitoring. Error: UPGRADE FAILED: Get https://localhost:443/k8s/clusters/c-9nd4d/api/v1/namespaces/cattle-prometheus/configmaps?labelSelector=NAME%!D(MISSING)cluster-monitoring%!C(MISSING)OWNER%!D(MISSING)TILLER%!C(MISSING)STATUS%!D(MISSING)DEPLOYED: dial tcp 10.0.15.1:443: i/o timeout

Other details that may be helpful:

  • 2.2.3-rc9 also fails
  • If I switch back to 2.2.3-rc4, 2.2.3-rc5, 2.2.3-rc6, 2.2.3-rc7 or 2.2.3-rc8 it works as expected

Environment information

  • Rancher version (rancher/rancher or rancher/server image tag, or shown bottom left in the UI): 2.2.4-rc2
  • Installation option (single install/HA): single docker install

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 18 (9 by maintainers)

Most upvoted comments

This is caused by a change we made in v2.2.4: we now run helm inside a chroot for enhanced security. The problem is that when constructing that chroot, we do not copy in the /etc/nsswitch.conf file. This file controls how hostnames are resolved, and it says "first look at /etc/hosts (where localhost is defined), then look at remote nameservers". Without this file in the chroot jail, /etc/hosts is not consulted, so localhost is resolved by a DNS server. In this case, that DNS server actually has an entry for localhost that resolves to some random IP (in theory, it really shouldn’t have that entry, but that is out of Rancher’s control).
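To check which lookup order a given environment would use, one option is to read the "hosts:" line of its nsswitch.conf. The following is a minimal sketch, not part of Rancher: the function name hosts_lookup_order is hypothetical, and it only assumes the standard nsswitch.conf layout.

```shell
#!/bin/sh
# Hypothetical helper (not part of Rancher): print the sources listed on the
# "hosts:" line of an nsswitch.conf file, e.g. "files dns".
hosts_lookup_order() {
    conf="${1:-/etc/nsswitch.conf}"
    # A missing file is the failure mode described above, so report it.
    [ -r "$conf" ] || { echo "missing or unreadable: $conf" >&2; return 1; }
    # Drop the "hosts:" key and print the remaining fields.
    awk '$1 == "hosts:" { $1 = ""; sub(/^ +/, ""); print; exit }' "$conf"
}
```

Called with no argument it reads /etc/nsswitch.conf on the host; pointed at the copy inside a jail (for example /opt/jail/$NAME/etc/nsswitch.conf), it returns a non-zero status when the file was never copied in. An order that starts with "files" means /etc/hosts is consulted before DNS.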

We have fixed this in the above PR by adding /etc/nsswitch.conf to the jailer script.

This fix will be in the next release, but until then you can work around it by modifying your DNS server to return 127.0.0.1 for localhost, or by adding this line to /usr/bin/jailer.sh:

cp /etc/nsswitch.conf /opt/jail/$NAME/etc/

after the line

cp /etc/hosts /opt/jail/$NAME/etc/
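If you would rather script that edit than make it by hand, something along these lines should work. It is a sketch under assumptions: the function name add_nsswitch_copy is hypothetical, and it assumes the script contains the cp /etc/hosts line exactly as quoted above. It is safe to run more than once.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Rancher): insert the nsswitch.conf
# copy right after the /etc/hosts copy in the jailer script.
add_nsswitch_copy() {
    script="${1:-/usr/bin/jailer.sh}"
    # Single quotes keep $NAME literal, as it appears in the script.
    anchor='cp /etc/hosts /opt/jail/$NAME/etc/'
    fix='cp /etc/nsswitch.conf /opt/jail/$NAME/etc/'

    # Nothing to do if the line is already present.
    if grep -qF "$fix" "$script"; then
        return 0
    fi
    # Refuse to guess where the line belongs if the anchor is absent.
    grep -qF "$anchor" "$script" || { echo "anchor line not found in $script" >&2; return 1; }

    tmp="$(mktemp)" || return 1
    # Print every line, and print the fix after any line containing the anchor.
    awk -v anchor="$anchor" -v fix="$fix" \
        '{ print } index($0, anchor) { print fix }' "$script" > "$tmp" \
        && cat "$tmp" > "$script"   # overwrite in place to keep permissions
    status=$?
    rm -f "$tmp"
    return $status
}
```

On a single docker install, jailer.sh presumably lives inside the Rancher container, so the function would need to be run there, and the change is likely lost when the container is recreated.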

You could also test the fix using the rancher/rancher:release-v2.2 image, but be warned: that image is cut from the bleeding edge of our release branch and is updated every time a commit is made to that branch. It is what we QA against, so it will always have untested changes in it.