helm: Helm v2.16.4: nil pointer in kube.getSelectorFromObject after upgrading from v2.16.3

Upgrading a deployment with Helm 2.16.4 fails:

# helm upgrade fluent-bit clusterrepo/fluent-bit --install --atomic --version 0.4.7 --namespace log -f /etc/kubernetes/helm-values-fluent-bit.yaml 
Release "fluent-bit" has been upgraded.
Error: transport is closing

The reason for the closed connection is a segmentation violation in Tiller:

[storage] 2020/03/25 13:14:38 listing all releases with filter
[storage] 2020/03/25 13:14:38 listing all releases with filter
[storage] 2020/03/25 13:14:40 listing all releases with filter
[storage] 2020/03/25 13:14:42 listing all releases with filter
[tiller] 2020/03/25 13:14:44 getting history for release fluent-bit
[storage] 2020/03/25 13:14:44 getting release history for "fluent-bit"
[tiller] 2020/03/25 13:14:44 preparing update for fluent-bit
[storage] 2020/03/25 13:14:44 getting deployed releases from "fluent-bit" history
[storage] 2020/03/25 13:14:44 getting last revision of "fluent-bit"
[storage] 2020/03/25 13:14:44 getting release history for "fluent-bit"
[tiller] 2020/03/25 13:14:44 rendering fluent-bit chart using values
2020/03/25 13:14:44 info: manifest "fluent-bit/templates/psp.yaml" is empty. Skipping.
[tiller] 2020/03/25 13:14:45 creating updated release for fluent-bit
[storage] 2020/03/25 13:14:45 creating release "fluent-bit.v12"
[storage] 2020/03/25 13:14:45 getting release history for "fluent-bit"
[storage] 2020/03/25 13:14:45 getting deployed releases from "fluent-bit" history
[storage] 2020/03/25 13:14:45 deleting release "fluent-bit.v2"
[storage] 2020/03/25 13:14:45 Pruned 1 record(s) from fluent-bit with 0 error(s)
[tiller] 2020/03/25 13:14:45 performing update for fluent-bit
[tiller] 2020/03/25 13:14:45 executing 1 pre-upgrade hooks for fluent-bit
[tiller] 2020/03/25 13:14:45 hooks complete for pre-upgrade fluent-bit
[kube] 2020/03/25 13:14:45 building resources from updated manifest
[kube] 2020/03/25 13:14:45 checking 5 resources for changes
[kube] 2020/03/25 13:14:45 Looks like there are no changes for ConfigMap "fluent-bit-config"
[kube] 2020/03/25 13:14:45 Looks like there are no changes for ServiceAccount "fluent-bit"
[kube] 2020/03/25 13:14:45 Looks like there are no changes for ClusterRole "fluent-bit"
[kube] 2020/03/25 13:14:45 Looks like there are no changes for ClusterRoleBinding "fluent-bit"
[kube] 2020/03/25 13:14:45 Looks like there are no changes for DaemonSet "fluent-bit"
[kube] 2020/03/25 13:14:45 beginning wait for 5 resources with timeout of 5m0s
[tiller] 2020/03/25 13:14:47 executing 1 post-upgrade hooks for fluent-bit
[tiller] 2020/03/25 13:14:47 hooks complete for post-upgrade fluent-bit
[storage] 2020/03/25 13:14:47 updating release "fluent-bit.v11"
[tiller] 2020/03/25 13:14:47 updating status for updated release for fluent-bit
[storage] 2020/03/25 13:14:47 updating release "fluent-bit.v12"
[storage] 2020/03/25 13:14:47 getting last revision of "fluent-bit"
[storage] 2020/03/25 13:14:47 getting release history for "fluent-bit"
[kube] 2020/03/25 13:14:47 Doing get for ConfigMap: "fluent-bit-config"
[kube] 2020/03/25 13:14:47 Doing get for ServiceAccount: "fluent-bit"
[kube] 2020/03/25 13:14:47 Doing get for ClusterRole: "fluent-bit"
[kube] 2020/03/25 13:14:47 Doing get for ClusterRoleBinding: "fluent-bit"
[kube] 2020/03/25 13:14:47 Doing get for DaemonSet: "fluent-bit"
[kube] 2020/03/25 13:14:47 get relation pod of object: log/ConfigMap/fluent-bit-config
[kube] 2020/03/25 13:14:47 get relation pod of object: log/ServiceAccount/fluent-bit
[kube] 2020/03/25 13:14:47 get relation pod of object: /ClusterRole/fluent-bit
[kube] 2020/03/25 13:14:47 get relation pod of object: /ClusterRoleBinding/fluent-bit
[kube] 2020/03/25 13:14:47 get relation pod of object: log/DaemonSet/fluent-bit
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x14e315f]

goroutine 721 [running]:
k8s.io/helm/pkg/kube.getSelectorFromObject(0x1c01c60, 0xc00015b200, 0xc00015b200, 0x0)
        /go/src/k8s.io/helm/pkg/kube/client.go:925 +0x26f
k8s.io/helm/pkg/kube.(*Client).getSelectRelationPod(0xc000442e60, 0xc0004aa9a0, 0xc0008d0f90, 0x14e1368, 0xc000000020, 0x1a666f0)
        /go/src/k8s.io/helm/pkg/kube/client.go:1105 +0x195
k8s.io/helm/pkg/kube.(*Client).Get.func2(0xc0004aa9a0, 0x0, 0x0)
        /go/src/k8s.io/helm/pkg/kube/client.go:367 +0xd1
k8s.io/helm/pkg/kube.batchPerform.func1(0xc000304360, 0xc0007857d0, 0xc000897540, 0xc0004aa9a0)
        /go/src/k8s.io/helm/pkg/kube/client.go:753 +0x30
created by k8s.io/helm/pkg/kube.batchPerform
        /go/src/k8s.io/helm/pkg/kube/client.go:752 +0xb8
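
The last "get relation pod" lines before the panic show Tiller walking every rendered resource, including ClusterRole and ClusterRoleBinding, which carry no spec.selector at all; a plausible reading of the trace is that getSelectorFromObject ends up dereferencing a selector that simply is not there. Below is a minimal, hypothetical Go sketch of the guard pattern such a helper needs. It is not Helm's actual implementation; the simplified signature and the object shapes are assumptions for illustration only:

package main

import "fmt"

// getSelectorFromObject is a simplified stand-in for the helper named in the
// stack trace. It pulls spec.selector.matchLabels out of a decoded manifest
// and returns false instead of panicking when any level is missing, which is
// the case for resources that cannot own pods (ConfigMap, ServiceAccount,
// ClusterRole, ClusterRoleBinding, ...).
func getSelectorFromObject(obj map[string]interface{}) (map[string]string, bool) {
	spec, ok := obj["spec"].(map[string]interface{})
	if !ok {
		return nil, false // no spec at all
	}
	selector, ok := spec["selector"].(map[string]interface{})
	if !ok {
		return nil, false // spec without a selector
	}
	matchLabels, ok := selector["matchLabels"].(map[string]interface{})
	if !ok {
		return nil, false // selector without matchLabels
	}
	out := make(map[string]string, len(matchLabels))
	for k, v := range matchLabels {
		if s, ok := v.(string); ok {
			out[k] = s
		}
	}
	return out, true
}

func main() {
	daemonSet := map[string]interface{}{
		"kind": "DaemonSet",
		"spec": map[string]interface{}{
			"selector": map[string]interface{}{
				"matchLabels": map[string]interface{}{"app": "fluent-bit"},
			},
		},
	}
	clusterRole := map[string]interface{}{
		"kind": "ClusterRole", // cluster-scoped, no spec, no selector
	}

	for _, obj := range []map[string]interface{}{daemonSet, clusterRole} {
		if sel, ok := getSelectorFromObject(obj); ok {
			fmt.Printf("%v: selector %v\n", obj["kind"], sel)
		} else {
			fmt.Printf("%v: no pod selector, skipping related-pod lookup\n", obj["kind"])
		}
	}
}

The point of the sketch is only that a selector lookup over arbitrary release resources has to tolerate objects without one, rather than assuming every resource looks like a DaemonSet or Deployment.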

The error persists after forcing the Tiller pod onto another node, so it looks like an application issue rather than a node problem. After downgrading the Helm CLI and Tiller to v2.16.3 the deployment upgrade succeeds without any issues, and the error returns after upgrading to v2.16.4 again. So far the issue has occurred on only one of two clusters, and as far as I can tell only one deployment is affected (which is enough to make the deployment pipeline fail…).

There are no resource limits configured on the Tiller pod, and the kernel does not log any OOM kills.

Output of helm version:

Client: &version.Version{SemVer:"v2.16.4", GitCommit:"5e135cc465d4231d9bfe2c5a43fd2978ef527e83", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.16.4", GitCommit:"5e135cc465d4231d9bfe2c5a43fd2978ef527e83", GitTreeState:"clean"}

Output of kubectl version:

Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.4", GitCommit:"67d2fcf276fcd9cf743ad4be9a9ef5828adc082f", GitTreeState:"clean", BuildDate:"2019-09-18T14:51:13Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.4", GitCommit:"67d2fcf276fcd9cf743ad4be9a9ef5828adc082f", GitTreeState:"clean", BuildDate:"2019-09-18T14:41:55Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}

Cloud Provider/Platform (AKS, GKE, Minikube etc.): kubeadm

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 5
  • Comments: 27 (10 by maintainers)

Most upvoted comments

I’ve just hit this with 2.16.5; after downgrading to 2.16.3 everything is fine. In my case, I’m running Helm as part of an automation script to install/upgrade a bunch of stuff in my cluster. It gets to upgrading Istio and fails there (after succeeding with two other apps):

Release "istio-init" has been upgraded.
Error: transport is closing

Both client and Tiller were running 2.16.5, connecting to an EKS cluster running 1.15.10.

#7840 has been merged. Closing!

That would be great! Please keep us posted on what you discover. We’ve collectively spent many hours trying to reproduce, track down and fix this, and we’re just flummoxed.

Update: I also ran this against 2.16.5 to see if the error was there, and could not reproduce with that version of Tiller, either.