helm: Errors on "helm list" AND "helm install"
Output of helm version:
version.BuildInfo{Version:"v3.1.2", GitCommit:"d878d4d45863e42fd5cff6743294a11d28a9abce", GitTreeState:"clean", GoVersion:"go1.13.8"}
Output of kubectl version:
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.2"}
Server Version: version.Info{Major:"1", Minor:"15+", GitVersion:"v1.15.11-gke.9"}
Cloud Provider/Platform (AKS, GKE, Minikube etc.): GKE
On helm list I get:
Error: list: failed to list: stream error when reading response body, may be caused by closed connection. Please retry. Original error: stream error: stream ID 3; INTERNAL_ERROR
On helm install of a chart I get:
request.go:924] Unexpected error when reading response body: net/http: request canceled (Client.Timeout exceeded while reading body)
Error: unable to build kubernetes objects from release manifest: unexpected error when reading response body. Please retry. Original error: net/http: request canceled (Client.Timeout exceeded while reading body)
helm delete is working; I was able to uninstall a release.
Additional notes:
- There is network activity on helm list for a minute or so before it fails (maybe a timeout is triggered?).
- The setup was running for months without any problems.
- I updated to Helm v3.1.2 during the current debugging process for this issue.
- There was a node update on the Kubernetes side recently (maybe relevant).
- I also created a new cluster on GKE for testing, and there helm list and helm install work.
About this issue
- State: open
- Created 4 years ago
- Reactions: 18
- Comments: 60 (6 by maintainers)
The quick workaround is to delete previous release versions. I had the same issue with the prometheus-stack chart, so I listed all the secrets where Helm 3 stores release data, found entries like sh.helm.release.v1.kube-prometheus-stack.v7, deleted all the previous versions, and helm ls started working (see the sketch below).

I have a similar issue with GKE and Helm. Any updates?
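For illustration, a minimal sketch of that Secret cleanup, assuming Helm 3's default Secret storage backend; the namespace and the revision to delete are placeholders:

```sh
# List the Secrets where Helm 3 stores release history (one per revision).
kubectl get secrets -n monitoring | grep sh.helm.release.v1

# Delete an old revision's Secret; keep the newest one so Helm still sees the release.
kubectl delete secret sh.helm.release.v1.kube-prometheus-stack.v7 -n monitoring

# helm ls should respond again once the history is trimmed.
helm ls -n monitoring
```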
I have spent about 4 hours so far fixing this issue. Here are the details (screenshots omitted):
- Default Helm: failure with the system default Helm
- Code changes: build a custom Helm
- Custom Helm with the fix for listing: success with the custom Helm with the changes above
Note: The built-in unit tests are currently failing; I have yet to update them. I will fix them, or if someone can help me fix them ASAP, I can open a PR and get this ready for merge.
EDIT: The unit tests pass now; the PR is out.
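For anyone who wants to try the same approach, a rough sketch of building a patched Helm client from source; it assumes Go and make are installed and that the listing fix has been applied locally (the make target below is the standard one in the Helm repository, but verify it against the version you check out):

```sh
# Build a custom Helm binary with local changes applied.
git clone https://github.com/helm/helm.git
cd helm
# ... apply the listing fix (e.g. under pkg/storage/driver) locally ...
make build            # produces ./bin/helm
./bin/helm list -A    # test the patched client against the affected cluster
```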
Increase the timeout --request-timeout=1m0s to 2m0s and everything will work! /etc/kubernetes/manifests/kube-apiserver.yaml
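A minimal sketch of that change, assuming a self-managed control plane where kube-apiserver runs as a static pod and the flag is already set to 1m0s (not applicable to managed control planes such as GKE or EKS):

```sh
# Raise the apiserver's global request timeout from 1 minute to 2 minutes.
# kubelet restarts the static pod automatically when the manifest changes.
sudo sed -i 's/--request-timeout=1m0s/--request-timeout=2m0s/' \
  /etc/kubernetes/manifests/kube-apiserver.yaml
```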
Yes, that’s the global timeout, and I didn’t mention changing it as an option because the vast majority of users don’t have access to change flags on their cluster’s kube-apiserver.
I have commented here, but I would avoid working on this until a reviewer/maintainer signs up to own the merge process at the end; I can’t be bothered to wait ~3 months to get PRs reviewed.
This is something we experience in some EKS clusters. We use the Helm Terraform provider, and we see these kinds of issues especially when the release state grows. The error looks like:
As a workaround, we moved from secret to configmap as the storage backend. We still experience the issue, but less often.
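A minimal sketch of switching the storage backend for the Helm CLI, assuming Helm 3 (the Terraform provider exposes an equivalent option; check its documentation for the exact argument name):

```sh
# Store release state in ConfigMaps instead of Secrets for this invocation.
# Note: releases recorded under the default Secret driver will not be visible
# while the configmap driver is active.
export HELM_DRIVER=configmap
helm list -A
```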
Here is the output of helm --debug with more details
Error: list: failed to list: stream error when reading response body, may be caused by closed connection. Please retry. Original error: stream error: stream ID 3; INTERNAL_ERROR
helm.go:84: [debug] stream error when reading response body, may be caused by closed connection. Please retry. Original error: stream error: stream ID 3; INTERNAL_ERROR
list: failed to list
helm.sh/helm/v3/pkg/storage/driver.(*Secrets).List
    /private/tmp/helm--615sa8/src/helm.sh/helm/pkg/storage/driver/secrets.go:87
helm.sh/helm/v3/pkg/action.(*List).Run
    /private/tmp/helm-/src/helm.sh/helm/pkg/action/list.go:154
main.newListCmd.func1
    /private/tmp/helm-/src/helm.sh/helm/cmd/helm/list.go:80
github.com/spf13/cobra.(*Command).execute
    /private/tmp/helm--/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:842
github.com/spf13/cobra.(*Command).ExecuteC
    /private/tmp/helm--/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:950
github.com/spf13/cobra.(*Command).Execute
    /private/tmp/helm--/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:887
main.main
    /private/tmp/helm-/src/helm.sh/helm/cmd/helm/helm.go:83
runtime.main
    /usr/local/Cellar/go@1.13/1.13.10_1/libexec/src/runtime/proc.go:203
runtime.goexit
    /usr/local/Cellar/go@1.13/1.13.10_1/libexec/src/runtime/asm_amd64.s:1357
What helped me fix the issue was deleting the secrets that Helm creates, using a simple script:
NAMESPACE=monitor
kubectl get secrets -n $NAMESPACE --no-headers=true | awk '/sh.helm.release/{print $1}' | xargs kubectl delete -n $NAMESPACE secrets
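A more conservative variant (a sketch, not from the thread): keep the newest revisions of a given release so Helm still knows it is installed. It assumes GNU coreutils, Helm 3's default Secret naming scheme sh.helm.release.v1.<release>.v<N>, and placeholder namespace/release names:

```sh
#!/usr/bin/env bash
# Delete all but the newest $KEEP revision Secrets for one release.
NAMESPACE=monitor
RELEASE=kube-prometheus-stack
KEEP=5

kubectl get secrets -n "$NAMESPACE" --no-headers -o custom-columns=NAME:.metadata.name \
  | grep "^sh\.helm\.release\.v1\.${RELEASE}\.v" \
  | sort -V \
  | head -n -"$KEEP" \
  | xargs -r kubectl delete secret -n "$NAMESPACE"
```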
For our EKS setup, helm list can't handle more than ~3000 versions/secrets. Cleaning up old versions/secrets solved the issue (we had ~13,000). kubectl get secrets took only 10 seconds to list more than 13,000 secrets, so I believe the issue is on the Helm side.

LIST operations that take more than 60 seconds hit the global timeout and are terminated by the server. The error message, combined with the "network traffic [being] 30Kb per second until it fails", makes me suspect that is what is happening, with the likely cause being a slow internet connection between the user and the control plane. A prior commenter suggested running the command from a pod in the cluster; I would try that.
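A rough sketch of that test, assuming an image that ships the Helm binary (alpine/helm here is an assumption) and a service account with enough RBAC to read the release Secrets; without that, the in-cluster run fails on permissions rather than reproducing the timeout:

```sh
# Run the same listing from inside the cluster to take the client's network link out of the picture.
kubectl run helm-debug --rm -it --restart=Never \
  --image=alpine/helm:3.9.4 --command -- helm list -A --debug
```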
Any update on this? I hit the issue with Helm version 3.9.4, so it still exists.
I'm seeing this as well; eventually, after running helm upgrade 20+ times, it succeeded. In my case there isn't anything I can do about the network speed, it's over a satellite link. Is there any hope of getting a timeout option added for situations where there are a large number of secrets/versions or slow network links? It doesn't look like the existing --timeout option covers that case.

@UmairHassanKhan Yes, just set --history-max to some low value if you're using the Helm CLI, or max_history if you're using the Terraform helm provider. Setting this value to 10 helped me. Also, you can delete old history records manually with kubectl.
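A minimal sketch of capping the retained history on the CLI side (release and chart names are placeholders):

```sh
# Keep at most 10 revision Secrets per release; older ones are pruned on upgrade.
helm upgrade --install my-release ./my-chart --history-max 10
```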
I am facing this issue when installing the Prometheus Helm chart. Did anyone resolve it?
We faced the same issue in EKS with cert-manager. We cleaned up the cluster by deleting some old Helm releases, which also removes their old secrets. Ref: https://github.com/jetstack/cert-manager/issues/3229
This is proof that you’re hitting the 60s global timeout I mentioned previously.
The client needs a faster network connection, or you have to list less data. There are no other solutions that don’t boil down to making it faster or doing less work.
"Does kubectl work? Can you create/update/delete resources that way?"
Yes, this is working.