helm: Errors on "helm list" AND "helm install"

Output of helm version: version.BuildInfo{Version:"v3.1.2", GitCommit:"d878d4d45863e42fd5cff6743294a11d28a9abce", GitTreeState:"clean", GoVersion:"go1.13.8"}

Output of kubectl version: Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.2"} Server Version: version.Info{Major:"1", Minor:"15+", GitVersion:"v1.15.11-gke.9"}

Cloud Provider/Platform (AKS, GKE, Minikube etc.): GKE


On "helm list" I get:

Error: list: failed to list: stream error when reading response body, may be caused by closed connection. Please retry. Original error: stream error: stream ID 3; INTERNAL_ERROR

On "helm install" of a chart I get:

request.go:924] Unexpected error when reading response body: net/http: request canceled (Client.Timeout exceeded while reading body)
Error: unable to build kubernetes objects from release manifest: unexpected error when reading response body. Please retry. Original error: net/http: request canceled (Client.Timeout exceeded while reading body)

"helm delete" is working; I was able to uninstall a release.


Additional notes:

  • there is network activity on "helm list" for a minute or so before it fails (maybe a timeout is triggered? a timing check is sketched after these notes)

  • The setup was running for months without any problems

  • I updated to Helm v3.1.2 during the current debugging process for this issue

  • There was a node update on the Kubernetes side recently (maybe relevant)

  • I also created a new cluster on GKE for testing, and there "helm list" and "helm install" are working.
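
Since Helm 3 stores each release as a Secret labeled owner=helm (the label selector is visible in the storage driver code quoted in a comment below), a rough timing check of the raw listing that "helm list" performs under the hood looks like this; it is only a diagnostic sketch:

# Time the raw listing of Helm release Secrets (diagnostic sketch only).
time kubectl get secrets --all-namespaces -l owner=helm --no-headers | wc -l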

About this issue

  • State: open
  • Created 4 years ago
  • Reactions: 18
  • Comments: 60 (6 by maintainers)

Most upvoted comments

The quick workaround is to delete previous release versions. I had the same issue with the prometheus-stack chart, so I listed all the Secrets where Helm 3 saves release data:

kubectl get secrets --all-namespaces

found sh.helm.release.v1.kube-prometheus-stack.v7

and deleted all previous versions:

kubectl delete secrets -n monitoring sh.helm.release.v1.kube-prometheus-stack.v1 ...

and helm ls started to work.
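
For reference, a rough sketch of the same cleanup that keeps recent revisions instead of deleting them one by one; the namespace, release name, and cut-off revision below are assumptions, adjust them before running:

# Sketch: delete revision Secrets older than v$KEEP_FROM for one release.
# NAMESPACE, RELEASE, and KEEP_FROM are assumptions.
NAMESPACE=monitoring
RELEASE=kube-prometheus-stack
KEEP_FROM=5   # delete v1..v4, keep v5 and newer
for i in $(seq 1 $((KEEP_FROM - 1))); do
  kubectl delete secret -n "$NAMESPACE" "sh.helm.release.v1.$RELEASE.v$i" --ignore-not-found
done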

I have a similar issue with GKE and Helm:

> helm ls
Error: list: failed to list: stream error when reading response body, may be caused by closed connection. Please retry. Original error: stream error: stream ID 3; INTERNAL_ERROR

any updates?

I have spent about 4 hours so far fixing this issue. Here are the details:

Default Helm

$ helm version
version.BuildInfo{Version:"v3.3.4", GitCommit:"a61ce5633af99708171414353ed49547cf05013d", GitTreeState:"clean", GoVersion:"go1.14.9"}

Failure with the system default Helm

$ helm --kube-context ctx list --all --deployed --failed --date -n ns --max 1000
Error: list: failed to list: stream error when reading response body, may be caused by closed connection. Please retry. Original error: stream error: stream ID 3; INTERNAL_ERROR

Code changes

$ gd
diff --git a/pkg/storage/driver/secrets.go b/pkg/storage/driver/secrets.go
index 2e8530d0..f3694cfc 100644
--- a/pkg/storage/driver/secrets.go
+++ b/pkg/storage/driver/secrets.go
@@ -35,8 +35,12 @@ import (
 
 var _ Driver = (*Secrets)(nil)
 
-// SecretsDriverName is the string name of the driver.
-const SecretsDriverName = "Secret"
+const (
+       // SecretsDriverName is the string name of the driver.
+       SecretsDriverName = "Secret"
+       // ListPaginationLimit is the number of Secrets we fetch in a single API call.
+       ListPaginationLimit = int64(300)
+)
 
 // Secrets is a wrapper around an implementation of a kubernetes
 // SecretsInterface.
@@ -78,15 +82,36 @@ func (secrets *Secrets) Get(key string) (*rspb.Release, error) {
 // List fetches all releases and returns the list releases such
 // that filter(release) == true. An error is returned if the
 // secret fails to retrieve the releases.
+// We read `ListPaginationLimit` Secrets at a time so as not to overwhelm the
+// `api-server` in a cluster with many releases; fixes
+// https://github.com/helm/helm/issues/7997
 func (secrets *Secrets) List(filter func(*rspb.Release) bool) ([]*rspb.Release, error) {
        lsel := kblabels.Set{"owner": "helm"}.AsSelector()
-       opts := metav1.ListOptions{LabelSelector: lsel.String()}
+       opts := metav1.ListOptions{LabelSelector: lsel.String(), Limit: ListPaginationLimit}
 
+       // Perform an initial list
        list, err := secrets.impl.List(context.Background(), opts)
        if err != nil {
                return nil, errors.Wrap(err, "list: failed to list")
        }
 
+       // Fetch more results from the server by making recursive paginated calls
+       isContinue := list.Continue
+       for isContinue != "" {
+               secrets.Log("list: fetched %d secrets, more to fetch..\n", ListPaginationLimit)
+               opts = metav1.ListOptions{LabelSelector: lsel.String(), Limit: ListPaginationLimit, Continue: isContinue}
+               batch, err := secrets.impl.List(context.Background(), opts)
+               if err != nil {
+                       return nil, errors.Wrap(err, "list: failed to perform paginated listing")
+               }
+
+               // Append the results to the initial list
+               list.Items = append(list.Items, batch.Items...)
+
+               isContinue = batch.Continue
+       }
+       secrets.Log("list: fetched %d releases\n", len(list.Items))
+
        var results []*rspb.Release
 
        // iterate over the secrets object list

Build custom Helm

$ make && stat bin/helm

Custom Helm with fix for listing

$ ./bin/helm version
version.BuildInfo{Version:"v3.8+unreleased", GitCommit:"65d8e72504652e624948f74acbba71c51ac2e342", GitTreeState:"dirty", GoVersion:"go1.17.2"}

Success with the custom Helm built with the changes above

$ ./bin/helm --debug --kube-context ctx list --all --deployed --failed --date -n ns --max 1000
secrets.go:101: [debug] list: fetched 300 secrets, more to fetch..
secrets.go:101: [debug] list: fetched 300 secrets, more to fetch..
secrets.go:101: [debug] list: fetched 300 secrets, more to fetch..
secrets.go:101: [debug] list: fetched 300 secrets, more to fetch..
secrets.go:101: [debug] list: fetched 300 secrets, more to fetch..
secrets.go:113: [debug] list: fetched 1621 releases
...
...
<list of releases in namespace `ns`>

Note: the built-in unit tests are currently failing; I have yet to update them. I will fix them, or if someone can help me fix them ASAP, I can open a PR and get this ready for merge.

EDIT: The unit tests are passing now; the PR is out.

Increase the timeout from --request-timeout=1m0s to 2m0s in /etc/kubernetes/manifests/kube-apiserver.yaml and everything will work!
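
For completeness, a rough sketch of that change on a self-managed (kubeadm-style) control plane; the current flag value and the sed approach are assumptions, and this is not possible on managed control planes such as GKE:

# On each control-plane node of a self-managed cluster (sketch only):
sudo sed -i 's/--request-timeout=1m0s/--request-timeout=2m0s/' \
  /etc/kubernetes/manifests/kube-apiserver.yaml
# The kubelet watches static pod manifests and restarts kube-apiserver
# automatically after the edit.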

Yes, that’s the global timeout, and I didn’t mention changing it as an option because the vast majority of users don’t have access to change flags on their cluster’s kube-apiserver.

No, there are no updates. The PR was opened and the contributor is no longer able to work on it. If someone wants to pick it up, please do.

I have commented there, but I would avoid working on this until a reviewer/maintainer signs up to own the merge process at the end; I cannot be bothered waiting ~3 months to get PRs reviewed.

We have no updates on our side, as we don’t believe this is a Helm error so much as a Kubernetes control plane error. Right now, all of the complaints that I know of are specific to GCP. You might have better luck asking someone there about the issue.

To my knowledge EKS has never had this problem. I experienced it on AKS a year ago, and it has since been fixed. I know of no cases involving on-prem versions of Kubernetes.

So at this point, we believe the error to be specific to GKE’s internal control plane implementation.

This is something we experience in some EKS clusters. We use the Helm Terraform provider, and we see these kinds of issues especially as the resources' state grows. The error looks like:

Error: stream error when reading response body, may be caused by closed connection. Please retry. Original error: stream error: stream ID 11; INTERNAL_ERROR

As a workaround, we moved from Secret to ConfigMap as the storage backend. We still experience the issue, but less often.
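
A minimal sketch of that switch via the HELM_DRIVER environment variable (release and chart names below are placeholders):

# Helm 3 selects the storage backend from HELM_DRIVER
# (secret is the default; configmap, sql, and memory are alternatives).
export HELM_DRIVER=configmap
helm list -A
helm upgrade --install my-release ./my-chart   # placeholder names
# Note: releases recorded under the old backend are not visible to the new one.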

Here is the output of helm --debug with more details

Error: list: failed to list: stream error when reading response body, may be caused by closed connection. Please retry. Original error: stream error: stream ID 3; INTERNAL_ERROR
helm.go:84: [debug] stream error when reading response body, may be caused by closed connection. Please retry. Original error: stream error: stream ID 3; INTERNAL_ERROR
list: failed to list
helm.sh/helm/v3/pkg/storage/driver.(*Secrets).List
        /private/tmp/helm--615sa8/src/helm.sh/helm/pkg/storage/driver/secrets.go:87
helm.sh/helm/v3/pkg/action.(*List).Run
        /private/tmp/helm-/src/helm.sh/helm/pkg/action/list.go:154
main.newListCmd.func1
        /private/tmp/helm-/src/helm.sh/helm/cmd/helm/list.go:80
github.com/spf13/cobra.(*Command).execute
        /private/tmp/helm-/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:842
github.com/spf13/cobra.(*Command).ExecuteC
        /private/tmp/helm--/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:950
github.com/spf13/cobra.(*Command).Execute
        /private/tmp/helm--/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:887
main.main
        /private/tmp/helm-/src/helm.sh/helm/cmd/helm/helm.go:83
runtime.main
        /usr/local/Cellar/go@1.13/1.13.10_1/libexec/src/runtime/proc.go:203
runtime.goexit
        /usr/local/Cellar/go@1.13/1.13.10_1/libexec/src/runtime/asm_amd64.s:1357

What helped me fix the issue was deleting the secrets that Helm creates, using a simple script:

NAMESPACE=monitor
kubectl get secrets -n $NAMESPACE --no-headers=true | awk '/sh.helm.release/{print $1}' | xargs kubectl delete -n $NAMESPACE secrets

For our EKS setup, helm list can’t handle more than ~3000 versions/secrets.

Cleaning up old versions/secrets solved the issue (we had ~13,000). kubectl get secrets took only 10 seconds to list more than 13,000 secrets, so I believe the issue is on the Helm side.

LIST operations that take more than 60 seconds hit the global timeout and are terminated by the server. The error message combined with the "network traffic [being] 30Kb per second until it fails" makes me suspect that is what is happening, with the likely cause being a slow internet connection between the user and the control plane. A prior commenter suggested running the command from a pod in the cluster; I would try that (a sketch follows).
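
A rough sketch of that in-cluster check; the image name/tag, namespace, and the broad RBAC grant are assumptions, so adjust them to your environment and policies:

# Run helm from inside the cluster to rule out a slow client-side link.
kubectl create serviceaccount helm-debug -n default
kubectl create clusterrolebinding helm-debug \
  --clusterrole=cluster-admin --serviceaccount=default:helm-debug
kubectl run helm-debug -n default --rm -it --restart=Never \
  --image=alpine/helm:3.12.3 \
  --overrides='{"apiVersion":"v1","spec":{"serviceAccountName":"helm-debug"}}' \
  -- list -A --max 1000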

Any update on this? I hit the issue with Helm version 3.9.4, so the issue still exists.

I’m seeing this as well; after running helm upgrade 20+ times, it eventually succeeded. In my case, there isn’t anything I can do about the network speed; it’s over a satellite link. Is there any hope of getting a timeout option added for situations where there are a large number of secrets/versions or slow network links? It doesn’t look like the existing --timeout option covers that case.

@UmairHassanKhan Yes, just set --history-max to some low value if you’re using the Helm CLI, or max_history if you’re using the Terraform Helm provider. Setting this value to 10 helped me. Also, you can delete old history records manually with kubectl.
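
For example (release, chart, and namespace names are placeholders):

# Keep at most 10 revisions per release going forward (helm CLI).
helm upgrade my-release ./my-chart --history-max 10
# Or prune an old revision Secret manually.
kubectl delete secret -n my-namespace sh.helm.release.v1.my-release.v1 --ignore-not-found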

I am facing this issue when installing the Prometheus Helm chart. Did anyone resolve the issue?

We faced the same issue in EKS with cert-manager. We tried cleaning up the cluster by deleting some old Helm releases, which also removes their old secrets. Ref: https://github.com/jetstack/cert-manager/issues/3229

“killing connection/stream because serving request timed out and response had been started”

This is proof that you’re hitting the 60s global timeout I mentioned previously.

The client needs a faster network connection, or you have to list less data. There are no other solutions that don’t boil down to making it faster or doing less work.

“Does kubectl work? Can you create/update/delete resources that way?”

Yes, this is working