go: net/http: frequent HTTP2 INTERNAL_ERROR errors during module zip download since 2021-10-06

#!watchflakes
post <- log ~ `golang\.org/[^@]+@v\d+\.\d+\.\d+[a-z0-9-.]*:.* stream error: stream ID \d+; INTERNAL_ERROR; received from peer`

The builders are seeing a lot of INTERNAL_ERROR results since 2021-10-06, typically for google.golang.org/protobuf but sometimes for other golang.org paths as well.

This may be related to #50541, which saw an uptick in errors from go.googlesource.com in about the same timeframe.

CC @golang/release

../../../../pkg/mod/cloud.google.com/go/datastore@v1.2.0/client.go:25:2: google.golang.org/genproto@v0.0.0-20210726143408-b02e89920bf0: stream error: stream ID 3; INTERNAL_ERROR; received from peer
../../../../pkg/mod/cloud.google.com/go@v0.88.0/internal/trace/trace.go:23:2: google.golang.org/genproto@v0.0.0-20210726143408-b02e89920bf0: stream error: stream ID 3; INTERNAL_ERROR; received from peer
../../../../pkg/mod/google.golang.org/grpc@v1.39.0/status/status.go:34:2: google.golang.org/genproto@v0.0.0-20210726143408-b02e89920bf0: stream error: stream ID 3; INTERNAL_ERROR; received from peer
../../../../pkg/mod/cloud.google.com/go/datastore@v1.2.0/save.go:26:2: google.golang.org/genproto@v0.0.0-20210726143408-b02e89920bf0: stream error: stream ID 3; INTERNAL_ERROR; received from peer
../../../../pkg/mod/cloud.google.com/go@v0.88.0/cloudbuild/apiv1/v2/cloud_build_client.go:33:2: google.golang.org/genproto@v0.0.0-20210726143408-b02e89920bf0: stream error: stream ID 3; INTERNAL_ERROR; received from peer
../../../../pkg/mod/cloud.google.com/go@v0.88.0/longrunning/autogen/operations_client.go:31:2: google.golang.org/genproto@v0.0.0-20210726143408-b02e89920bf0: stream error: stream ID 3; INTERNAL_ERROR; received from peer
../../../../pkg/mod/cloud.google.com/go@v0.88.0/iam/iam.go:30:2: google.golang.org/genproto@v0.0.0-20210726143408-b02e89920bf0: stream error: stream ID 3; INTERNAL_ERROR; received from peer
../../../../pkg/mod/cloud.google.com/go/storage@v1.10.0/iam.go:24:2: google.golang.org/genproto@v0.0.0-20210726143408-b02e89920bf0: stream error: stream ID 3; INTERNAL_ERROR; received from peer

greplogs --dashboard -md -l -e 'golang\.org/[^@]+@v\d+\.\d+\.\d+[a-z0-9-.]*: stream error: stream ID \d+; INTERNAL_ERROR; received from peer' --since=2021-01-01

2022-02-19T16:23:54-3962a08-0261fa6/dragonfly-amd64
2022-02-10T22:36:44-59536be-0a6cf87/linux-amd64-wsl
2022-02-09T18:09:24-9862752-6a70ee2/darwin-arm64-11_0-toothrot
2022-02-09T16:29:28-dad3315-0a6cf87/darwin-arm64-12_0-toothrot
2022-02-09T16:29:28-095f870-0a6cf87/darwin-arm64-12_0-toothrot
2022-02-08T17:01:10-9b156ee-ef06a5f/plan9-amd64-0intro
2022-02-08T16:42:58-ac99473-e2277c8/darwin-arm64-12_0-toothrot
2022-02-03T19:55:40-cd36cc0-896df42/plan9-amd64-0intro
2022-01-25T22:56:45-c20fd7c-6eb58cd/plan9-amd64-0intro
2022-01-24T21:27:33-6944b10-671e115/plan9-amd64-0intro
2022-01-24T21:27:33-5e0467b-671e115/plan9-amd64-0intro
2022-01-14T16:41:18-4bd3f69-b41185c/plan9-386-0intro
2022-01-07T02:32:39-e83268e-2bb7f6b/linux-amd64-wsl
2022-01-05T22:15:55-43762dc-a845a56/darwin-arm64-12_0-toothrot
2022-01-03T23:45:12-be6af36-95b240b/plan9-amd64-0intro
2022-01-02T14:27:43-6944b10-c886143/plan9-amd64-0intro
2021-12-15T20:26:19-be6af36-07ed86c/darwin-arm64-12_0-toothrot
2021-12-14T18:18:45-ecdc095-becaeea/plan9-arm
2021-12-14T01:48:22-6944b10-1afa432/plan9-amd64-0intro
2021-11-10T19:35:55-03971e3-b954f58/darwin-arm64-11_0-toothrot
2021-10-28T14:25:03-103d89b-a3bb28e/darwin-amd64-10_15
2021-10-26T11:58:05-d418f37-1e2820a/openbsd-arm64-jsing
2021-10-14T15:16:55-a66eb64-ad99d88/linux-amd64-wsl
2021-10-14T01:51:22-e13a265-276fb27/windows-arm64-10
2021-10-13T00:11:47-281050f-4fb2e1c/darwin-amd64-11_0
2021-10-11T20:46:14-089bfa5-7023535/darwin-amd64-11_0
2021-10-07T20:37:43-86761ae-ef2ebbe/darwin-arm64-11_0-toothrot
2021-10-07T20:37:43-59d4e92-ef2ebbe/darwin-arm64-11_0-toothrot
2021-10-06T18:44:56-40a54f1-195945a/darwin-amd64-nocgo

About this issue

  • State: open
  • Created 2 years ago
  • Reactions: 1
  • Comments: 29 (13 by maintainers)

Most upvoted comments

stream error: stream ID N; INTERNAL_ERROR; received from peer means an HTTP/2 stream (a request) was closed by the remote endpoint with the error code INTERNAL_ERROR.

There are various other error codes to tell the client that it is misbehaving, such as PROTOCOL_ERROR. INTERNAL_ERROR is supposed to indicate an internal error on the remote side.
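To make the anatomy of that message concrete, here is a minimal Go sketch that pulls the stream ID, the error code name, and the "received from peer" marker out of a log line like the ones above. The names `parseStreamError`, `parsedStreamError`, and `http2Codes` are hypothetical helpers made up for illustration; they are not part of net/http. The numeric values are a subset of the error codes defined in RFC 7540, section 7. Matching on the message text is only an assumption that suits log analysis, not a recommendation for production error handling.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// streamErrRE matches the stream error text seen in the builder logs.
// The trailing "; received from peer" part is optional.
var streamErrRE = regexp.MustCompile(`stream error: stream ID (\d+); ([A-Z0-9_]+)(; received from peer)?`)

// http2Codes maps a few HTTP/2 error code names to their RFC 7540 values.
var http2Codes = map[string]uint32{
	"NO_ERROR":           0x0,
	"PROTOCOL_ERROR":     0x1,
	"INTERNAL_ERROR":     0x2,
	"FLOW_CONTROL_ERROR": 0x3,
	"REFUSED_STREAM":     0x7,
	"CANCEL":             0x8,
}

// parsedStreamError is a hypothetical holder for the parsed fields.
type parsedStreamError struct {
	StreamID uint32 // HTTP/2 stream (request) that was closed
	Code     string // error code name, e.g. INTERNAL_ERROR
	FromPeer bool   // true if the message says the peer sent the error
}

// parseStreamError extracts the fields from msg. It reports false if msg
// does not contain a stream error in the expected textual form.
func parseStreamError(msg string) (parsedStreamError, bool) {
	m := streamErrRE.FindStringSubmatch(msg)
	if m == nil {
		return parsedStreamError{}, false
	}
	id, err := strconv.ParseUint(m[1], 10, 32)
	if err != nil {
		return parsedStreamError{}, false
	}
	return parsedStreamError{StreamID: uint32(id), Code: m[2], FromPeer: m[3] != ""}, true
}

func main() {
	line := "google.golang.org/genproto@v0.0.0-20210726143408-b02e89920bf0: stream error: stream ID 3; INTERNAL_ERROR; received from peer"
	if p, ok := parseStreamError(line); ok {
		fmt.Printf("stream %d closed with %s (0x%x), from peer: %v\n",
			p.StreamID, p.Code, http2Codes[p.Code], p.FromPeer)
	}
}
```

The `FromPeer` field is the interesting part for this issue: it separates errors the remote endpoint reported from errors the local client raised itself.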

So whatever is doing the module zip download is claiming that it’s unhealthy. It’s not inconceivable that this is somehow the result of a net/http client bug, but if it is, the remote side is doing a very bad job of telling us about it.

Maybe if we can reproduce the failure with GODEBUG=http2debug=2 enabled we might see something that helps indicate the source of the problem, but at the moment my suspicion is that the problem lies on the remote end of the connection.

Is there any update on this issue? We are experiencing it too frequently.

Our current belief is that the root cause is the GFE sending an INTERNAL_ERROR, probably due to the backend behind the GFE breaking the response stream.

This is not a net/http bug; net/http seems to be accurately reporting the error code sent by the GFE. (The error message could perhaps be clearer, but that’s a secondary issue.)

Whatever tools are using net/http to fetch large files should be more robust in the handling of stream errors. There are many reasons a large fetch might break mid-stream, and it’s unlikely we can address this situation entirely by reducing the number of broken streams. This means we should add some form of retry to module zip downloads.

I think it’s okay to move this to the 1.20 milestone.

From logs, it’s also happened for github.com dependencies such as github.com/yuin/goldmark in 2022-02-08T16:42:58-ac99473-e2277c8/darwin-arm64-12_0-toothrot.

Downloading happens with GOPROXY=https://proxy.golang.org set, so perhaps the go.googlesource.com and golang.org servers aren’t relevant (they just happen to be where most of the dependencies are coming from).

I’ve learned that INTERNAL_ERROR refers to an HTTP/2 error code that is described as “Implementation fault”, but it’s not clear to me if the server or client is at fault. CC @neild.