go: x/net/http2: make Transport return nicer error when Amazon ALB hangs up mid-response?

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

$ go version
go version go1.8rc1 darwin/amd64

What operating system and processor architecture are you using (go env)?

Linux AMD64

What did you do?

If possible, provide a recipe for reproducing the error. A complete runnable program is good. A link on play.golang.org is best.

We have http client code that has started to return errors when the corresponding server uses HTTP2 instead of HTTP.

What did you expect to see?

Identical behavior.

What did you see instead?

http2: server sent GOAWAY and closed the connection; LastStreamID=1, ErrCode=NO_ERROR, debug=""

About this issue

  • State: open
  • Created 7 years ago
  • Reactions: 3
  • Comments: 64 (25 by maintainers)

Most upvoted comments

@jeffbarr, thanks for the connection! Three of us hopped on a call the other day and were able to repro the issue on demand.

For the record, the tool we used for debugging was https://github.com/bradfitz/h2slam pointed at an ALB. We then changed certain ALB parameters on the AWS control plane, and the TCP connection from AWS would fail (in up to 10 seconds), often without even a GOAWAY.

I’ll let AWS folk give further updates here.

We’ve seen this “goaway” error quite commonly from our Go HTTP/2 clients against our ALB backends and paid it no mind, thinking it was just a “do a retry” error. But we noticed more recently that it seems like a deeper problem, especially now that Go’s HTTP client handles the retry cases. It seems to be as @bradfitz suggests: the ALB is correctly sending a GOAWAY, and the Go HTTP client is noticing, but then the ALB closes the connection before closing the in-flight HTTP/2 stream, omitting the END_STREAM flag on the final frame. That is really an “unexpected connection close” error. It also looks like the stream is complete and the upstream has fully processed the request and response; the ALB is just omitting the flag in the HTTP/2 stream. This breaks the expected contract of a load balancer imho. I’ve chimed in on the AWS forums issue already linked.

@bradfitz perhaps in this case the go http client needs to return an “unexpected connection close before stream end” error or something, instead of blaming the goaway?

I searched in our logs for GOAWAY and found a couple of thousand hits with the following message: Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=17611, ErrCode=NO_ERROR, debug=""

All hits have ErrCode=NO_ERROR afaict.

This seems harmless. Can this be an info message instead of an error?

Tune it how? This thread seems to come down to a bunch of finger pointing and no movement: AWS claims you are broken, you claim they are broken, and I don’t see a single comment that describes some magical tuning method for the ALB. The fact that it works for some of our users and not others (consistently per client; I’m gathering information to see if I can correlate it to kubectl versions and/or the version of Go used to compile it) seems to point to some kind of change inside this library or potentially in how kubectl uses this library.

https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-troubleshooting.html

The load balancer sends a response code of 000

With HTTP/2 connections, if the compressed length of any of the headers exceeds 8K bytes or if the number of requests served through one connection exceeds 10,000, the load balancer sends a GOAWAY frame and closes the connection with a TCP FIN.

@froodian, nope, what’s happening is:

  • Go user code: “res, err := http.Transport.RoundTrip(someReq) …”
  • Go net/http: “Here’s HTTP request 3 on this TCP connection”
  • ALB: “I’m gracefully shutting down, and the last request I’ll handle is 3” (GOAWAY LastStreamID=3)
  • ALB: “Here are the HTTP response headers for request 3.”
  • Go net/http: “Great, thanks for the response headers for request 3, I’ll give that to the user now.”
  • Go user code: “res, err := http.Transport.RoundTrip(someReq) completes; err == nil, res.Body non-nil & ready to read, res.ContentLength set to -1 or some value…”
  • ALB: “BYE BYE TCP CONNECTION DEAD.”
  • Go user code: “res.Body.Read(…)”
  • Go net/http: “Sorry, no Body to read. TCP connection is dead.”
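
The sequence above can be sketched in a few lines. This is only an analogy, assuming a plain HTTP/1.1 `httptest` server rather than a real ALB or HTTP/2: the hypothetical `truncatingHandler` promises 100 body bytes, sends 7, and then closes the connection. The names `truncatingHandler` and `demo` are made up for illustration. The point is the shape of the failure: the request call returns `err == nil`, and the error only surfaces later from the body read.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// truncatingHandler is a hypothetical stand-in for a peer that hangs up
// mid-response: it declares a 100-byte body, writes 7 bytes, and closes
// the TCP connection.
func truncatingHandler(w http.ResponseWriter, r *http.Request) {
	hj, ok := w.(http.Hijacker)
	if !ok {
		http.Error(w, "hijacking not supported", http.StatusInternalServerError)
		return
	}
	conn, buf, err := hj.Hijack()
	if err != nil {
		return
	}
	defer conn.Close()
	buf.WriteString("HTTP/1.1 200 OK\r\nContent-Length: 100\r\n\r\npartial")
	buf.Flush()
}

// demo reports whether the response headers arrived without error and
// whether reading the body then failed with an unexpected EOF.
func demo() (headersOK, bodyFailed bool) {
	srv := httptest.NewServer(http.HandlerFunc(truncatingHandler))
	defer srv.Close()

	res, err := http.Get(srv.URL)
	if err != nil {
		return false, false
	}
	defer res.Body.Close()
	headersOK = res.StatusCode == http.StatusOK

	// The caller already holds a "successful" response at this point;
	// the dead connection only shows up while reading the body.
	_, err = io.ReadAll(res.Body)
	bodyFailed = errors.Is(err, io.ErrUnexpectedEOF)
	return headersOK, bodyFailed
}

func main() {
	headersOK, bodyFailed := demo()
	fmt.Printf("headers OK: %v, body read failed: %v\n", headersOK, bodyFailed)
}
```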

If ALB hadn’t sent a response header then we could retry the request, but it’s pretty weird for us to retry the request when we’ve already given the response headers to the user code. The only safe thing to do is retry the request and hope for exactly the “same” response headers and only if they “match”, then continue acting like the original res.Body (which the Go user code is already reading from) is part of the second retried request.

But things like the server’s Date header probably changed, so that at least needs to be ignored. What else?

What if ALB had already returned some of the response body bytes, but not all? Do we need to keep a checksum of bytes read and stitch together the two bodies if the body is the same length and the second response’s prefix bytes have the same checksum?

That would all be super sketchy.

It’s better to just return an error, which we do. If the caller wants to retry, they can retry.
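
A caller-side retry along those lines might look like the sketch below. `fetchAll`, `flakyTransport`, `errReader`, and `demo` are hypothetical names, and the fake transport exists only to simulate a body that dies mid-read without touching the network. It assumes the request is an idempotent GET that is safe to send again; a real client would likely add backoff and be more selective about which errors to retry.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net/http"
	"strings"
)

// fetchAll issues a GET and reads the whole body, redoing the entire
// request when either the round trip or the body read fails.
// Assumption: the request is idempotent, so sending it again is safe.
func fetchAll(c *http.Client, url string, attempts int) ([]byte, error) {
	if attempts < 1 {
		attempts = 1
	}
	var lastErr error
	for i := 0; i < attempts; i++ {
		res, err := c.Get(url)
		if err != nil {
			lastErr = err
			continue
		}
		body, err := io.ReadAll(res.Body)
		res.Body.Close()
		if err != nil {
			lastErr = err // headers arrived, but the body was cut short
			continue
		}
		return body, nil
	}
	return nil, fmt.Errorf("giving up after %d attempts: %w", attempts, lastErr)
}

// errReader always fails, standing in for a connection that died mid-body.
type errReader struct{}

func (errReader) Read([]byte) (int, error) {
	return 0, errors.New("connection closed mid-body")
}

// flakyTransport is a hypothetical RoundTripper for illustration only:
// its first response has a body that fails partway through, and later
// responses are complete.
type flakyTransport struct{ calls int }

func (t *flakyTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	t.calls++
	var body io.Reader = strings.NewReader("complete body")
	if t.calls == 1 {
		body = io.MultiReader(strings.NewReader("part"), errReader{})
	}
	return &http.Response{
		Status:     "200 OK",
		StatusCode: http.StatusOK,
		Proto:      "HTTP/1.1",
		ProtoMajor: 1,
		ProtoMinor: 1,
		Header:     make(http.Header),
		Body:       io.NopCloser(body),
		Request:    req,
	}, nil
}

// demo runs fetchAll against the fake transport and returns the body,
// the number of round trips made, and any final error.
func demo() (string, int, error) {
	ft := &flakyTransport{}
	body, err := fetchAll(&http.Client{Transport: ft}, "http://example.invalid/", 3)
	return string(body), ft.calls, err
}

func main() {
	body, calls, err := demo()
	fmt.Println(body, calls, err)
}
```

Note that the retry discards whatever was read from the failed attempt and starts over, which sidesteps the header-matching and body-stitching questions raised above.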

Do you just want a better error message? What text would sound good to you?