serving: 500 errors from queue-proxy in Knative versions later than 0.24.0

What version of Knative?

1.0.0

Expected Behavior

No 500 errors.

Actual Behavior

Sporadic 500 errors being presented to the client. From the queue-proxy logs on the service, we see the following error message:

httputil: ReverseProxy read error during body copy: read tcp 127.0.0.1:51376->127.0.0.1:8080: use of closed network connection
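For context: "use of closed network connection" is the message of Go's net.ErrClosed. That error is returned when a process reads from a socket it has already closed itself. This suggests the proxy side closed its connection to the user container on 127.0.0.1:8080, and the container did not reset it. Below is a minimal Go sketch for recognising this error in your own instrumentation. The helper names are hypothetical and are not queue-proxy code.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// isClosedConnErr reports whether err wraps net.ErrClosed
// ("use of closed network connection"), i.e. the local side closed the
// socket before the read completed.
func isClosedConnErr(err error) bool {
	return errors.Is(err, net.ErrClosed)
}

// sampleReadErr builds an error shaped like the one in the log above.
// It is illustrative only; nothing here talks to a real socket.
func sampleReadErr() error {
	return &net.OpError{
		Op:     "read",
		Net:    "tcp",
		Source: &net.TCPAddr{IP: net.ParseIP("127.0.0.1"), Port: 51376},
		Addr:   &net.TCPAddr{IP: net.ParseIP("127.0.0.1"), Port: 8080},
		Err:    net.ErrClosed,
	}
}

func main() {
	err := sampleReadErr()
	fmt.Println(err)                  // read tcp 127.0.0.1:51376->127.0.0.1:8080: use of closed network connection
	fmt.Println(isClosedConnErr(err)) // true
}
```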

Steps to Reproduce the Problem

We were able to reproduce these errors with Knative 1.0.0, as well as with Knative 0.26.0 and 0.25.1. We’re not seeing the issue in 0.24.0. I wasn’t able to reproduce it using hey and helloworld-go, so it appears to be workload-dependent to some degree. However, we’re not doing anything unusual, except that our responses are understandably larger than what helloworld-go sends, since this is an actual production API workload.
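hey and helloworld-go did not trigger the error, so a client that reads every response body to the end may help, because the error is raised while the body is being copied. Below is a minimal, sequential Go sketch that assumes a POST endpoint. The URL, request count, and 64 KiB payload size are placeholders, not values from our setup. Transport errors are counted separately from 5xx statuses, because a body cut off mid-copy may surface as a read error rather than as a status code.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"time"
)

// result tallies the outcomes of one reproduction run.
type result struct {
	Total, ServerErrors, TransportErrors int
}

// record folds one request outcome into the tally: a non-nil err (failed
// request or failed body read) counts as a transport error, otherwise a
// 5xx status counts as a server error.
func (r *result) record(status int, err error) {
	r.Total++
	switch {
	case err != nil:
		r.TransportErrors++
	case status >= 500:
		r.ServerErrors++
	}
}

// run sends n POST requests with a payloadSize-byte body to url and drains
// each response body completely.
func run(client *http.Client, url string, n, payloadSize int) result {
	var res result
	payload := bytes.Repeat([]byte("x"), payloadSize)
	for i := 0; i < n; i++ {
		resp, err := client.Post(url, "application/octet-stream", bytes.NewReader(payload))
		if err != nil {
			res.record(0, err)
			continue
		}
		_, err = io.Copy(io.Discard, resp.Body) // read to EOF so body-copy failures show up
		resp.Body.Close()
		res.record(resp.StatusCode, err)
	}
	return res
}

func main() {
	// Placeholder target: replace with the Knative service URL under test.
	client := &http.Client{Timeout: 30 * time.Second}
	res := run(client, "http://my-service.default.example.com", 1000, 64*1024)
	fmt.Printf("%+v\n", res)
}
```

Unlike hey, this sends requests one at a time. Concurrency can be added with goroutines if the error only appears under load.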

The count of 500s is fairly small (5 occurrences over several thousand requests), but it’s higher than the zero errors we were getting before, and we can’t explain it. There doesn’t appear to be any anomalous behaviour in the app itself that could be causing the issue (memory and CPU usage are stable, etc.).

Once we moved back to Knative 0.24.0, the problem went away. (The Knative Serving operator makes it really easy to move between versions 💯)

Some digging in Go’s repo turned up an interesting lead, which might be related and would point to an issue in Go rather than in queue-proxy specifically.

I was wondering whether this is on the right track, or whether anyone else sees this in their logs. This is preventing us from upgrading to 1.0.0 in production.

Thanks!

About this issue

  • State: open
  • Created 3 years ago
  • Comments: 21 (12 by maintainers)


Most upvoted comments

I am back on it. I have a small project here experimenting locally with full duplex. Envoy seems to have no issues (I haven’t tested older clients, etc.). I will create a PR and we can discuss further. cc @dprotaso

Adding this to the v1.12.0 release, since we’ll probably be able to switch to go1.21 for that release.
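Go 1.21 added http.ResponseController.EnableFullDuplex. According to its documentation, Go’s HTTP/1 server otherwise consumes any unread request body before it starts writing the response. As we understand the upstream discussion, this appears to be what breaks a proxy that streams in both directions. Below is a minimal sketch of a handler that opts in and then echoes the request body after the response headers have already been flushed. The function names are illustrative.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// fullDuplexEcho flushes the response headers first and only then reads the
// request body, copying it back to the client. EnableFullDuplex (Go 1.21+)
// keeps the HTTP/1 server from draining the body when the headers go out.
func fullDuplexEcho(w http.ResponseWriter, r *http.Request) {
	rc := http.NewResponseController(w)
	if err := rc.EnableFullDuplex(); err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	w.WriteHeader(http.StatusOK)
	if err := rc.Flush(); err != nil { // headers are on the wire before the body is read
		return
	}
	io.Copy(w, r.Body) // interleave reading the request with writing the response
}

// roundTrip posts body to a throwaway local server running fullDuplexEcho
// and returns what came back.
func roundTrip(body string) (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(fullDuplexEcho))
	defer srv.Close()
	resp, err := http.Post(srv.URL, "text/plain", strings.NewReader(body))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	b, err := io.ReadAll(resp.Body)
	return string(b), err
}

func main() {
	got, err := roundTrip("hello")
	fmt.Println(got, err) // hello <nil>
}
```

Without the EnableFullDuplex call, the echo would likely come back empty, because the server drains the unread body as soon as the headers are flushed.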

@ReToCode my goal, btw, is to have an annotation that enables it per workload, so it can be turned on or off on demand. The activator can detect that annotation on the revision and set up the right type of connection in its handler (we create a reverse proxy per request, afaik).
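Below is a minimal sketch of that idea. It assumes a hypothetical annotation key, example.dev/http-full-duplex, with the values Enabled and Disabled, and falls back to a cluster-wide default when the annotation is absent. This is not Knative’s actual annotation or activator code.

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"strings"
)

// fullDuplexAnnotation is a HYPOTHETICAL key used only for illustration.
const fullDuplexAnnotation = "example.dev/http-full-duplex"

// fullDuplexEnabled decides per revision: an explicit Enabled/Disabled value
// (case-insensitive) wins, otherwise defaultOn applies.
func fullDuplexEnabled(annotations map[string]string, defaultOn bool) bool {
	switch v := annotations[fullDuplexAnnotation]; {
	case strings.EqualFold(v, "enabled"):
		return true
	case strings.EqualFold(v, "disabled"):
		return false
	default:
		return defaultOn
	}
}

// proxyHandler builds a reverse proxy per request and enables full duplex
// first when the revision's annotations ask for it.
func proxyHandler(target *url.URL, annotations map[string]string, defaultOn bool) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if fullDuplexEnabled(annotations, defaultOn) {
			if err := http.NewResponseController(w).EnableFullDuplex(); err != nil {
				log.Printf("full duplex not available: %v", err)
			}
		}
		httputil.NewSingleHostReverseProxy(target).ServeHTTP(w, r)
	})
}

func main() {
	target, err := url.Parse("http://127.0.0.1:8080")
	if err != nil {
		log.Fatal(err)
	}
	annotations := map[string]string{fullDuplexAnnotation: "Enabled"}
	h := proxyHandler(target, annotations, false)
	// A real process would serve h, e.g. http.ListenAndServe(":8000", h).
	log.Printf("handler %T ready, full duplex: %v", h, fullDuplexEnabled(annotations, false))
}
```

Because the decision is made per request, different revisions can opt in or out independently. A real implementation would read the annotations from the revision object instead of taking a map as an argument.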

I will take a look. /assign @skonto

@skonto having the behaviour configurable via an annotation on the revision would be extremely handy. Depending on the defaults chosen, we can pick the affected services on our end and have them opt in or out.

I prodded the upstream golang issue - https://github.com/golang/go/issues/40747#issuecomment-1112810498

We use httputil.ReverseProxy, so it would be nice if they could fix this upstream.

I’m on @adriangudas’s team, and we’ve been testing this issue on Knative 1.4.0. The issue persists.

httputil: ReverseProxy read error during body copy: read tcp 127.0.0.1:45456->127.0.0.1:8080: use of closed network connection
error reverse proxying request; sockstat: sockets: used 201
TCP: inuse 140 orphan 2 tw 19 alloc 632 mem 162
UDP: inuse 0 mem 0
UDPLITE: inuse 0
RAW: inuse 0
FRAG: inuse 0 memory 0
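The lines after "sockstat:" follow the format of Linux’s /proc/net/sockstat. You may want to check whether orphaned sockets (orphan) or TIME_WAIT sockets (tw) move together with the errors. A small Go parser for one such line might look like this. It is a sketch, and the function name is illustrative.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSockstatLine parses one /proc/net/sockstat-style line, such as
// "TCP: inuse 140 orphan 2 tw 19 alloc 632 mem 162", into the protocol name
// and a map of counter name -> value.
func parseSockstatLine(line string) (string, map[string]int, error) {
	proto, rest, ok := strings.Cut(line, ":")
	if !ok {
		return "", nil, fmt.Errorf("missing ':' in %q", line)
	}
	fields := strings.Fields(rest)
	if len(fields)%2 != 0 {
		return "", nil, fmt.Errorf("odd number of fields in %q", line)
	}
	counters := make(map[string]int, len(fields)/2)
	for i := 0; i < len(fields); i += 2 {
		n, err := strconv.Atoi(fields[i+1])
		if err != nil {
			return "", nil, fmt.Errorf("counter %q: %w", fields[i], err)
		}
		counters[fields[i]] = n
	}
	return strings.TrimSpace(proto), counters, nil
}

func main() {
	proto, c, err := parseSockstatLine("TCP: inuse 140 orphan 2 tw 19 alloc 632 mem 162")
	if err != nil {
		panic(err)
	}
	fmt.Println(proto, c["inuse"], c["orphan"], c["tw"]) // TCP 140 2 19
}
```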

The application we’re testing against has not changed since last time and is the same one we ran on Knative 0.24. The issue only started when we tried to upgrade Knative to newer versions.