imgproxy: imgproxy stops downloading source images randomly

We are encountering a weird issue with our production imgproxy deployment.

It is deployed on Google Cloud Run (managed container platform) and has been running fine for quite some time. Thanks a lot for this btw 👍

Our setup is processing about 200k requests per day without any issues, but lately some containers have begun to stop processing images and get stuck until we kill them.

We see logs like this when it happens:

time="2022-10-19T09:48:50Z" level="error" message="Completed in 24.949903028s /insecure/resize:fill:1500:0/plain/gs:%2F%2Fxxx/product/535a49ee_default.jpeg" request_id="DJJBxB3d3ld7xUV-B1hbp" method="GET" status="500" client_ip="xxx" error="Can't download source image: The image request timed out"
time="2022-10-19T09:48:50Z" level="error" message="Completed in 10.000268855s /insecure/resize:fill-down:120:120/plain/gs:%2F%2Fxxx/prod/a2e02af8455e435b8dcca2d67/acf8e1b8183e4f5e9498d1905_default.jpeg" request_id="Nj3ey4iGeLI_yAq1vOQuk" method="GET" status="503" client_ip="xxx" error="Request was timed out after 10.000235681s"

Nothing changes on the configuration side when this issue happens, and the other imgproxy containers seem to be working fine. The only way to resolve it is to deploy a new Cloud Run revision, which stops the existing containers (including the stuck one) and launches new ones.

We do not have health checks set up yet, as they are a new Cloud Run feature not yet supported by Terraform. We plan to add them manually at some point to see if they detect the stuck containers (but since the stuck containers still respond, just with timeouts, I don't think they will).

We tried various Cloud Run settings (reducing concurrency…) without any effect; the issue keeps reproducing about once a day.

Do you have any ideas on what could be causing this, and how we could troubleshoot this further?

We are currently running imgproxy 3.7.2 and plan to upgrade tomorrow to see how it goes. Note that this issue did not appear right after our previous upgrade, but some time later.

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 42 (31 by maintainers)

Most upvoted comments

I guess I finally found what may cause this issue. HTTP/2 connections may become stuck, and Go’s HTTP client doesn’t handle this properly by default and tries to reuse stuck connections. Since GCS uses HTTP/2, this bug affects it too. More info: https://github.com/golang/go/issues/30702, https://github.com/googleapis/google-cloud-go/issues/3522

I’ll try to reproduce the bug with stuck connections and add a workaround. The temporary solution is to disable HTTP/2 globally: GODEBUG=http2client=0
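For code you control, the same effect can also be achieved per client instead of process-wide: per the net/http documentation, setting Transport.TLSNextProto to a non-nil, empty map disables HTTP/2 on that transport. A minimal Go sketch of the idea (illustration only, not imgproxy’s code):

    package main

    import (
        "crypto/tls"
        "net/http"
    )

    func main() {
        // A non-nil, empty TLSNextProto map disables HTTP/2 for this
        // transport only. GODEBUG=http2client=0 does the same thing
        // process-wide without a code change.
        client := &http.Client{
            Transport: &http.Transport{
                TLSNextProto: map[string]func(string, *tls.Conn) http.RoundTripper{},
            },
        }
        _ = client // use this client for source image downloads
    }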

I’ll try to make a release tomorrow

Don’t make promises you’re not ready to keep 😅

Anyway, the v3.16.0 release is out, and it includes the fix for this issue. I configured the HTTP clients so they check HTTP/2 connections’ liveness, which allows imgproxy to recover after an HTTP/2 connection becomes stuck. I also made imgproxy retry source image requests after getting a “connection lost” error.
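For the curious, here is roughly what enabling HTTP/2 liveness checks looks like in Go, using the ping-based health check in golang.org/x/net/http2. This is a sketch of the technique with timeout values chosen for illustration, not imgproxy’s exact code:

    package main

    import (
        "log"
        "net/http"
        "time"

        "golang.org/x/net/http2"
    )

    func main() {
        tr := &http.Transport{}
        // Enable HTTP/2 on the transport and keep a handle to the
        // http2.Transport so its health-check knobs can be set.
        h2, err := http2.ConfigureTransports(tr)
        if err != nil {
            log.Fatal(err)
        }
        // If nothing is read from a connection for ReadIdleTimeout,
        // the client sends a PING frame; if the PING isn't answered
        // within PingTimeout, the connection is closed rather than
        // being reused while stuck.
        h2.ReadIdleTimeout = 10 * time.Second
        h2.PingTimeout = 5 * time.Second

        client := &http.Client{Transport: tr}
        _ = client
    }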

The v3.15.0 release includes my tweaks. Hope this will solve the problem

I made some tweaks to the source image downloading process in the latest build. Also, if you use many different image sources, I’d recommend setting IMGPROXY_CLIENT_KEEP_ALIVE_TIMEOUT to 0.
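In Go terms, a keep-alive timeout of 0 amounts to not pooling idle connections at all, so a broken connection can never be handed to a later request. A rough sketch of the concept (assuming it maps to the standard http.Transport knobs; imgproxy’s internal wiring may differ):

    package main

    import "net/http"

    func main() {
        // With keep-alives disabled, every request opens a fresh
        // connection; nothing stuck can linger in the idle pool.
        client := &http.Client{
            Transport: &http.Transport{DisableKeepAlives: true},
        }
        _ = client
    }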

Having a similar issue over here. Imgproxy stops working after several timeouts

ERROR   [2023-03-18T08:24:56Z] Completed in 5.076879661s /Kc1dSoUDos202QHn-kUH2zSfVWBicc3GGfAoxLQot18/rs:fit:0:1200/g:sm/aHR0cHM6Ly9hcGkua3J1dS5jb20vdX.... request_id=zrP8vNByraNgl_kX792Lo method=GET status=500 client_ip=172.19.0.1 error="Can't download source image: The image request timed out"
/app/imagedata/download.go:245 github.com/imgproxy/imgproxy/v3/imagedata.download
/app/imagedata/image_data.go:134 github.com/imgproxy/imgproxy/v3/imagedata.Download
/app/processing_handler.go:306 main.handleProcessing.func2
/app/processing_handler.go:293 main.handleProcessing
/app/server.go:113 main.withCORS.func1
/app/server.go:164 main.withPanicHandler.func1
/app/router/router.go:102 github.com/imgproxy/imgproxy/v3/router.(*Router).ServeHTTP
/usr/local/go/src/net/http/server.go:2937 net/http.serverHandler.ServeHTTP
/usr/local/go/src/net/http/server.go:1996 net/http.(*conn).serve
/usr/local/go/src/runtime/asm_amd64.s:1599 runtime.goexit

What’s strange about this is that imgproxy stops serving all other requests as well and needs manual intervention (a restart), as the health check in this case still returns 200 OK despite the fact that something seems to be broken.

What can we do in that case? Increase Concurrency?

Version is the latest 3.14

@renchap @Klaitos imgproxy’s GCS transport hasn’t been changed in ages, so downgrading imgproxy won’t help. imgproxy uses the GCS client provided by the official SDK, so there are not many things we can tune 😦

Switching to the 2nd generation execution environment may help, as it promises better networking. Not sure if networking is the cause, but I’d try it.

Switching to plain https://storage.googleapis.com/… URLs may also help as requests to them are served by the HTTP client configured by imgproxy.
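For example, with a hypothetical bucket and object path, instead of a gs:// source URL:

    /insecure/resize:fill:1500:0/plain/gs://my-bucket/product/image.jpeg

you would request:

    /insecure/resize:fill:1500:0/plain/https://storage.googleapis.com/my-bucket/product/image.jpeg

(Bucket and path are placeholders. Note this assumes the objects are readable without GCS credentials, since plain https:// requests bypass the GCS client’s authentication.)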

@MikeVL Your issue seems to be completely different. I’d recommend configuring a metrics collection to see what’s going wrong.