imgproxy: imgproxy stops downloading source images randomly
We are encountering a weird issue with our production imgproxy deployment.
It is deployed on Google Cloud Run (managed container platform) and has been running fine for quite some time. Thanks a lot for this, btw.
Our setup processes about 200k requests per day without any issues, but lately some containers have started to stop processing images and stay stuck until we kill them.
We see logs like this when it happens:
time="2022-10-19T09:48:50Z" level="error" message="Completed in 24.949903028s /insecure/resize:fill:1500:0/plain/gs:%2F%2Fxxx/product/535a49ee_default.jpeg" request_id="DJJBxB3d3ld7xUV-B1hbp" method="GET" status="500" client_ip="xxx" error="Can't download source image: The image request timed out"
time="2022-10-19T09:48:50Z" level="error" message="Completed in 10.000268855s /insecure/resize:fill-down:120:120/plain/gs:%2F%2Fxxx/prod/a2e02af8455e435b8dcca2d67/acf8e1b8183e4f5e9498d1905_default.jpeg" request_id="Nj3ey4iGeLI_yAq1vOQuk" method="GET" status="503" client_ip="xxx" error="Request was timed out after 10.000235681s"
Nothing changes on the configuration side when this issue happens, and the other imgproxy containers seem to be working fine.
The only way to solve the issue is to deploy a new Cloud Run revision, which stops the existing containers (including the stuck one) and launches new ones.
We do not have health checks set up yet, as they are a new Cloud Run feature and not supported by Terraform. We plan to add them manually at some point to see if they detect the stuck containers, but since those containers still respond (with timeouts), I don't think it will work.
We tried various Cloud Run settings (reducing concurrency…) without any effect; the issue keeps reproducing about once a day.
Do you have any ideas on what could be causing this, and how we could troubleshoot this further?
We are currently running imgproxy 3.7.2 and plan to upgrade tomorrow to see how it goes. Note that this issue did not appear right after our previous upgrade, but some time later.
About this issue
- State: closed
- Created 2 years ago
- Comments: 42 (31 by maintainers)
I guess I finally found what may cause this issue. HTTP/2 connections may become stuck, and Go's HTTP client doesn't handle this properly by default and tries to reuse stuck connections. Since GCS uses HTTP/2, this bug affects it too. More info: https://github.com/golang/go/issues/30702, https://github.com/googleapis/google-cloud-go/issues/3522
I'll try to reproduce the bug with stuck connections and add a workaround. The temporary solution is to disable HTTP/2 globally:
GODEBUG=http2client=0
Don't make promises you're not ready to keep.
Anyway, the v3.16.0 release is out, and it includes the fix for this issue. I configured the HTTP clients so they check HTTP/2 connections' liveness. This allows imgproxy to successfully recover after an HTTP/2 connection becomes stuck. Also, I made imgproxy repeat requests for source images after getting a "connection lost" error.
The v3.15.0 release includes my tweaks. Hope this will solve the problem.
I made some tweaks to the source image downloading process in the latest build. Also, if you use many different image sources, I'd recommend setting IMGPROXY_CLIENT_KEEP_ALIVE_TIMEOUT to 0.
Having a similar issue over here. Imgproxy stops working after several timeouts.
What's strange about this is that imgproxy stops serving all other requests as well and needs manual intervention (a restart), because the health check in this case still returns 200 OK even though something seems to be broken.
What can we do in that case? Increase Concurrency?
Version is the latest 3.14
@renchap @Klaitos imgproxy's GCS transport hasn't been changed in ages, so downgrading imgproxy won't help. imgproxy uses the GCS client provided by the official SDK, so there are not many things we can tune.
Switching to the 2nd generation environment may help as they promise better networking. Not sure if the networking is the cause but I'd try.
Switching to plain https://storage.googleapis.com/… URLs may also help, as requests to them are served by the HTTP client configured by imgproxy.
@MikeVL Your issue seems to be completely different. I'd recommend configuring metrics collection to see what's going wrong.