caddy: Random 502 errors with Caddy 0.9.1 in reverse proxy mode, no load

1. What version of Caddy are you running (caddy -version)?

Caddy 0.9.1 (+e8e5595 Mon Aug 29 16:15:56 UTC 2016)

2. What are you trying to do?

Use Caddy as a reverse proxy in front of a Go web server

3. What is your entire Caddyfile?

utiliz.co {
   redir https://www.utiliz.co
}

www.utiliz.co {
  tls <email>
  gzip
  proxy / localhost:<XXXX> {
    header_upstream Host {host}
    header_upstream X-Real-IP {remote}
    header_upstream X-Forwarded-For {remote}
    header_upstream X-Forwarded-Proto {scheme}
  }
}

admin.utiliz.co {
  tls <email>
  gzip
  proxy / localhost:<YYYY> {
    header_upstream Host {host}
    header_upstream X-Real-IP {remote}
    header_upstream X-Forwarded-For {remote}
    header_upstream X-Forwarded-Proto {scheme}
  }
}

4. How did you run Caddy (give the full command and describe the execution environment)?

It’s running under supervisord:

[program:caddy]
command=/path/caddy/caddy
directory=/path/caddy

5. What did you expect to see?

No errors

6. What did you see instead (give full error messages and/or log)?

Today I upgraded caddy from 0.8.3 to 0.9.1. Subsequently during the day I got complaints from users saying they received 502 errors. The errors happened on both the www.utiliz.co and admin.utiliz.co sites. There was never any significant load, maybe 1-2 concurrent users. Retrying immediately always worked fine. There is no problem at all with the back end server (to confirm it was not a load issue I slammed the site with 500 concurrent users via boom and did not get any 502s at all.) I never saw any 502 errors when using 0.8.3.

This is the pattern of 502s during the day today.

30/Aug/2016:15:25:11 +0000 [ERROR 502 /ic/checkout/ordersummary] unreachable backend
30/Aug/2016:15:25:11 +0000 [ERROR 502 /checkout/save] unreachable backend
30/Aug/2016:15:25:11 +0000 [ERROR 502 /favicon.ico] unreachable backend
30/Aug/2016:15:25:17 +0000 [ERROR 502 /features] unreachable backend
30/Aug/2016:16:10:58 +0000 [ERROR 502 /login/action] unreachable backend
30/Aug/2016:16:10:58 +0000 [ERROR 502 /login/action] unreachable backend
30/Aug/2016:16:10:59 +0000 [ERROR 502 /favicon.ico] unreachable backend
30/Aug/2016:16:11:03 +0000 [ERROR 502 /] unreachable backend
30/Aug/2016:22:37:07 +0000 [ERROR 502 /users] unreachable backend
30/Aug/2016:22:37:07 +0000 [ERROR 502 /] unreachable backend
30/Aug/2016:22:37:08 +0000 [ERROR 502 /favicon.ico] unreachable backend
31/Aug/2016:00:27:12 +0000 [ERROR 502 /checkout] unreachable backend
31/Aug/2016:00:27:18 +0000 [ERROR 502 /checkout] unreachable backend
31/Aug/2016:00:27:19 +0000 [ERROR 502 /checkout] unreachable backend

7. How can someone who is starting from scratch reproduce this behavior as minimally as possible?

I don’t know, it seems random.

For now I am downgrading back to Caddy 0.8.3, which is disappointing.

About this issue

  • State: closed
  • Created 8 years ago
  • Reactions: 2
  • Comments: 35 (6 by maintainers)

Most upvoted comments

I just merged #1135, which drastically changes the logic regarding some of these parameters and improves error reporting. I’m going to close this issue now in light of that, and would ask all who have experienced this issue to pull the latest Caddy and try again. Note that the changes in that PR may require a few minor changes to your Caddyfile for proper failover handling. (See the PR description for details.)

Also, unless a correlation can be exhibited between the 502 errors and the concurrent map accesses, I am treating them separately and will confine this issue to the 502 errors; we’ll handle the issue with concurrent map accesses in another issue.

The changes will also go out with Caddy 0.9.3, so if you’re not able to build from source, just wait for that release (coming any day now). Thank you!

I found a workaround: disabling HTTP/2 by running Caddy with -http2=false

This is happening for me as well, when I click a link on the page and then immediately click another link. Chrome cancels the first request and continues with the second one. Nginx (the backend behind Caddy) sends a 499 response, then Caddy marks the proxy upstream as down and I have to wait 10 seconds to get a proper response again. I tried setting max_fails to 0, but requests then took about 3 seconds to respond. I went with fail_timeout 2s for now.

I have Caddy 0.9.1 configured as proxy to an nginx server.

# nginx log
web_1        | 172.17.0.1 - - [06/Sep/2016:18:17:49 +0000] "GET /?page=6 HTTP/1.1" 499 0 "https://site.com/?page=7" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36"

It also never happens with cURL, whereas it happens 90% of the time on the same request with Firefox. Combined with the fact that it only happens over TLS, I’d suspect session resumption or HTTP/2.

0.9.2 did not fix this issue, but -http2=false still helps

@mholt at your suggestion I disabled http/2 in my prod environment, will monitor for several days and report back. Thanks for the idea.

I guess the next step will be for me to add debug printing to ReverseProxy.ServeHTTP

Okay I’m running the custom build that prints backendErr in my prod environment now, will let you know what I find out.