caddy: Memory/CPU leak in proxy directive when max_fails is 0 and the backend fails
1. What version of Caddy are you running (caddy -version)?
v0.9.1
2. What are you trying to do?
Serving traffic via the proxy directive with max_fails 0, since only one backend is available.
3. What is your entire Caddyfile?
:2015 {
    proxy / 127.0.0.1:8080 {
        transparent
        max_fails 0
    }
}
4. How did you run Caddy (give the full command and describe the execution environment)?
./caddy --agree --conf=caddyfile_test
5. What did you expect to see?
No memory leak and no 100% CPU usage on 4 cores.
6. What did you see instead (give full error messages and/or log)?
4 cores at 100% and 4 GB of memory filled in under 1 minute after the first backend failure occurred.
Also stdout seems to go crazy with:
goroutine 2826341 [sleep, 2 minutes]:
time.Sleep(0x2540be400)
/usr/local/go/src/runtime/time.go:59 +0xe1
github.com/mholt/caddy/caddyhttp/proxy.Proxy.ServeHTTP.func1(0xc4202c5420, 0x2540be400)
/tmp/custombuild_111_575206995/src/github.com/mholt/caddy/caddyhttp/proxy/proxy.go:157 +0x2b
created by github.com/mholt/caddy/caddyhttp/proxy.Proxy.ServeHTTP
/tmp/custombuild_111_575206995/src/github.com/mholt/caddy/caddyhttp/proxy/proxy.go:159 +0x44e
goroutine 2831798 [sleep, 2 minutes]:
time.Sleep(0x2540be400)
/usr/local/go/src/runtime/time.go:59 +0xe1
github.com/mholt/caddy/caddyhttp/proxy.Proxy.ServeHTTP.func1(0xc4202c5420, 0x2540be400)
/tmp/custombuild_111_575206995/src/github.com/mholt/caddy/caddyhttp/proxy/proxy.go:157 +0x2b
created by github.com/mholt/caddy/caddyhttp/proxy.Proxy.ServeHTTP
/tmp/custombuild_111_575206995/src/github.com/mholt/caddy/caddyhttp/proxy/proxy.go:159 +0x44e
7. How can someone who is starting from scratch reproduce this behavior as minimally as possible?
- use proxy with max_fails 0
- have a high traffic server
- force a backend unreachable error
- watch CPU and memory usage climb
As always, edge cases \o/
The workaround for now was to set max_fails to 50, which isn't reached under usual circumstances.
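For reference, that interim workaround is a one-line change to the Caddyfile from section 3 (the value 50 is just the bound that worked here, not a recommended default):

:2015 {
    proxy / 127.0.0.1:8080 {
        transparent
        max_fails 50
    }
}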
About this issue
- Original URL
- State: closed
- Created 8 years ago
- Comments: 16 (6 by maintainers)
I feel like people’s backend services are more flaky than one thinks… just because you can connect once with telnet doesn’t mean that amid the thousands of requests before that, one request didn’t fail.
I do agree we need better error reporting in this case.
@nemothekid After re-reading your “thinking out loud” post above, what if we make a few changes:
^ These are just the way I’m leaning so far. What do you (anyone!) think?
Something conceptually like:
Yeah, the stdout errors are caused by it launching as many goroutines as possible in that infinite loop (which explains why it's eating memory).
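For context, the sleep argument in the dump (0x2540be400 nanoseconds) is 10 seconds, the default fail_timeout. Below is a minimal, hypothetical sketch of the pattern the trace and the comment above suggest; the names (upstreamHost, dialBackend, serve) are made up for illustration and this is not Caddy's actual proxy.go. The point: if max_fails 0 means "never mark the host down", a dead backend makes the retry loop spin forever, and every failed attempt spawns a goroutine that sleeps for fail_timeout before forgetting the failure, so goroutines accumulate much faster than they expire.

// Illustrative sketch only, not Caddy's real proxy code.
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
	"time"
)

type upstreamHost struct {
	fails       int32         // failures recorded within the last failTimeout
	maxFails    int32         // 0 reproduces the reported configuration
	failTimeout time.Duration // how long a failure counts against the host
}

// available models the max_fails health check: 0 means "never mark down".
func (h *upstreamHost) available() bool {
	return h.maxFails == 0 || atomic.LoadInt32(&h.fails) < h.maxFails
}

// dialBackend stands in for the proxied request; the backend is unreachable.
func dialBackend() error {
	return errors.New("dial tcp 127.0.0.1:8080: connection refused")
}

// serve sketches the problematic request path: retry while a host is
// available, and on each failure launch a goroutine that forgets the failure
// after failTimeout. With maxFails == 0 the loop never exits, so CPU is
// burned by the busy retry loop and memory by the accumulating goroutines.
func serve(h *upstreamHost) error {
	for {
		if !h.available() {
			return errors.New("no upstream host available") // becomes a 502
		}
		if err := dialBackend(); err != nil {
			atomic.AddInt32(&h.fails, 1)
			go func(d time.Duration) {
				time.Sleep(d) // the "[sleep, 2 minutes]" goroutines in the dump
				atomic.AddInt32(&h.fails, -1)
			}(h.failTimeout)
			continue // retry immediately: no backoff
		}
		return nil
	}
}

func main() {
	// With a positive maxFails (the max_fails 50 workaround) the loop gives
	// up once the budget is exhausted; with maxFails: 0 it would spin forever.
	h := &upstreamHost{maxFails: 3, failTimeout: 10 * time.Second}
	fmt.Println(serve(h))
}

Run against an unreachable backend with maxFails set to 0, the loop never returns and the goroutine count grows without bound; with a positive maxFails it bails out with an error once the failure budget is used up.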
max_fails 0 likely shouldn't be a valid directive; max_fails 1 should be the minimum. Or we can change the meaning of max_fails 0: if the backend fails and max_fails is 0, we immediately return http.StatusBadGateway, but we don't mark the backend down (so the next request can still try that backend).
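A rough sketch of that second option, reusing the hypothetical upstreamHost, available, and dialBackend from the earlier snippet (and additionally importing net/http for the status constants); this only illustrates the proposed semantics, not the eventual patch:

// Sketch of the "change the meaning of max_fails 0" proposal.
func serveFailFast(h *upstreamHost) (int, error) {
	for {
		if !h.available() {
			return http.StatusBadGateway, errors.New("no upstream host available")
		}
		if err := dialBackend(); err != nil {
			if h.maxFails == 0 {
				// Fail fast: return 502 for this request, but do not record
				// the failure, so the host is never marked down and the next
				// request may still reach it once it recovers.
				return http.StatusBadGateway, err
			}
			atomic.AddInt32(&h.fails, 1)
			go func(d time.Duration) {
				time.Sleep(d)
				atomic.AddInt32(&h.fails, -1)
			}(h.failTimeout)
			continue
		}
		return http.StatusOK, nil
	}
}

The design point is that a single-backend setup degrades to per-request 502s instead of an unbounded retry loop, while every new request still probes the backend.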