traefik: Failing health check but still available
Do you want to request a feature or report a bug?
Bug
Did you try using a 1.7.x configuration for the version 2.0?
- Yes
- No
What did you do?
I have a small nginx that I occasionally take “offline” by deliberately failing the health check, which works with traefik 1.7.12 but no longer in v2.0.
I have configured the health check as below and see the health check request in the nginx log returning a 404, but even then, the server is still available. Am I missing something?
What did you expect to see?
Failing health check takes service offline (works in 1.7.12)
What did you see instead?
Service still available, serverStatus in /api/rawdata still UP
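For anyone who wants to check this without the dashboard, here is a minimal sketch in Python that pulls the per-server status out of the /api/rawdata response. It assumes the payload has a top-level "services" object whose entries carry a "serverStatus" map from server URL to "UP" or "DOWN"; that layout is an assumption based on what the report describes, and the function names are hypothetical helpers, not part of Traefik.

```python
import json
from urllib.request import urlopen


def server_statuses(rawdata: dict) -> dict:
    """Map each service name to its {server URL: status} dictionary.

    Assumes rawdata["services"][name]["serverStatus"] exists for
    services with a health check; services without it map to {}.
    """
    services = rawdata.get("services") or {}
    return {
        name: dict(svc.get("serverStatus") or {})
        for name, svc in services.items()
    }


def down_servers(rawdata: dict) -> list:
    """Return (service, server URL) pairs whose status is not "UP"."""
    return [
        (name, url)
        for name, statuses in server_statuses(rawdata).items()
        for url, status in statuses.items()
        if status != "UP"
    ]


def fetch_rawdata(base_url: str) -> dict:
    """Fetch /api/rawdata from a Traefik instance that exposes its API."""
    with urlopen(base_url.rstrip("/") + "/api/rawdata", timeout=5) as resp:
        return json.load(resp)
```

Calling `down_servers(fetch_rawdata(...))` against the address where the Traefik API is exposed (the address depends on your setup) should list the nginx server once the health check is honored; in the situation reported here the list stays empty.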
Output of traefik version: (What version of Traefik are you using?)
Version: 2.0.0-beta1
Codename: faisselle
Go version: go1.12.7
Built: 2019-07-19T16:04:34Z
OS/Arch: linux/amd64
What is your environment & configuration (arguments, toml, provider, platform, …)?
Docker provider is used, labels:
labels:
- "traefik.docker.network=web"
- "traefik.enable=true"
- "traefik.http.routers.stuffhttps.rule=Host(`domain`) || Host(`domain2`)"
- "traefik.http.routers.stuffhttps.entrypoints=https"
- "traefik.http.routers.stuffhttps.tls.certresolver=default"
- "traefik.http.routers.stuffhttps.middlewares=sec@file"
- "traefik.http.routers.stuffhttp.middlewares=redirect"
- "traefik.http.routers.stuffhttp.rule=Host(`domain`) || Host(`domain2`)"
- "traefik.http.routers.stuffhttp.entrypoints=http"
- "traefik.http.middlewares.redirect.headers.sslredirect=true"
- "traefik.http.services.stuff.loadbalancer.healthcheck.path=/health/health"
- "traefik.http.services.stuff.loadbalancer.healthcheck.interval=15s"
- "traefik.http.services.stuff.loadbalancer.healthcheck.timeout=1s"
nginx log example:
172.29.0.3 - - [20/Aug/2019:06:08:14 +0000] "GET /health/health HTTP/1.1" 404 153 "-" "Go-http-client/1.1" "-"
2019/08/20 06:08:29 [error] 6#6: *89 open() "/usr/share/nginx/html/health/health" failed (2: No such file or directory), client: 172.29.0.3, server: localhost, request: "GET /health/health HTTP/1.1", host: "172.29.0.6:80"
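To make the expectation explicit, here is a rough Python sketch of how the access log line above should be read by a health checker. It assumes the rule that only 2xx and 3xx responses count as healthy; the regular expression and the `check_result` helper are illustrative and are not Traefik code.

```python
import re

# Matches the request and status fields of an nginx access log line
# such as the one shown above (format assumed, not verified against
# every nginx log_format).
ACCESS_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) '
)


def is_healthy(status_code: int) -> bool:
    """Treat 2xx and 3xx responses as healthy, everything else as failing."""
    return 200 <= status_code < 400


def check_result(log_line: str, health_path: str):
    """Return True/False for a health-check request, or None for other lines."""
    match = ACCESS_LINE.search(log_line)
    if match is None or match.group("path") != health_path:
        return None
    return is_healthy(int(match.group("status")))


line = ('172.29.0.3 - - [20/Aug/2019:06:08:14 +0000] '
        '"GET /health/health HTTP/1.1" 404 153 "-" "Go-http-client/1.1" "-"')
print(check_result(line, "/health/health"))  # False: the 404 is a failed check
```

Under that rule the 404 in the log is a failed check, so the server should have been taken out of rotation.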
About this issue
- State: closed
- Created 5 years ago
- Reactions: 4
- Comments: 23 (9 by maintainers)
Any update on this issue? I’m using traefik 2.0 rc2 and I’m in the same situation.
Just upgraded my LB cluster and confirmed this is resolved in release 2.0.1.
Reproduced on a simpler setup (see below). The case arises when there are 2 routers with a common service on a given container. If there is only 1 router and 1 service, then it works as expected.
View from the dashboard:
If the router webapphttp is commented out, then you get an HTTP/2 503 - Service Unavailable, as expected.
No worries - it was an odd proof of concept, but the clearest I could come up with.
Thanks for the awesome project. I use it daily in my job, and my personal life.
I understood your goal after posting.
Thanks again.
Thanks Idez - I have followed those practices in my production file, I was paring it down for my example.
In the case of the service names, I was trying to simulate a failure of one of the two backend nodes for a single service - intentionally adding both containers to the same service.
In my production config, it looks like:
In this case, when one of the two backends (say, backend2.example.com) goes offline, the behavior mentioned above occurs, where half the requests fail.
Hope that helps explain the purpose of my POC.
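The "half the requests fail" figure is what plain round-robin over two backends gives when the dead one is never removed from the rotation. A small Python simulation of that reasoning follows; it is a toy model, not Traefik's load balancer, and backend1.example.com is a made-up name for the healthy node.

```python
from itertools import cycle


def failure_rate(backends, alive, honor_health, requests=100):
    """Fraction of requests that land on a dead backend under round-robin."""
    # When health checks are honored, dead backends leave the pool.
    pool = [b for b in backends if alive[b]] if honor_health else list(backends)
    if not pool:
        return 1.0  # nothing to route to: every request fails
    rotation = cycle(pool)
    failed = sum(1 for _ in range(requests) if not alive[next(rotation)])
    return failed / requests


backends = ["backend1.example.com", "backend2.example.com"]
alive = {"backend1.example.com": True, "backend2.example.com": False}

print(failure_rate(backends, alive, honor_health=False))  # 0.5
print(failure_rate(backends, alive, honor_health=True))   # 0.0
```

With the failed health check ignored, every second request goes to the offline backend; once the check is honored, the remaining backend takes all the traffic.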
I am also in the same situation - the logs are noting:
But the dashboard still shows as healthy (status, green checkmark in services), and requests are still sometimes getting sent to it, resulting in 502 to end users.
I have also confirmed in the access logs that the 502 client requests are indeed being routed to the backend.