traefik: Failing health check but still available

Do you want to request a feature or report a bug?

Bug

Did you try using a 1.7.x configuration for the version 2.0?

  • Yes
  • No

What did you do?

I have a small nginx that I occasionally take “offline” by deliberately failing the health check. This works with traefik 1.7.12 but no longer in v2.0.

I have configured the health check as below and see the health check in the nginx log giving a 404, but even then, the server is still available. Am I missing something?

What did you expect to see?

Failing health check takes service offline (works in 1.7.12)

What did you see instead?

Service still available, serverStatus in /api/rawdata still UP
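A minimal sketch of how the serverStatus values could be pulled out of /api/rawdata programmatically rather than read by eye. It assumes the endpoint returns a JSON object with a top-level "services" map whose entries may carry a "serverStatus" map of server URL to status. The helper names `server_statuses` and `fetch_rawdata` and the `http://localhost:8080` base URL are hypothetical, and the layout may differ between Traefik versions.

```python
import json
from urllib.request import urlopen


def server_statuses(rawdata):
    """Return {service_name: {server_url: status}} from a parsed rawdata document.

    Assumes (not verified against every Traefik version) that each entry
    under "services" may contain a "serverStatus" mapping.
    """
    result = {}
    for name, service in (rawdata.get("services") or {}).items():
        status = service.get("serverStatus")
        if status:
            result[name] = dict(status)
    return result


def fetch_rawdata(base_url="http://localhost:8080"):
    """Fetch and parse /api/rawdata; assumes the API is reachable without auth."""
    with urlopen(base_url + "/api/rawdata", timeout=5) as resp:
        return json.load(resp)
```

Calling `server_statuses(fetch_rawdata())` against a running instance would list every server together with the status Traefik currently reports for it, which can then be compared with what the backend logs say about the health check.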

Output of traefik version: (What version of Traefik are you using?)

Version:      2.0.0-beta1
Codename:     faisselle
Go version:   go1.12.7
Built:        2019-07-19T16:04:34Z
OS/Arch:      linux/amd64

What is your environment & configuration (arguments, toml, provider, platform, …)?

Docker provider is used, labels:

    labels:
      - "traefik.docker.network=web"
      - "traefik.enable=true"
      - "traefik.http.routers.stuffhttps.rule=Host(`domain`) || Host(`domain2`)"
      - "traefik.http.routers.stuffhttps.entrypoints=https"
      - "traefik.http.routers.stuffhttps.tls.certresolver=default"
      - "traefik.http.routers.stuffhttps.middlewares=sec@file"

      - "traefik.http.routers.stuffhttp.middlewares=redirect"
      - "traefik.http.routers.stuffhttp.rule=Host(`domain`) || Host(`domain2`)"
      - "traefik.http.routers.stuffhttp.entrypoints=http"
           
      - "traefik.http.middlewares.redirect.headers.sslredirect=true"
      
      - "traefik.http.services.stuff.loadbalancer.healthcheck.path=/health/health"
      - "traefik.http.services.stuff.loadbalancer.healthcheck.interval=15s"
      - "traefik.http.services.stuff.loadbalancer.healthcheck.timeout=1s"

nginx log example:

172.29.0.3 - - [20/Aug/2019:06:08:14 +0000] "GET /health/health HTTP/1.1" 404 153 "-" "Go-http-client/1.1" "-"
2019/08/20 06:08:29 [error] 6#6: *89 open() "/usr/share/nginx/html/health/health" failed (2: No such file or directory), client: 172.29.0.3, server: localhost, request: "GET /health/health HTTP/1.1", host: "172.29.0.6:80"

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 4
  • Comments: 23 (9 by maintainers)

Most upvoted comments

Any update on this issue? I’m using traefik 2.0 rc2 and I’m in the same situation.

Just upgraded my LB cluster and confirmed this is resolved in release 2.0.1.

Reproduced on a simpler setup (see below). The case arises when there are 2 routers with a common service on a given container. If there is only 1 router and 1 service, then it works as expected.

version: '3'

services:
  proxy:
    image: traefik:v2.0.1
    command:
      - --entrypoints.web.address=:80
      - --entrypoints.websecure.address=:443
      - --providers.docker
      - --api.insecure
    ports:
      - "80:80"
      - "443:443"
      - "8080:8080"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    labels:
      - "traefik.enable=false"

  app:
    image: nginx:alpine
    labels:
      - "traefik.http.routers.webapphttpsec.rule=Host(`localhost`)"
      - "traefik.http.routers.webapphttpsec.entrypoints=websecure"
      - "traefik.http.routers.webapphttpsec.tls=true"
      - "traefik.http.services.webapphttpsec-svc.loadbalancer.healthcheck.path=/health.on"

      - "traefik.http.routers.webapphttp.rule=Host(`localhost`)"
      - "traefik.http.routers.webapphttp.entrypoints=web"

$ docker-compose up -d
$ docker-compose logs app # Check the backend's logs for a 404 on the health check
...

$ curl --insecure --location --verbose --silent https://localhost/index.html # Expected: "HTTP/2 503", but it is "HTTP/2 200".
...
< HTTP/2 200 
...

View from the dashboard:

[Screenshot: dashboard view, 2019-10-07 14:09:45]

If the router webapphttp is commented out, then you get an HTTP/2 503 - Service Unavailable as expected:

$ # Comment out the 2 labels related to the `traefik.http.routers.webapphttp.*` router
$ docker-compose up -d
$ curl --insecure --location --verbose --silent https://localhost/index.html # Expected: "HTTP/2 503", and it is "HTTP/2 503".
...
< HTTP/2 503 
...
Service Unavailable
$

No worries - it was an odd proof of concept, but the clearest I could come up with.

Thanks for the awesome project. I use it daily in my job, and my personal life.

I understood your goal after posting.

Thanks again.

Thanks Idez - I have followed those practices in my production file; I was paring it down for my example.

In the case of the service names, I was trying to simulate a failure of one of the two backend nodes for a single service - intentionally adding both containers to the same service.

In my production config, it looks like:

[...]
  [http.services]
[...]
    # Backend servers
    [http.services.svc1]
      [http.services.svc1.loadBalancer]
        [[http.services.svc1.loadBalancer.servers]]
          url = "http://backend1.example.com:8000"
        [[http.services.svc1.loadBalancer.servers]]
          url = "http://backend2.example.com:8000"
[...]

In this case, when one of the two backends (say, backend2.example.com) goes offline, the behavior mentioned above occurs, where half the requests fail.
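To illustrate why roughly half the requests fail, here is a simplified model. It assumes plain round-robin over the two servers and is not Traefik's actual balancer; the function name and its parameters are made up for this example.

```python
from itertools import cycle


def failed_requests(servers, down, skip_unhealthy, requests=10):
    """Count failures when `requests` are spread round-robin over `servers`.

    `down` is the set of servers that refuse connections. With
    skip_unhealthy=True the balancer drops them from rotation, which is
    what a working health check is expected to achieve.
    """
    pool = [s for s in servers if not (skip_unhealthy and s in down)]
    if not pool:
        return requests  # nothing left to serve traffic
    rotation = cycle(pool)
    return sum(1 for _ in range(requests) if next(rotation) in down)


backends = ["http://backend1.example.com:8000", "http://backend2.example.com:8000"]
offline = {"http://backend2.example.com:8000"}

print(failed_requests(backends, offline, skip_unhealthy=False))  # 5
print(failed_requests(backends, offline, skip_unhealthy=True))   # 0
```

With the failed server left in rotation, every second request lands on it. Once it is skipped, no request does.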

Hope that helps explain the purpose of my POC.

I am also in the same situation - the logs show:

time="2019-09-13T14:19:08Z" level=warning msg="Health check still failing. Backend: \"backend-service@file\" URL: \"http://backend.server.example.com:8080\" Reason: HTTP request failed: Get http://backend.server.example.com:8080/some/file/script.js: dial tcp 10.20.10.4:8080: connect: connection refused"

But the dashboard still shows the backend as healthy (status, green checkmark in services), and requests are still sometimes getting sent to it, resulting in a 502 for end users.

I have also confirmed in the access logs that the 502 client requests are indeed being routed to the backend.
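One way to put a number on "sometimes" is to send a batch of requests through the proxy and count the status codes that come back. This is a minimal sketch using only the Python standard library. The names `fetch_status` and `tally` are hypothetical, and the URL you pass in is whatever route points at the affected service.

```python
from collections import Counter
from urllib.error import HTTPError
from urllib.request import urlopen


def fetch_status(url, timeout=5):
    """Return the HTTP status code for one GET, or "error" if no response arrived."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:
        # urllib raises 4xx/5xx responses as exceptions; the code is still useful.
        return err.code
    except OSError:
        # Covers URLError, refused connections, timeouts and TLS failures.
        return "error"


def tally(url, attempts=20, fetch=fetch_status):
    """Send `attempts` requests and count how often each status code came back."""
    return Counter(fetch(url) for _ in range(attempts))
```

Calling `tally(url, attempts=50)` returns a `Counter` keyed by status code. A mix of 200 and 502 entries would match the behavior described above. A route served with a self-signed certificate would show up as "error" unless certificate verification is handled separately.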