traefik: Traefik web entrypoint dies randomly

Do you want to request a feature or report a bug?

Bug

What did you do?

I used Traefik 1.x for almost a year without any downtime, but after switching to 2.x I started to see outages: about 4 in the last month alone. I could not find any pattern in when this happens, and the error message is not descriptive enough to tell me where to look.

{"entryPointName":"web","level":"error","msg":"set tcp 172.16.0.17:8088-\u003e114.55.164.60:46112: setsockopt: connection reset by peer","time":"2021-04-14T03:39:21Z"}
{"entryPointName":"web","level":"error","msg":"Error while starting server: set tcp 172.16.0.17:8088-\u003e114.55.164.60:46112: setsockopt: connection reset by peer","time":"2021-04-14T03:39:21Z"}
{"entryPointName":"web","level":"error","msg":"Error while starting server: set tcp 172.16.0.17:8088-\u003e114.55.164.60:46112: setsockopt: connection reset by peer","time":"2021-04-14T03:39:21Z"}

The dashboard and the websecure entrypoint keep working fine at that time, and I can clearly see that nothing is listening on port 8088, where the web entrypoint should be waiting for requests. There is nothing suspicious in the access log, and no connection from this IP either. The external IP is different every time this happens, so I doubt that any kind of attack is involved.

I’ve tested address: 0.0.0.0:8088 with different variations (without an IP, on a fixed IP, etc.) and always hit the same issue.

What did you expect to see?

HTTP requests go through without disruption.

What did you see instead?

The HTTP (web) entrypoint crashes at random.

Output of traefik version: (What version of Traefik are you using?)

I’m using traefik-2.4.8 (traefik-2.4.7 has the same issue). OS: FreeBSD 12.1

Version:      2.4.8
Codename:     portbuild
Go version:   go1.16.2
Built:        2021-04-01_11:19:34AM
OS/Arch:      freebsd/amd64

What is your environment & configuration (arguments, toml, provider, platform, …)?

pilot:
  token: "xxxxx"
global:
  checkNewVersion: false
  sendAnonymousUsage: false
entryPoints:
  web:
    address: 0.0.0.0:8088
  websecure:
    address: 0.0.0.0:8443
log:
  level: WARN
  filePath: /var/log/traefik.log
  format: json
accessLog:
  filePath: /var/log/traefik.access.log
  format: json
api:
  insecure: true
ping:
  entryPoint: "web"
certificatesResolvers:
  myresolver:
    acme:
      email: foo@bar.com
      storage: /usr/local/etc/acme.json
      httpChallenge:
        # used during the challenge
        entryPoint: web

providers:
  consulCatalog:
    prefix: traefik2
    exposedByDefault: false
    refreshInterval: 10s
    cache: false
    endpoint:
      address: 172.16.0.15:8500
  consul:
    endpoints:
      - "172.16.0.15:8500"

I can enable DEBUG, but it produces a lot of noise and the problem happens randomly, so I’m not sure it would help here.

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 21 (4 by maintainers)

Most upvoted comments

I believe this is still happening. I have a workaround that checks whether the ports are up and restarts the service if they are not. But it feels kinda lame, so anything we can add to get more debug info about what is going on would be more than welcome.

@robske110 Well, this error most probably comes from the Go standard library. My best guess for the Traefik code involved in producing the error: https://github.com/traefik/traefik/blob/6ae194934d50db135d027f488da947f007a4a7e0/pkg/server/server_entrypoint_tcp.go#L344-L354

Thus, without a reproduction case, it’s not easy to go further to fix this issue.

You can comment out those lines and make your own build to confirm that the problem comes from that part of the code. But even if you succeed in getting rid of that error, we would not accept or make a PR unless we could reproduce the issue.

Also, we are not willing to make Traefik stop whenever an entryPoint dies: it would be a breaking change in behavior, and it is not obviously what users expect, as you may want traffic on the other entryPoints to continue to be handled.

What you can do, besides detecting that Traefik died, is to health check Traefik on all its entryPoints, then restart it when you diagnose an error.

Unfortunately, as already said, without a reproduction case, it will be difficult to address this issue.