traefik: Traefik web entrypoint dies randomly
Do you want to request a feature or report a bug?
Bug
What did you do?
I used Traefik 1.x for almost a year without any downtime. After switching to 2.x I started seeing outages; in the last month alone it happened about 4 times. I could not find any pattern behind the failures, and the error message is not descriptive enough to tell me where to look.
{"entryPointName":"web","level":"error","msg":"set tcp 172.16.0.17:8088-\u003e114.55.164.60:46112: setsockopt: connection reset by peer","time":"2021-04-14T03:39:21Z"}
{"entryPointName":"web","level":"error","msg":"Error while starting server: set tcp 172.16.0.17:8088-\u003e114.55.164.60:46112: setsockopt: connection reset by peer","time":"2021-04-14T03:39:21Z"}
{"entryPointName":"web","level":"error","msg":"Error while starting server: set tcp 172.16.0.17:8088-\u003e114.55.164.60:46112: setsockopt: connection reset by peer","time":"2021-04-14T03:39:21Z"}
The dashboard and the websecure entrypoint keep working at that time, and I can clearly see that nothing is listening on port 8088, where the web entrypoint should be accepting requests. There is nothing suspicious in the access log, and no connection from this IP either. The external IP is different every time this happens, so I doubt this is an attack.
I've tested the address: 0.0.0.0:8088 setting with different values (without an IP, with a fixed IP, etc.); the issue is always the same.
What did you expect to see?
HTTP requests go through without disruption.
What did you see instead?
The HTTP (web) entrypoint randomly crashes.
Output of traefik version: (What version of Traefik are you using?)
I'm using traefik-2.4.8 (traefik-2.4.7 has the same issue) on FreeBSD 12.1.
Version: 2.4.8
Codename: portbuild
Go version: go1.16.2
Built: 2021-04-01_11:19:34AM
OS/Arch: freebsd/amd64
What is your environment & configuration (arguments, toml, provider, platform, …)?
pilot:
  token: "xxxxx"
global:
  checkNewVersion: false
  sendAnonymousUsage: false
entryPoints:
  web:
    address: 0.0.0.0:8088
  websecure:
    address: 0.0.0.0:8443
log:
  level: WARN
  filePath: /var/log/traefik.log
  format: json
accessLog:
  filePath: /var/log/traefik.access.log
  format: json
api:
  insecure: true
ping:
  entryPoint: "web"
certificatesResolvers:
  myresolver:
    acme:
      email: foo@bar.com
      storage: /usr/local/etc/acme.json
      httpChallenge:
        # used during the challenge
        entryPoint: web
providers:
  consulCatalog:
    prefix: traefik2
    exposedByDefault: false
    refreshInterval: 10s
    cache: false
    endpoint:
      address: 172.16.0.15:8500
  consul:
    endpoints:
      - "172.16.0.15:8500"
I can enable DEBUG logging, but it produces a lot of noise, and since this happens randomly I'm not sure it would help here.
About this issue
- State: closed
- Created 3 years ago
- Comments: 21 (4 by maintainers)
I believe this is still happening. As a workaround I check whether the ports are up and restart the service if they are not. But that feels kinda lame, so anything we can add to get more debug info about what is going on would be more than welcome.
@robske110 Well, this error most probably comes from the Go standard library. My best guess at the Traefik code involved in producing the error: https://github.com/traefik/traefik/blob/6ae194934d50db135d027f488da947f007a4a7e0/pkg/server/server_entrypoint_tcp.go#L344-L354
Thus, without a reproduction case, it’s not easy to go further to fix this issue.
You can comment out those lines and make your own build to confirm that the problem comes from that part of the code. But even if you manage to get rid of the error that way, we would not make or accept a PR unless we could reproduce the issue.
Also, we are not willing to make Traefik stop whenever an entryPoint dies: it would be a breaking change, and it is not obviously the expected behavior, since you may want traffic on the other entryPoints to continue to be handled.
What you can do, besides detecting that Traefik died, is health-check Traefik on all of its entryPoints and restart it when a check fails.
Unfortunately, as already said, without a reproduction case, it will be difficult to address this issue.