traefik: 500 errors with no error in logs
Do you want to request a feature or report a bug?
Bug
What did you do?
We have been using Traefik in production for some months and have been quite happy with it so far. However, probably due to increasing traffic, we are seeing more and more randomly distributed 500 errors. Since our services log nothing for these requests, we have come to believe that the errors originate in Traefik. Scaling up our services or adding instances (possibly more Traefik instances) might be a solution for us, but before we do that we need to be sure where the problem comes from.
We believe this issue might be related to #3054.
The following (stress) test shows a way to reproduce the same behaviour.
We set up a single-instance Traefik with a chain of frontends pointing to one another, and launch 10 concurrent requests, each POSTing a ~1 MB file. In our tests, this number of concurrent requests makes the 500 errors considerably more likely.
$ docker-compose up -d
$ for i in $(seq 1 10) ; do (curl -X POST -d@mario.png -s -o /dev/null -w "%{http_code} " localhost:80/20/19/18/17/16/15/14/13/12/11/10/9/8/7/6/5/4/3/2/1 &) ; done
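When the loop is repeated many times, the raw list of codes gets hard to read. A small helper along these lines could summarise it. This is only a sketch: `tally_codes` is a hypothetical name and is not part of the original reproduction.

```shell
# Hypothetical helper (not part of the original report): count how often each
# HTTP status code appears in a whitespace-separated list such as curl's output.
tally_codes() {
  # Put one code per line, drop empty lines, then count duplicates.
  # Prints "<code>=<count>" pairs on a single line.
  printf '%s\n' "$1" | tr -s ' ' '\n' | grep -v '^$' | sort | uniq -c |
    awk '{ printf "%s%s=%s", sep, $2, $1; sep = " " } END { print "" }'
}

# Usage with the output observed in this report:
tally_codes "500 500 200 200 500 500 500 500 500 500"   # prints: 200=2 500=8
```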
What did you expect to see?
200 200 200 200 200 200 200 200 200 200
Or at least an error message in the logs.
What did you see instead?
500 500 200 200 500 500 500 500 500 500
With no error message in the logs.
Output of traefik version: (What version of Traefik are you using?)
Version: v1.6.0-rc4
Codename: tetedemoine
Go version: go1.10.1
Built: 2018-04-04_01:40:48PM
OS/Arch: linux/amd64
What is your environment & configuration (arguments, toml, provider, platform, …)?
traefik.toml
defaultEntryPoints = ["http"]
[entryPoints]
[entryPoints.http]
address = ":80"
traefik-services.toml
defaultEntryPoints = ["http"]
[entryPoints]
[entryPoints.http]
address = ":80"
[backends.backend1]
[backends.backend1.servers.server1]
url = "http://server"
[frontends.frontend1]
entryPoints = ["http"]
passHostHeader = true
backend = "backend1"
[frontends.frontend1.routes.route1]
rule = "PathPrefixStrip:/1"
{{$servers := dict}}
{{$_ := set $servers "2" "1"}}
{{$_ := set $servers "3" "2"}}
{{$_ := set $servers "4" "3"}}
{{$_ := set $servers "5" "4"}}
{{$_ := set $servers "6" "5"}}
{{$_ := set $servers "7" "6"}}
{{$_ := set $servers "8" "7"}}
{{$_ := set $servers "9" "8"}}
{{$_ := set $servers "10" "9"}}
{{$_ := set $servers "11" "10"}}
{{$_ := set $servers "12" "11"}}
{{$_ := set $servers "13" "12"}}
{{$_ := set $servers "14" "13"}}
{{$_ := set $servers "15" "14"}}
{{$_ := set $servers "16" "15"}}
{{$_ := set $servers "17" "16"}}
{{$_ := set $servers "18" "17"}}
{{$_ := set $servers "19" "18"}}
{{$_ := set $servers "20" "19"}}
{{range $i, $j := $servers}}
[backends.backend{{$i}}]
[backends.backend{{$i}}.servers.server1]
url = "http://localhost/{{$j}}"
[frontends.frontend{{$i}}]
entryPoints = ["http"]
passHostHeader = true
backend = "backend{{$i}}"
[frontends.frontend{{$i}}.routes.route1]
rule = "PathPrefixStrip:/{{$i}}"
{{end}}
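To make the effect of this chain explicit, the sketch below counts how many frontends a single request path would pass through. It assumes that each frontend strips exactly one leading path segment (via PathPrefixStrip) and that the remainder of the path re-enters Traefik on localhost until frontend1 hands the request to the server. `count_hops` is a hypothetical helper, not something Traefik provides.

```shell
# Sketch only: assumes each frontend strips exactly one leading path segment
# and forwards what is left back to Traefik on localhost.
count_hops() {
  path="${1#/}"   # drop the leading slash
  hops=0
  while [ -n "$path" ]; do
    hops=$((hops + 1))
    case "$path" in
      */*) path="${path#*/}" ;;  # strip the first segment, keep the rest
      *)   path="" ;;            # last segment: nothing left to strip
    esac
  done
  echo "$hops"
}

count_hops /20/19/18/17/16/15/14/13/12/11/10/9/8/7/6/5/4/3/2/1   # prints 20
```

If that reading is right, the 10 concurrent clients in the test translate into roughly 200 proxied passes through the same Traefik instance, each carrying the ~1 MB body.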
docker-compose.yml
version: '3'
services:
  traefik:
    image: traefik:1.6.0-rc4
    command:
      --logLevel=WARN
      --docker
      --docker.filename=/traefik-services.toml
      --accesslog
      --accesslog.filepath=/dev/stderr
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - ./traefik.toml:/traefik.toml:ro
      - ./traefik-services.toml:/traefik-services.toml:ro
    ports:
      - 127.0.0.1:80:80
  server:
    image: node:alpine
    command: [node, -e, "require('http').createServer((req, res) => res.end('hello from server')).listen(80); process.on('SIGTERM', () => process.exit(128 + 15));"]
If applicable, please paste the log output at DEBUG level (--logLevel=DEBUG switch)
See attached file logs.txt
About this issue
- State: closed
- Created 6 years ago
- Reactions: 2
- Comments: 17 (4 by maintainers)
We’ve upgraded our production Traefik deployments to 1.7.3. Hopefully no more 500s…
@crimoniv and @jsleeio, as I read the thread there appear to be two resolutions to the issue: one is not a Traefik issue but rather a backend fault, and the other is an upgraded Traefik version. We are experiencing the same issue with intermittent 504s: requests leave Traefik and hit the overlay, sometimes reaching the app (which responds with a 200) and sometimes not being received at all. In both cases the response never makes it back to Traefik, which eventually throws the 504. Can you share your backend's fault (and its resolution) or the result after the version upgrade?