traefik: Intermittent 404 on v2.2.2

Do you want to request a feature or report a bug?

Bug

What did you do?

update from v2.2.1 to v2.2.2

What did you expect to see?

middleware redirectregex working

What did you see instead?

page not found

Output of traefik version: (What version of Traefik are you using?)

Version:      2.2.2
Codename:     chevrotin
Go version:   go1.14.4
Built:        2020-07-08T15:30:29Z
OS/Arch:      linux/amd64

What is your environment & configuration (arguments, toml, provider, platform, …)?

Docker

  traefik:
    image: traefik:latest
    container_name: traefik
    restart: always
    command:
      # - "--log.level=DEBUG"
      - "--api=true"
      - "--api.dashboard=true"
      - "--accesslog=true"
      - "--metrics=true"
      - "--metrics.prometheus=true"
      - "--metrics.prometheus.manualrouting=true"
      - "--providers.docker=true"
      - "--providers.docker.exposedbydefault=false"
      - "--entrypoints.web.address=:80"
      - "--entrypoints.websecure.address=:443"
      - "--certificatesresolvers.leresolver.acme.tlschallenge=true"
      - "--certificatesresolvers.leresolver.acme.email=none@uel.br"
      - "--certificatesresolvers.leresolver.acme.storage=/letsencrypt/acme.json"
    ports:
      - "80:80"
      - "443:443"
      # - "8080:8080"
    volumes:
      - "./letsencrypt:/letsencrypt"
      - "/var/run/docker.sock:/var/run/docker.sock:ro"
    labels:
      # global redirect to https
      - "traefik.http.middlewares.redirect-to-https.redirectscheme.scheme=https"
      - "traefik.http.routers.http-catchall.rule=hostregexp(`{host:.+}`)"
      - "traefik.http.routers.http-catchall.entrypoints=web"
      - "traefik.http.routers.http-catchall.middlewares=redirect-to-https"
      # redict-from-old-url
      - "traefik.http.middlewares.redict-from-old-url.redirectregex.permanent=true"
      - "traefik.http.middlewares.redict-from-old-url.redirectregex.regex=^https://oldname.uel.br/(.*)"
      - "traefik.http.middlewares.redict-from-old-url.redirectregex.replacement=https://newname.uel.br/gitlab/$${1}"
      - "traefik.http.routers.catch-old-url.rule=Host(`oldname.uel.br`)"
      - "traefik.http.routers.catch-old-url.entrypoints=websecure"
      - "traefik.http.routers.catch-old-url.tls.certresolver=leresolver"
      - "traefik.http.routers.catch-old-url.middlewares=redict-from-old-url"
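
For readers unfamiliar with redirectregex, here is a minimal Go sketch of what the pattern and replacement in the labels above are expected to produce. It is only an illustration and not Traefik's code: the function name rewriteOldURL and the sample path are made up for this example, and the replacement is written as "${1}" because the "$${1}" in the Compose file is just Compose escaping for a literal "$".

```go
package main

import (
	"fmt"
	"regexp"
)

// rewriteOldURL is a hypothetical helper that applies the same regex and
// replacement as the redict-from-old-url labels. It returns the rewritten
// URL and whether the pattern matched at all.
func rewriteOldURL(url string) (string, bool) {
	re := regexp.MustCompile(`^https://oldname.uel.br/(.*)`)
	if !re.MatchString(url) {
		return url, false
	}
	// ${1} refers to the first capture group, i.e. the original path.
	return re.ReplaceAllString(url, "https://newname.uel.br/gitlab/${1}"), true
}

func main() {
	target, matched := rewriteOldURL("https://oldname.uel.br/group/project")
	fmt.Println(matched, target)
	// prints: true https://newname.uel.br/gitlab/group/project
}
```

With 2.2.2 the request never reaches this rewrite step, because the router itself answers 404 before the middleware runs.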

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 2
  • Comments: 39 (7 by maintainers)

Most upvoted comments

A fix is coming: #7037

+1. I observe much the same, but without the redirectregex middleware. After going from 2.2.1 to 2.2.2, all requests via the https/TLS entrypoint get “404 page not found”; the redirectscheme middleware seems to work. Going back to 2.2.1 without any config changes, everything works as expected.

Hello all, thanks a lot for all your feedback throughout the day. We were finally able to reproduce the bug and are working on a hotfix right now.

Our hotfix will change the default behavior (insecureSNI) from false to true, essentially implementing the same behavior which existed in 2.2.1, which should mitigate the issue you are facing.

Let me quickly explain how / when this issue arises:

When HTTP/2 is used with a wildcard TLS certificate, no further SNI is sent to the proxy after the first TLS handshake if the “new” origin is covered by a SAN in the certificate. With our changed default behavior, we now require the SNI and the Host header to have the same value. Since no new SNI is sent in that case, the match fails and the result is a 404. After the hotfix is released today, we will continue working next week on an effective solution to this problem.
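
The mismatch described above can be sketched as a toy model in Go. This is not Traefik's implementation: the conn type, the route function, and the example.test hostnames are all hypothetical. It only assumes what the explanation states, namely that the SNI is fixed for the lifetime of a TLS connection while the Host header can change per request on a reused HTTP/2 connection.

```go
package main

import (
	"fmt"
	"strings"
)

// conn models one TLS connection. The SNI is set once, at handshake time,
// and stays the same for every request sent over that connection.
type conn struct {
	sni string
}

// route is a hypothetical stand-in for the router decision: with the strict
// check, a request is served only if its Host header equals the SNI of the
// connection; with insecureSNI set, the SNI is not compared at all.
func route(c conn, host string, insecureSNI bool) int {
	if insecureSNI || strings.EqualFold(c.sni, host) {
		return 200
	}
	return 404
}

func main() {
	// The browser opened the connection for app1 and, because the wildcard
	// certificate also covers app2, reuses it for app2 without a new handshake.
	c := conn{sni: "app1.example.test"}
	for _, host := range []string{"app1.example.test", "app2.example.test"} {
		fmt.Println(host, route(c, host, false), route(c, host, true))
	}
	// prints:
	// app1.example.test 200 200
	// app2.example.test 404 200
}
```

The second line of output corresponds to the intermittent 404: whether it appears depends on which hostname happened to open the connection first.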

If you’re interested, more information about that particular issue is available in the RFC. The team apologizes for all the inconvenience this may have caused so far.

WOW! A migration guide from x.x.1 to x.x.2 with breaking changes? Never expected something like this and that’s why I didn’t search for it. “We introduced the ability … and enabled it by default” is quite surprising.

I can confirm: after a quick change to set insecureSNI: true, https access works with 2.2.2. Changing all router rules from Host() to HostHeader() takes more time, but I guess it will work too.

Thanks for your quick response!

In the next release: v2.2.5

But you will have to wait a bit; for now, we recommend using v2.2.1.

I’m seeing a weird behaviour with 2.2.4.

I have a Docker Swarm with 2 wildcard certificates:

  • *.subdomain1.domain.tld
  • *.subdomain2.domain.tld

And 4 apps:

  • app1.subdomain1.domain.tld
  • app2.subdomain1.domain.tld
  • app3.subdomain2.domain.tld
  • app4.subdomain2.domain.tld

With version 2.2.4:

  • Default conf: ❌ intermittent 404
  • With insecureSNI = false: ❌ intermittent 404
  • With insecureSNI = true explicitly set: ✔️ seems OK

I hope I did my tests right. It’s weird because I thought insecureSNI = true should be the default with 2.2.4?

Just a note - this bit me in the arse too - however, the earlier suggested fix of using the “HostHeader” rule instead of “Host” has no effect. The only change that currently mitigates this issue is “insecureSNI: true”.

We hit this issue when we ran rolling updates/reboots on a production cluster yesterday. It was previously running 2.2.1, with the version pinned as image: traefik:chevrotin (which we thought was a reasonably conservative pin). We’ve now obviously pinned it at traefik:2.2.1.

I don’t understand why you haven’t pulled the 2.2.2 - 2.2.4 Docker images, or at least moved ‘latest’, given that this is known to be broken. As it is, I lost my entire Sunday tracking down intermittent routing problems that prevented users from logging in. Fortunately it was a Sunday and we were able to roll back on our blue/green system. Had it been a weekday, we could easily have suffered more severe reputational damage.

Hi,

I also noticed something similar with my development setup. With v2.2.2, routing is almost always broken, returning 404 when using the Host rule. With the HostHeader rule, or with the Host rule and insecureSNI set to true, routing works all the time.

When using Host header, routing sometimes works for a few minutes until it doesn’t, and Traefik starts returning 404 for all requests. When Traefik is restarted, routing either doesn’t work from the beginning or works for a few minutes, though there have been a few times when routing worked for tens of minutes.

Chrome behaves differently from Firefox. With Chrome, all routing fails, but with Firefox only some requests fail. And sometimes this is reversed. Sometimes it’s just the CORS preflight (OPTIONS) requests that are returned with 404.

I’m using TLS and certs are made using mkcert so they are “valid”, at least browsers and other tools work as expected. Certificate host matches the request’s host.

When the request originated from a browser:

# Traefik logs

time="2020-07-10T04:37:15Z" level=debug msg="Authentication succeeded" middlewareName=auth@file middlewareType=BasicAuth
172.102.0.1 - user [10/Jul/2020:04:37:15 +0000] "GET /api/overview HTTP/1.1" 200 445 "-" "-" 173 "api-dashboard@file" "-" 72ms
172.102.0.1 - - [10/Jul/2020:04:37:15 +0000] "GET /api/authz HTTP/1.1" 404 19 "-" "-" 174 "-" "-" 0ms
time="2020-07-10T04:37:20Z" level=debug msg="Authentication succeeded" middlewareType=BasicAuth middlewareName=auth@file
172.102.0.1 - user [10/Jul/2020:04:37:20 +0000] "GET /api/overview HTTP/1.1" 200 445 "-" "-" 175 "api-dashboard@file" "-" 73ms

When using curl

# curl -vv https://api.localhost.local/api/authz

=> Traefik logs

time="2020-07-10T04:37:24Z" level=debug msg="URL.Path is now /api/authz (was /authz)." middlewareName=add-api@file middlewareType=AddPrefix
time="2020-07-10T04:37:24Z" level=debug msg="vulcand/oxy/roundrobin/rr: begin ServeHttp on request" Request="{\"Method\":\"GET\",\"URL\":{\"Scheme\":\"\",\"Opaque\":\"\",\"User\":null,\"Host\":\"\",\"Path\":\"/api/authz\",\"RawPath\":\"\",\"ForceQuery\":false,\"RawQuery\":\"\",\"Fragment\":\"\"},\"Proto\":\"HTTP/2.0\",\"ProtoMajor\":2,\"ProtoMinor\":0,\"Header\":{\"Accept\":[\"*/*\"],\"Accept-Encoding\":[\"gzip\"],\"User-Agent\":[\"curl/7.58.0\"],\"X-Forwarded-Host\":[\"api.localhost.local\"],\"X-Forwarded-Port\":[\"443\"],\"X-Forwarded-Prefix\":[\"/auth\"],\"X-Forwarded-Proto\":[\"https\"],\"X-Forwarded-Server\":[\"ba6a3f1a59f7\"],\"X-Real-Ip\":[\"127.0.0.1\"]},\"ContentLength\":0,\"TransferEncoding\":null,\"Host\":\"api.localhost.local\",\"Form\":null,\"PostForm\":null,\"MultipartForm\":null,\"Trailer\":null,\"RemoteAddr\":\"127.0.0.1:57326\",\"RequestURI\":\"/api/authz\",\"TLS\":null}"
time="2020-07-10T04:37:24Z" level=debug msg="vulcand/oxy/roundrobin/rr: Forwarding this request to URL" ForwardURL="http://172.102.0.7:80" Request="{\"Method\":\"GET\",\"URL\":{\"Scheme\":\"\",\"Opaque\":\"\",\"User\":null,\"Host\":\"\",\"Path\":\"/api/authz\",\"RawPath\":\"\",\"ForceQuery\":false,\"RawQuery\":\"\",\"Fragment\":\"\"},\"Proto\":\"HTTP/2.0\",\"ProtoMajor\":2,\"ProtoMinor\":0,\"Header\":{\"Accept\":[\"*/*\"],\"Accept-Encoding\":[\"gzip\"],\"User-Agent\":[\"curl/7.58.0\"],\"X-Forwarded-Host\":[\"api.localhost.local\"],\"X-Forwarded-Port\":[\"443\"],\"X-Forwarded-Prefix\":[\"/auth\"],\"X-Forwarded-Proto\":[\"https\"],\"X-Forwarded-Server\":[\"ba6a3f1a59f7\"],\"X-Real-Ip\":[\"127.0.0.1\"]},\"ContentLength\":0,\"TransferEncoding\":null,\"Host\":\"api.localhost.local\",\"Form\":null,\"PostForm\":null,\"MultipartForm\":null,\"Trailer\":null,\"RemoteAddr\":\"127.0.0.1:57326\",\"RequestURI\":\"/api/authz\",\"TLS\":null}"
time="2020-07-10T04:37:24Z" level=debug msg="vulcand/oxy/roundrobin/rr: completed ServeHttp on request" Request="{\"Method\":\"GET\",\"URL\":{\"Scheme\":\"\",\"Opaque\":\"\",\"User\":null,\"Host\":\"\",\"Path\":\"/api/authz\",\"RawPath\":\"\",\"ForceQuery\":false,\"RawQuery\":\"\",\"Fragment\":\"\"},\"Proto\":\"HTTP/2.0\",\"ProtoMajor\":2,\"ProtoMinor\":0,\"Header\":{\"Accept\":[\"*/*\"],\"Accept-Encoding\":[\"gzip\"],\"User-Agent\":[\"curl/7.58.0\"],\"X-Forwarded-Host\":[\"api.localhost.local\"],\"X-Forwarded-Port\":[\"443\"],\"X-Forwarded-Prefix\":[\"/auth\"],\"X-Forwarded-Proto\":[\"https\"],\"X-Forwarded-Server\":[\"ba6a3f1a59f7\"],\"X-Real-Ip\":[\"127.0.0.1\"]},\"ContentLength\":0,\"TransferEncoding\":null,\"Host\":\"api.localhost.local\",\"Form\":null,\"PostForm\":null,\"MultipartForm\":null,\"Trailer\":null,\"RemoteAddr\":\"127.0.0.1:57326\",\"RequestURI\":\"/api/authz\",\"TLS\":null}"
127.0.0.1 - - [10/Jul/2020:04:37:24 +0000] "GET /auth/authz HTTP/2.0" 401 36 "-" "-" 177 "autsi@file" "http://172.102.0.7:80" 1ms
time="2020-07-10T04:37:24Z" level=debug msg="Remote error https://api.localhost.local/auth/authz. StatusCode: 401" middlewareType=ForwardedAuthType middlewareName=auth-request@file
172.102.0.1 - - [10/Jul/2020:04:37:24 +0000] "GET /api/authz HTTP/2.0" 401 36 "-" "-" 176 "portal-backend-https-auth@file" "-" 2ms

So routing works when using curl.

Could you share your logs and a reproducible example?

Have you read the migration guide? https://docs.traefik.io/v2.2/migration/v2/#v2x-to-v222