istio: Istio does not adhere to HTTP/2 RFC 7540

Bug description

Istio does not send a 421 response when a connection is reused and a request is accidentally sent to a server that is not the correct origin. This can occur when two gateways, one with a wildcard certificate (*.example.com) and one with a non-wildcard certificate (b.example.com), route to two different apps (a.example.com and b.example.com): an HTTP/2 connection is first established to the wildcard gateway (for host a.example.com with the *.example.com certificate), and then a resource is requested from an application behind the non-wildcard gateway (b.example.com with the b.example.com certificate).

Because of HTTP/2 connection reuse, traffic destined for the second app (b.example.com) can end up routed over the existing connection for a.example.com. This follows the definition of connection reuse in RFC 7540 section 9.1.1 (https://tools.ietf.org/html/rfc7540#section-9.1.1): the wildcard certificate is authoritative for the request, and the IP address is the same because both hosts are served by the same ingressgateway.

When that happens, section 9.1.2 says Istio should respond with a 421 (Misdirected Request), indicating that the wrong connection was used and this server is not an authoritative origin for the request. That instructs the browser to retry on a new connection, renegotiating TLS and presenting SNI, which takes the request down the non-wildcard certificate route to the correct gateway/virtual service and the correct service.
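For illustration, the two RFC rules interact roughly like this (a minimal sketch in Python, not Envoy's implementation; the data structures and addresses are hypothetical):

```python
from dataclasses import dataclass
from fnmatch import fnmatch

@dataclass
class Http2Connection:
    ip: str                # resolved server address for this connection
    cert_sans: list[str]   # DNS names (possibly wildcards) in the presented cert

def can_coalesce(conn: Http2Connection, host: str, resolved_ip: str) -> bool:
    """RFC 7540 §9.1.1: a client may reuse a connection when the new
    authority resolves to the same address AND the certificate covers it.
    (Simplified: real TLS wildcard matching only spans a single DNS label.)"""
    cert_covers = any(fnmatch(host, san) for san in conn.cert_sans)
    return resolved_ip == conn.ip and cert_covers

def handle_request(served_hosts: set[str], authority: str) -> int:
    """RFC 7540 §9.1.2: a server with no route for the request's :authority
    should answer 421 (Misdirected Request), telling the client to retry
    on a fresh connection, rather than 404."""
    return 200 if authority in served_hosts else 421

# The scenario from this report: a connection opened for a.example.com
# with a *.example.com certificate also satisfies §9.1.1 for b.example.com...
conn = Http2Connection(ip="203.0.113.10", cert_sans=["*.example.com"])
assert can_coalesce(conn, "b.example.com", "203.0.113.10")
# ...so the "a" gateway receives the request and should answer 421, not 404:
assert handle_request({"a.example.com"}, "b.example.com") == 421
```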

Expected behavior

Istio should return a 421, and the browser should reconnect and successfully find the resource. Instead, a 404 is returned.

Steps to reproduce the bug

  1. Create an Istio 1.1.1+ instance with one ingress gateway.
  2. Create DNS records for a.example.com and b.example.com that both point to the ingress gateway.
  3. Create a gateway named “a” for a.example.com, with server host a.example.com and a wildcard certificate for *.example.com.
  4. Create a gateway named “b” for b.example.com, with server host b.example.com and a specific certificate for b.example.com.
  5. Create an app that hosts a static website with two files index.html and foobar.png. The index.html file should have an image tag that refers to an image https://b.example.com/foobar.png (e.g., <img src="https://b.example.com/foobar.png">)
  6. Deploy the app twice to Kubernetes, and attach virtual services routing a.example.com to one deployment and b.example.com to the other (effectively both hostnames serve the app on the same IP address, with “a” on a wildcard cert and “b” on a specific cert).
  7. Visit https://a.example.com/ and notice that the request for foobar.png returns a 404 in Chrome and Firefox, but not in Safari or Opera.
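Steps 3, 4, and 6 correspond roughly to configuration like the following (a sketch only; the selector, secret names, and destination services are placeholders, not taken from the original report):

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: a
spec:
  selector:
    istio: ingressgateway
  servers:
  - port: { number: 443, name: https, protocol: HTTPS }
    hosts: [ "a.example.com" ]
    tls:
      mode: SIMPLE
      credentialName: wildcard-example-com   # cert for *.example.com
---
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: b
spec:
  selector:
    istio: ingressgateway
  servers:
  - port: { number: 443, name: https, protocol: HTTPS }
    hosts: [ "b.example.com" ]
    tls:
      mode: SIMPLE
      credentialName: b-example-com          # cert for b.example.com only
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: a
spec:
  hosts: [ "a.example.com" ]
  gateways: [ "a" ]
  http:
  - route:
    - destination: { host: static-site-a }   # placeholder service name
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: b
spec:
  hosts: [ "b.example.com" ]
  gateways: [ "b" ]
  http:
  - route:
    - destination: { host: static-site-b }   # placeholder service name
```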

Version (include the output of istioctl version --remote and kubectl version)

$ istioctl version --remote
client version: version.BuildInfo{Version:"1.1.1", GitRevision:"2b1331886076df103179e3da5dc9077fed59c989", User:"root", Host:"7077232d-4c6c-11e9-813c-0a580a2c0506", GolangVersion:"go1.10.4", DockerHub:"docker.io/istio", BuildStatus:"Clean", GitTag:"1.1.0-17-g2b13318"}
apps-private-ingressgateway version: version.BuildInfo{Version:"", GitRevision:"", User:"", Host:"", GolangVersion:"", DockerHub:"", BuildStatus:"", GitTag:""}
apps-public-ingressgateway version: version.BuildInfo{Version:"", GitRevision:"", User:"", Host:"", GolangVersion:"", DockerHub:"", BuildStatus:"", GitTag:""}
citadel version: version.BuildInfo{Version:"1.1.2", GitRevision:"2b1331886076df103179e3da5dc9077fed59c989-dirty", User:"root", Host:"35adf5bb-5570-11e9-b00d-0a580a2c0205", GolangVersion:"go1.10.4", DockerHub:"docker.io/istio", BuildStatus:"Modified", GitTag:"1.1.1"}
galley version: version.BuildInfo{Version:"1.1.2", GitRevision:"2b1331886076df103179e3da5dc9077fed59c989-dirty", User:"root", Host:"35adf5bb-5570-11e9-b00d-0a580a2c0205", GolangVersion:"go1.10.4", DockerHub:"docker.io/istio", BuildStatus:"Modified", GitTag:"1.1.1"}
pilot version: version.BuildInfo{Version:"1.1.2", GitRevision:"2b1331886076df103179e3da5dc9077fed59c989-dirty", User:"root", Host:"35adf5bb-5570-11e9-b00d-0a580a2c0205", GolangVersion:"go1.10.4", DockerHub:"docker.io/istio", BuildStatus:"Modified", GitTag:"1.1.1"}
policy version: version.BuildInfo{Version:"1.1.2", GitRevision:"2b1331886076df103179e3da5dc9077fed59c989-dirty", User:"root", Host:"35adf5bb-5570-11e9-b00d-0a580a2c0205", GolangVersion:"go1.10.4", DockerHub:"docker.io/istio", BuildStatus:"Modified", GitTag:"1.1.1"}
sidecar-injector version: version.BuildInfo{Version:"1.1.2", GitRevision:"2b1331886076df103179e3da5dc9077fed59c989-dirty", User:"root", Host:"35adf5bb-5570-11e9-b00d-0a580a2c0205", GolangVersion:"go1.10.4", DockerHub:"docker.io/istio", BuildStatus:"Modified", GitTag:"1.1.1"}
telemetry version: version.BuildInfo{Version:"1.1.2", GitRevision:"2b1331886076df103179e3da5dc9077fed59c989-dirty", User:"root", Host:"35adf5bb-5570-11e9-b00d-0a580a2c0205", GolangVersion:"go1.10.4", DockerHub:"docker.io/istio", BuildStatus:"Modified", GitTag:"1.1.1"}
sites-private-ingressgateway version: version.BuildInfo{Version:"", GitRevision:"", User:"", Host:"", GolangVersion:"", DockerHub:"", BuildStatus:"", GitTag:""}
sites-public-ingressgateway version: version.BuildInfo{Version:"", GitRevision:"", User:"", Host:"", GolangVersion:"", DockerHub:"", BuildStatus:"", GitTag:""}

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.4", GitCommit:"5ca598b4ba5abb89bb773071ce452e33fb66339d", GitTreeState:"clean", BuildDate:"2018-06-18T14:14:00Z", GoVersion:"go1.9.7", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.5", GitCommit:"753b2dbc622f5cc417845f0ff8a77f539a4213ea", GitTreeState:"clean", BuildDate:"2018-11-26T14:31:35Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}

How was Istio installed?

Helm

Environment where bug was observed (cloud vendor, OS, etc)

AWS, with Istio installed and running behind an NLB ingress, or as a NodePort, terminating TLS.

About this issue

  • Original URL
  • State: open
  • Created 5 years ago
  • Reactions: 14
  • Comments: 30 (11 by maintainers)

Most upvoted comments

Thanks for the excellent writeup. I think we’re seeing the same issue with a slightly different setup. (Please tell me if I’m wrong. I’m happy to open a separate case.)

  • In development, we use a single self-signed cert that has multiple domains foo.example.local, bar.example.local, baz.example.local.
  • Requests from the same browser (Chrome 114.0.5735.198) using HTTP/2 sometimes get NR no_route_found from the gateway for routes that should exist. (Restarting the browser, or using a different browser, shows that they do exist.)
  • The 404/no_route_found behavior is “sticky” and requires a browser restart to fix.

Unrelated differences (I think?)

  • We’re using the k8s Gateway API HTTPRoute for our routing. But the routes all get sent to the Istio gateway-controller, so I think we’re ending up in the same code path(s).

We also noticed that we cannot reproduce this behavior in the latest Firefox. Maybe it’s being more conservative and choosing not to reuse HTTP/2 connections across hostnames? (Because of just these kinds of server-side errors? Or because it’s simpler not to? 🤷)


This gave us a ton of headaches, because we hit it while adding and testing CORS headers to allow requests across those domains. Whether CORS worked depended on the particular access pattern, which might or might not trigger the bad behavior. And we couldn’t see anything wrong in our apps, because the gateway was rejecting the requests before they ever reached our pods.

I estimate we wasted a cumulative week of developer time (across multiple developers) if not more because of this issue.

Is there anything that can be done to at least avoid this issue until there’s a fix in Envoy? (For example: maybe the gateway could refuse to reuse HTTP/2 connections on certs that are wildcards and/or serve multiple hostnames, at least until the upstream fix is available?)


Our workaround for now was to create separate certs for each hostname. This is fine in development, but isn’t really tenable (as people say above) if you have a large number of hostnames to deal with, and can’t act as your own CA.
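A sketch of that workaround for a development setup (the hostnames and secret names below are placeholders, not from the comment above; requires OpenSSL 1.1.1+ for -addext):

```shell
# Issue one self-signed cert per hostname so each SAN list covers exactly
# one host, preventing the browser from coalescing HTTP/2 connections
# across hostnames.
for host in foo.example.local bar.example.local; do
  openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
    -keyout "${host}.key" -out "${host}.crt" \
    -subj "/CN=${host}" \
    -addext "subjectAltName=DNS:${host}"
  # Then load each cert as the credential for the matching Gateway server,
  # e.g.: kubectl create secret tls "${host}-cert" \
  #         --cert="${host}.crt" --key="${host}.key" -n istio-system
done
```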