istio: Websocket traffic fails with 1.7.4 gateway and 1.7.5 sidecar proxy

Bug description

I am less concerned with what specifically broke here than with preventing it from happening again. But I think we have to understand why it broke in order to do that.

I have a gateway at 1.7.4, which takes incoming WebSocket traffic and routes it to a pod with a 1.7.5 sidecar. The connection starts out as HTTPS and is then upgraded to WebSocket with a 101 Switching Protocols response; the socket.io library handles this. Every attempt to upgrade the connection fails with a 503 UC (in Envoy access logs, the UC flag means the upstream connection was terminated). This works fine with a 1.7.4 gateway and a 1.7.4 sidecar, and likewise with a 1.7.5 gateway and a 1.7.5 sidecar. The problem only happens with a 1.7.4 gateway and a 1.7.5 sidecar. I’ve already upgraded the gateway, so this is no longer a problem for me, but I need to know that it won’t happen again on the next upgrade.

My gateway is set up as described in https://github.com/rvennam/istio-on-iks/blob/master/custom-istio-ingress-1.7.md. Here’s the simplified Gateway resource:

apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: mygateway
spec:
  selector:
    istio: custom-ingressgateway
  servers:
  - hosts:
    - mydomain.com
    port:
      name: https_mydomain.com
      number: 443
      protocol: HTTPS
    tls:
      credentialName: mydomain.com
      mode: SIMPLE

And here’s the simplified VirtualService:

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp
spec:
  gateways:
  - mygateway
  hosts:
  - myapp.mydomain.com
  http:
  - route:
    - destination:
        host: myapp

I make an HTTP request to the cluster asking for a WebSocket upgrade, and I get a 503 response from the gateway. This is what I see in my logs with request logging enabled (gateway first, then the application pod’s sidecar):

Dec 4 15:26:20 custom-ingressgateway-867f76b987-2q46l istio-proxy "GET /devops/socket.io/?toolchain=057154e6-3c50-4dd8-9388-857a3f2092ad&sessionId=***&time=1607113570527&env_id=ibm%3Ays1%3Aus-south&EIO=3&transport=websocket HTTP/1.1" 503 UC "-" "-" 0 95 3 - "***,***" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.67 Safari/537.36" "b7377ed6-7e10-4c39-95c1-372f64b8ef08" "mydomain.com" "***:8080" outbound|80||myapp.mynamespace.svc.cluster.local ***:41394 ***:8443 ***:50068 mydomain.com -

Dec 4 15:26:24 myapp-5f5f8f587b-kdjz5 istio-proxy "GET /devops/socket.io/?toolchain=057154e6-3c50-4dd8-9388-857a3f2092ad&sessionId=***&time=1607113570527&env_id=ibm%3Ays1%3Aus-south&EIO=3&transport=websocket HTTP/1.1" 101 - "-" "-" 0 87 2 - "***,***" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.67 Safari/537.36" "b7377ed6-7e10-4c39-95c1-372f64b8ef08" "mydomain.com" "127.0.0.1:8080" inbound|80|http|myapp.mynamespace.svc.cluster.local 127.0.0.1:38808 ***:8080 ***:0 outbound_.80_._.myapp.mynamespace.svc.cluster.local default

We can see that the request makes it to my pod, which responds as usual with 101, but the gateway returns 503 UC to the client. This happens consistently (it happened with 2 different apps deployed to 21 clusters).

The reason for this version mismatch is that I’m using the IBM Cloud Istio addon, which automatically upgrades the control plane, together with a custom gateway that I upgrade manually. Because the control plane updates automatically, automatic sidecar injection can move sidecars to the new version before I have upgraded the gateway.

[ ] Docs
[ ] Installation
[x] Networking
[ ] Performance and Scalability
[ ] Extensions and Telemetry
[ ] Security
[ ] Test and Release
[ ] User Experience
[ ] Developer Infrastructure
[x] Upgrade

Expected behavior

Gateways and sidecars at different patch levels of the same minor version (here 1.7.4 and 1.7.5) remain compatible.

Steps to reproduce the bug

See description above.

Version (include the output of istioctl version --remote and kubectl version --short and helm version --short if you used Helm)

$ istioctl version --remote
client version: 1.7.5
control plane version: 1.7.5
data plane version: 1.7.5 (100 proxies), 1.7.4 (12 proxies)

$ kubectl version --short
Client Version: v1.17.14
Server Version: v1.17.14+IKS

How was Istio installed?

IBM Cloud Istio addon

Environment where the bug was observed (cloud vendor, OS, etc)

IBM Cloud

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 32 (20 by maintainers)

Most upvoted comments

I had WebSocket issues when switching from Istio 1.3 to 1.8. When I changed the Service port name from http to tcp, the issues disappeared.
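
For reference, the port rename described in this comment might look like the sketch below. The Service name, namespace, and port 80 are taken from the access logs in the issue; the `app: myapp` selector and the `targetPort` of 8080 are assumptions, since the original Service manifest was not posted. Istio selects the protocol from the port name prefix, so a port named `tcp` (or `tcp-<suffix>`) is proxied as opaque TCP rather than HTTP.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: myapp
  namespace: mynamespace   # namespace as seen in the access logs
spec:
  selector:
    app: myapp             # assumed label; not shown in the issue
  ports:
  - name: tcp              # was "http"; the name prefix tells Istio how to treat the port
    port: 80
    targetPort: 8080       # assumed from the 127.0.0.1:8080 upstream in the sidecar log
```

The trade-off is that the sidecar no longer applies HTTP-level routing, retries, or telemetry to this port, so this is a workaround rather than an explanation of the 1.7.4/1.7.5 incompatibility.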