istio-operator: Significant performance degradation (ingress pods) with Istio 1.1

Describe the bug Ingress latency for HTTPS increases so dramatically that it actually causes timeouts when trying to establish a TLS handshake with the ingress pods.

Once the TLS handshake has been established with the ingress pods, processing through the VirtualService to the actual Service is pretty fast.

Steps to reproduce the issue: On a Kubernetes cluster in AWS (we create our cluster with kubeadm, so we are not using EKS),

  1. Install Istio through the officially supported Helm installation (v1.0.7)
  2. Create a Gateway attached to the generated ingress load balancer and update the DNS to point to the ELB
  3. Create a cert as a Secret and attach it to the Gateway created in step 2 for TLS.
  4. Deploy the sample nginx Deployment from the Kubernetes docs along with a VirtualService (a rough sketch of the resources from steps 2-4 follows this list)
  5. Try running curl -v https://<url-from-4> a couple of times, or use something like lolcat
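
For reference, here is a minimal sketch of what steps 2-4 look like, assuming the file-mount TLS setup Istio 1.0 uses. The hostname is the one from the curl output below; the Gateway name and the nginx Service name are placeholders, and in our setup cert-manager produces the secret (it is created by hand here only for brevity):

# Istio 1.0's ingress gateway reads certs from the istio-ingressgateway-certs
# secret, which it mounts at /etc/istio/ingressgateway-certs (step 3).
kubectl create -n istio-system secret tls istio-ingressgateway-certs \
  --cert=tls.crt --key=tls.key

kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: example-gateway        # placeholder name (step 2)
spec:
  selector:
    istio: ingressgateway      # binds to the default istio-ingressgateway pods
  servers:
  - port:
      number: 443
      name: https
      protocol: HTTPS
    tls:
      mode: SIMPLE
      serverCertificate: /etc/istio/ingressgateway-certs/tls.crt
      privateKey: /etc/istio/ingressgateway-certs/tls.key
    hosts:
    - somewebsite.com
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: nginx
spec:
  hosts:
  - somewebsite.com
  gateways:
  - example-gateway
  http:
  - route:
    - destination:
        host: nginx            # placeholder Service for the sample nginx Deployment (step 4)
        port:
          number: 80
EOF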

Expected behavior Networking behavior shouldn't change, including latency to the ingress pods, regardless of whether it's TLS or not.

Screenshots This is copied from actual output when I was running curl. It takes something like 5-10 seconds to even get the initial response from the ingress, and it's slow enough that I can actually terminate the request before the handshake even starts.

❯ curl -v https://somewebsite.com/health
*   Trying 52.27.17.16...
* TCP_NODELAY set
* Connected to somewebsite.entelo.com (52.27.17.16) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: /etc/ssl/certs
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
^C
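
To put numbers on the delay instead of waiting for the handshake to hang, curl's built-in --write-out timing variables can split the TCP connect time from the TLS handshake time; if the problem is in the handshake, time_appconnect is where the stall should show up. A quick sketch, using the same hostname as above:

# time_connect:       TCP three-way handshake completed
# time_appconnect:    TLS handshake completed (where the 5-10s stall should appear)
# time_starttransfer: first byte of the response received
curl -sk -o /dev/null https://somewebsite.com/health \
  -w 'connect=%{time_connect}s tls=%{time_appconnect}s ttfb=%{time_starttransfer}s total=%{time_total}s\n'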

Additional context

  • branch: release-1.0
  • istio: v1.0.7
  • kubernetes: v1.14.1 (not EKS)
  • certs: generated from Let's Encrypt with cert-manager

We were trying to switch to the 1.0 Istio operator before doing an upgrade, but this issue caused a cluster outage and we had to revert back to the Helm installation. I tried digging around looking for the diffs between the Helm installation and the operator, and there literally isn't much that seems to be different except for the stdio and stdiotcp rules the operator adds to Mixer.
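
If anyone wants to check that diff on their own cluster, here is a rough sketch, assuming the operator keeps the upstream resource names (the rule names are the ones from the diff described above):

# Mixer rules are plain CRDs, so the difference between the two installs shows up with:
kubectl get rules.config.istio.io -n istio-system

# To rule them out as the cause, the extra rules can be deleted temporarily:
kubectl delete rules.config.istio.io stdio stdiotcp -n istio-system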

Our Istio installation and usage are fairly simple and the settings are very close to the default.

Also, reverting back to the Helm installation immediately resolves all networking issues, so there is definitely something going on when using the operator.

Trying a fresh install of Istio with the operator seems to cause the same issues as well.

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 15 (6 by maintainers)

Most upvoted comments

Hey @waynz0r @matyix, after spending some more time tweaking around, it looks like there is something interesting going on with the istio-ingressgateway deployment itself.

The interesting part is that there aren't any logs or signals that stand out, and I only spot the performance difference when changing the number of pods running.
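
For anyone trying to reproduce this, the replica count can be changed directly on the gateway deployment to see the latency shift (the replica number here is arbitrary, and if the operator reconciles the deployment it may scale it back):

# Scale the ingress gateway and re-run the curl timing test after each change.
kubectl scale deployment istio-ingressgateway -n istio-system --replicas=3
kubectl get pods -n istio-system -l istio=ingressgateway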

I’ve created an issue on Istio https://github.com/istio/istio/issues/14102 reporting a bit of my findings, so I’m going to close this issue now. Thanks for the swift response, guys. 😃