istio: istio-ingressgateway randomly doesn't return certificate for request after 1.3.x upgrade

Bug description

After upgrading to Istio 1.3.1 (also occurs on 1.3.0), we get random SSL connection errors to resources behind the ingressgateway. Through Cloudflare, these are returned to the user as 525 errors. Going straight to the ELB with curl, the error returned is:

curl: (35) LibreSSL SSL_connect: SSL_ERROR_SYSCALL in connection to httpbin.test:443

Our clusters are hosted on EKS, and the issue can easily be recreated.

Please see the following asciinema for a complete recreation of the issue. Curl request 170 is where the error occurs. https://asciinema.org/a/qMBH6IBOjma6vqLXNNgoqYdyr

The following gist is used to deploy httpbin and other resources required: https://gist.githubusercontent.com/denniswebb/23957c9bf11fa9fa856fe2b8f0557ab1/raw/098eb9e7cf2999445e94470ea8c59578a6b3e7a3/istio-libressl.yaml

Affected product area (please put an X in all that apply)

[ ] Configuration Infrastructure
[ ] Docs
[ ] Installation
[X] Networking
[ ] Performance and Scalability
[ ] Policies and Telemetry
[ ] Security
[ ] Test and Release
[ ] User Experience
[ ] Developer Infrastructure

Expected behavior

All requests to the ingressgateway should return valid certificate information.

Steps to reproduce the bug

  1. Create a new EKS cluster.
  2. Install Istio 1.3.1 using the value gateways.istio-ingressgateway.sds.enabled=true
  3. Expose any service using a certificate through the ingressgateway.
  4. Send continuous curl requests until the LibreSSL error message is returned.
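
Step 4 can be scripted. The following is a minimal sketch, not the script used in the recording: repeat_until_fail is a hypothetical helper name, and the commented curl invocation assumes that httpbin.test resolves to the ELB from the machine running it. A failed TLS handshake shows up as curl exit code 35, matching the error above.

```shell
#!/usr/bin/env bash
# Run a command repeatedly until it fails or the attempt limit is reached.
# Prints the attempt number and exit code of the first failure.
# Usage: repeat_until_fail MAX_ATTEMPTS COMMAND [ARGS...]
# (repeat_until_fail is a hypothetical helper, not part of Istio or curl.)
repeat_until_fail() {
  local max="$1"
  shift
  local i=1
  local rc=0
  while [ "$i" -le "$max" ]; do
    rc=0
    # Discard the command's output; only the exit status matters here.
    "$@" >/dev/null 2>&1 || rc=$?
    if [ "$rc" -ne 0 ]; then
      echo "attempt ${i} failed with exit code ${rc}"
      return 1
    fi
    i=$((i + 1))
  done
  echo "no failure in ${max} attempts"
  return 0
}

# Example invocation (assumption: httpbin.test resolves to the ELB):
# repeat_until_fail 1000 curl -sS -o /dev/null https://httpbin.test/
```

In the recording the failure appeared at request 170, so a limit of around 1000 attempts should leave enough headroom; that limit is a guess, not a measured bound.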

Version (include the output of istioctl version --remote and kubectl version)

istioctl version --remote
client version: unknown
control plane version: 1.3.1

kubectl version
Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.0", GitCommit:"2bd9643cee5b3b3a5ecbd3af49d09018f0773c77", GitTreeState:"clean", BuildDate:"2019-09-19T13:57:45Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"13+", GitVersion:"v1.13.10-eks-5ac0f1", GitCommit:"5ac0f1d9ab2c254ea2b0ce3534fd72932094c6e1", GitTreeState:"clean", BuildDate:"2019-08-20T22:39:46Z", GoVersion:"go1.11.13", Compiler:"gc", Platform:"linux/amd64"}

How was Istio installed?

helm install istio.io/istio --name istio --namespace istio-system --set-string gateways.istio-ingressgateway.sds.enabled=true

Environment where bug was observed (cloud vendor, OS, etc)

AWS EKS

Please see the above video and gist to view/reproduce issue. I can 100% confirm we never experienced this issue on 1.2.2 releases.

I cannot reproduce this locally using docker-desktop on Mac, but I’m not 100% convinced it wouldn’t happen given enough requests and time.

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 4
  • Comments: 28 (14 by maintainers)

Most upvoted comments

I just tested, and installing with --set-string global.proxy.protocolDetectionTimeout=500s fixed the issue.

I then tested with that timeout set to just 5ms, and got the LibreSSL error for almost 50% of the calls.

Thanks @duderino for the answer.

Should I leave this open until a fix has been implemented for default installs?
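
For reference, the install variants described in the comment above can be sketched as follows. This assumes the extra flag was appended to the helm command from the original report; the comment does not show the full command line, and istio_install_cmd is a hypothetical helper that only prints the command rather than running it.

```shell
#!/usr/bin/env bash
# Print the Helm install command from the report with an added protocol
# detection timeout, so the two tested values can be compared side by side.
# Usage: istio_install_cmd TIMEOUT   (for example 500s or 5ms)
# Assumption: the timeout flag is combined with the SDS flag from the report.
istio_install_cmd() {
  local timeout="$1"
  echo "helm install istio.io/istio --name istio --namespace istio-system" \
    "--set-string gateways.istio-ingressgateway.sds.enabled=true" \
    "--set-string global.proxy.protocolDetectionTimeout=${timeout}"
}

istio_install_cmd 500s   # value reported to fix the issue
istio_install_cmd 5ms    # value reported to fail for almost 50% of calls
```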

I’m seeing this without SDS. Roughly 1 in 1000 POST and PUT calls fail with all releases of Istio 1.3, including 1.3.3:

javax.net.ssl.SSLHandshakeException: Remote host closed connection during handshake

We don’t have this issue with Istio 1.1.

I’ll confirm that we are using cert-manager w/ SDS ourselves as well.