istio: istio-ingressgateway intermittently fails to return a certificate after upgrading to 1.3.x
Bug description
After upgrading to Istio 1.3.1 (it also occurs on 1.3.0), we get intermittent SSL connection errors for resources behind the ingressgateway. Through Cloudflare these reach users as 525 errors; going straight to the ELB with curl, curl reports curl: (35) LibreSSL SSL_connect: SSL_ERROR_SYSCALL in connection to httpbin.test:443
Our clusters are hosted on EKS, and the issue is easy to reproduce.
See the following asciinema recording for a complete reproduction of the issue; the error occurs at curl request 170. https://asciinema.org/a/qMBH6IBOjma6vqLXNNgoqYdyr
The following gist deploys httpbin and the other required resources: https://gist.githubusercontent.com/denniswebb/23957c9bf11fa9fa856fe2b8f0557ab1/raw/098eb9e7cf2999445e94470ea8c59578a6b3e7a3/istio-libressl.yaml
Affected product area (please put an X in all that apply)
[ ] Configuration Infrastructure [ ] Docs [ ] Installation [X] Networking [ ] Performance and Scalability [ ] Policies and Telemetry [ ] Security [ ] Test and Release [ ] User Experience [ ] Developer Infrastructure
Expected behavior
All requests to the ingressgateway should return valid certificate information.
Steps to reproduce the bug
- Create a new EKS cluster
- Install Istio 1.3.1 with the value gateways.istio-ingressgateway.sds.enabled=true
- Expose any service using a certificate through the ingressgateway
- Send continuous curl requests until the LibreSSL error message is returned.
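The "continuous curl requests" step can be sketched as a small shell function that counts handshake failures. This is a hypothetical helper, not part of Istio or curl: it assumes the httpbin backend from the gist (hence the /status/200 path), the httpbin.test hostname from the error above, and an ELB IP address that you supply. It counts only curl exit code 35 (CURLE_SSL_CONNECT_ERROR), which is the code shown in "curl: (35) LibreSSL SSL_connect: SSL_ERROR_SYSCALL".

```shell
# probe_tls HOST ADDR COUNT (hypothetical helper for reproducing the issue)
# Sends COUNT HTTPS requests to HOST, pinned to ADDR (e.g. an ELB IP) via
# --resolve so the TLS SNI still matches the Gateway host. Prints how many
# requests ended with curl exit code 35 (TLS handshake failure).
probe_tls() {
  local host="$1" addr="$2" count="$3"
  local i=1 rc failures=0
  while [ "$i" -le "$count" ]; do
    # /status/200 assumes the httpbin backend from the gist. Add -k if the
    # test certificate is self-signed, otherwise curl fails verification
    # with exit code 60 instead.
    curl -sS -o /dev/null --max-time 10 \
      --resolve "${host}:443:${addr}" "https://${host}/status/200"
    rc=$?
    if [ "$rc" -eq 35 ]; then
      failures=$((failures + 1))
      echo "request ${i}: TLS handshake failed (curl exit 35)" >&2
    fi
    i=$((i + 1))
  done
  echo "$failures"
}
```

For example, `probe_tls httpbin.test "$ELB_IP" 500`, where ELB_IP is a placeholder for one of the load balancer's addresses (e.g. from `dig +short` on the ELB hostname). Other curl failures, such as connection refused (exit 7), are deliberately not counted so the total reflects only the handshake errors described in this report.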
Version (include the output of istioctl version --remote and kubectl version)
istioctl version --remote
client version: unknown
control plane version: 1.3.1
kubectl version
Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.0", GitCommit:"2bd9643cee5b3b3a5ecbd3af49d09018f0773c77", GitTreeState:"clean", BuildDate:"2019-09-19T13:57:45Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"13+", GitVersion:"v1.13.10-eks-5ac0f1", GitCommit:"5ac0f1d9ab2c254ea2b0ce3534fd72932094c6e1", GitTreeState:"clean", BuildDate:"2019-08-20T22:39:46Z", GoVersion:"go1.11.13", Compiler:"gc", Platform:"linux/amd64"}
How was Istio installed?
helm install istio.io/istio --name istio --namespace istio-system --set-string gateways.istio-ingressgateway.sds.enabled=true
Environment where bug was observed (cloud vendor, OS, etc.)
AWS EKS
See the video and gist above to view or reproduce the issue. I can confirm 100% that we never experienced this issue on the 1.2.2 release.
I cannot reproduce it locally using Docker Desktop on Mac, but I’m not convinced it couldn’t happen there too, given enough requests and time.
About this issue
- State: closed
- Created 5 years ago
- Reactions: 4
- Comments: 28 (14 by maintainers)
I just tested, and installing with --set-string global.proxy.protocolDetectionTimeout=500s fixed the issue. I then tested with the timeout set to just 5ms and got the LibreSSL error on almost 50% of the calls.
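To repeat the comparison above (500s versus 5ms), the Helm 2 install command from the report can be wrapped so the timeout becomes a parameter. This is a sketch: install_istio is a hypothetical wrapper, and its 500s default is simply the value reported to work in this thread, not a recommended setting.

```shell
# install_istio [TIMEOUT] (hypothetical wrapper around the report's Helm 2 command)
# Installs Istio with SDS enabled on the ingress gateway and an explicit
# global.proxy.protocolDetectionTimeout. Defaults to 500s, the value that
# made the handshake errors disappear in this thread.
install_istio() {
  local detection_timeout="${1:-500s}"
  helm install istio.io/istio --name istio --namespace istio-system \
    --set-string gateways.istio-ingressgateway.sds.enabled=true \
    --set-string "global.proxy.protocolDetectionTimeout=${detection_timeout}"
}
```

For example, `install_istio 5ms` should reproduce the high failure rate reported above, while `install_istio` (500s) should not. On an existing release, the same --set-string flag would go on a helm upgrade instead of a fresh install.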
Thanks @duderino for the answer.
Should I leave this open until a fix has been implemented for default installs?
I’m seeing this without SDS. Roughly 1 in 1000 POST and PUT calls fail with all releases of Istio 1.3, including 1.3.3.
javax.net.ssl.SSLHandshakeException: Remote host closed connection during handshake
We don’t have this issue with Istio 1.1.
I’ll confirm that we are using cert-manager with SDS ourselves as well.