istio: Mutual TLS Origination with Egress Gateway Task fails
Bug description
Following the task for “Perform mutual TLS origination with an Egress Gateway” seems to result in an invalid configuration that doesn’t achieve the desired outcome. I’m referencing this task: Perform Mutual TLS Origination with an Egress Gateway.
The config documented in the task caused the cluster configs for the istio-ingressgateway and sleep pods to go into STALE status, because Envoy believes those pods need TLS certificates (specified by a destination rule in the documentation) that they shouldn’t require and that aren’t mounted in their pod specs.
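For context, the destination rule in question looks roughly like this. This is a reconstruction, not a verbatim copy: the name, namespace, cert paths, and SNI come from the Envoy cluster diff below, while the `host` and port-level structure follow the shape of the rule in the 1.3/1.4 task docs and may differ slightly in your version:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: originate-mtls-for-nginx
  namespace: default
spec:
  # In-cluster (but mesh-external) nginx service used in the task docs
  host: my-nginx.mesh-external.svc.cluster.local
  trafficPolicy:
    portLevelSettings:
    - port:
        number: 443
      tls:
        mode: MUTUAL
        clientCertificate: /etc/nginx-client-certs/tls.crt
        privateKey: /etc/nginx-client-certs/tls.key
        caCertificates: /etc/nginx-ca-certs/ca-chain.cert.pem
        sni: nginx.example.com
```

The cert paths above only exist as mounts on the egress gateway pod, which is why Envoy on the sleep and istio-ingressgateway pods rejects the cluster config when the rule is applied to them.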
This seems to affect both 1.4.x and 1.3.x (using the archived task documentation, which is identical apart from using Helm to deploy the egress gateway instead of istioctl).
- [ ] Configuration Infrastructure
- [x] Docs
- [ ] Installation
- [x] Networking
- [ ] Performance and Scalability
- [ ] Policies and Telemetry
- [ ] Security
- [ ] Test and Release
- [ ] User Experience
- [ ] Developer Infrastructure
Expected behavior
I’d expect istioctl proxy-status to show SYNCED configuration for all sidecars in the mesh after the configuration in the task documentation is applied.
Steps to reproduce the bug
Follow the procedure in the documentation for Perform Mutual TLS Origination with an Egress Gateway.
When you run the command in step 3 of the section titled “Configure mutual TLS origination for egress traffic”, the command returns a no healthy upstream error.
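For reference, that step boils down to a curl through the sleep pod’s sidecar, along these lines (a sketch of the task’s verification command; the exact service name and flags may differ in your doc version):

```shell
# Curl the mesh-external nginx service from the sleep pod;
# traffic should be routed via the egress gateway.
kubectl exec "$(kubectl get pod -l app=sleep -o jsonpath={.items..metadata.name})" \
  -c sleep -- curl -sS http://my-nginx.mesh-external.svc.cluster.local
```

With the configuration from the task applied, this returns `no healthy upstream` instead of the nginx welcome page.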
Additionally, when you run istioctl proxy-status, you’ll see that the sleep and istio-ingressgateway pods are reporting stale cluster configurations. If you look at the Envoy logs for those pods, you find that the cluster config is failing to apply because of missing certs. Going a bit further, an istioctl proxy-status run against the affected pods shows the following config diff between Pilot and Envoy:
+++ Envoy Clusters
@@ -3094,44 +3094,17 @@
"circuitBreakers": {
"thresholds": [
{
"maxRetries": 1024
}
]
},
- "tlsContext": {
- "commonTlsContext": {
- "tlsCertificates": [
- {
- "certificateChain": {
- "filename": "/etc/nginx-client-certs/tls.crt"
- },
- "privateKey": {
- "filename": "/etc/nginx-client-certs/tls.key"
- }
- }
- ],
- "validationContext": {
- "trustedCa": {
- "filename": "/etc/nginx-ca-certs/ca-chain.cert.pem"
- }
- }
- },
- "sni": "nginx.example.com"
- },
"dnsRefreshRate": "300s",
"respectDnsTtl": true,
- "dnsLookupFamily": "V4_ONLY",
- "metadata": {
- "filterMetadata": {
- "istio": {
- "config": "/apis/networking/v1alpha3/namespaces/default/destination-rule/originate-mtls-for-nginx"
- }
- }
- }
+ "dnsLookupFamily": "V4_ONLY"
}
},
{
"cluster": {
"name": "outbound|443||traffic-claim-enforcer-webhook.istio-system.svc.cluster.local",
"type": "EDS",
"edsClusterConfig": {
Listeners Match
Routes Match
NAME CDS LDS EDS RDS PILOT VERSION
istio-egressgateway-fd99fdc54-pvdnw.istio-system SYNCED SYNCED SYNCED SYNCED istio-pilot-58f7cc886f-q7hj7 1.3.6
istio-ingressgateway-777d8db49-fsf9v.istio-system STALE SYNCED SYNCED NOT SENT istio-pilot-58f7cc886f-q7hj7 1.3.6
sleep-78777bf478-j5qlv.default STALE SYNCED SYNCED SYNCED istio-pilot-58f7cc886f-q7hj7 1.3.6
Version (include the output of istioctl version --remote, kubectl version, and helm version if you used Helm)
Istio:
citadel version: 1.3.6
egressgateway version: 1.3.6
galley version: 1.3.6
ingressgateway version: 1.3.6
pilot version: 1.3.6
policy version: 1.3.6
sidecar-injector version: 1.3.6
telemetry version: 1.3.6
Kubernetes:
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.7", GitCommit:"6c143d35bb11d74970e7bc0b6c45b6bfdffc0bd4", GitTreeState:"clean", BuildDate:"2019-12-11T12:34:17Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}
Helm:
How was Istio installed? Installed using Helm 3
Environment where bug was observed (cloud vendor, OS, etc) kops cluster running on AWS EC2 instances
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 18 (2 by maintainers)
Does curl work for you when connecting to the http:80 endpoint?
Can you share your full configuration? ServiceEntry, VirtualService, DestinationRule, Gateway?
I’ve seen a similar issue on my side, where traffic exited the cluster via the sidecar rather than the egress gateway due to a broken config.
Make sure you can curl the external service from the egress gateway pod while providing the CA/cert/key directly.
This will confirm that the certificates are loaded correctly. Next, you could dump TCP traffic on the egress gateway and the istio sidecar to check whether traffic goes through.
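That check might look something like the following. This is a hypothetical command, not from the task: the cert paths are the mounts shown in the cluster diff above, and the `--resolve` address is a placeholder you’d replace with your nginx service’s actual IP:

```shell
# Call the external service directly from the egress gateway pod,
# presenting the client cert/key and CA manually (bypassing Istio's
# TLS origination). A 200 response confirms the certs are usable.
kubectl -n istio-system exec deploy/istio-egressgateway -- \
  curl -v --resolve nginx.example.com:443:<NGINX_IP> \
    --cacert /etc/nginx-ca-certs/ca-chain.cert.pem \
    --cert /etc/nginx-client-certs/tls.crt \
    --key /etc/nginx-client-certs/tls.key \
    https://nginx.example.com
```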
On Tue, 10 Mar 2020, 22:29 Razeel Mohammed, notifications@github.com wrote:
@rodalli
So previously, when I tested the “outside of cluster” nginx, it was a 3rd-party service with mTLS. I’ve since tried to follow the steps with an mTLS nginx on a separate GCP compute instance, and it worked fine.
I still see that CDS is STALE, but RDS is SYNCED for the sleep pods.
I then tried to adjust the same config for the 3rd-party service again, and it did not work.
First hidden issue: the sleep container from the example was using OpenSSL 1.0.1, and there was a problem with protocol negotiation on the curl side. Upgrading to a different container image resolved it, so that I was able to curl while providing certificates/keys manually (sleep => istio-proxy).
Second hidden issue: egress gateway mTLS origination was still not working. What resolved it for me was changing the CA cert entry in the DestinationRule caCertificates field to a full absolute file path.
I don’t know if this will be applicable to your setup, but I hope it helps.
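As an illustration of that fix (a hypothetical before/after; the file names here are the ones from the task, and your mount paths may differ):

```yaml
trafficPolicy:
  portLevelSettings:
  - port:
      number: 443
    tls:
      mode: MUTUAL
      clientCertificate: /etc/nginx-client-certs/tls.crt
      privateKey: /etc/nginx-client-certs/tls.key
      # caCertificates: ca-chain.cert.pem                      # bare file name: cluster config fails
      caCertificates: /etc/nginx-ca-certs/ca-chain.cert.pem    # full absolute path: works
```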
Another thing to mention: I used the example for my exact version from the documentation archive. Check whether the example differs between the latest 1.4 docs and your version.
Also, if you try to debug with curl from the egress-gateway or istio-proxy container itself, note that it uses GnuTLS/3.4.10.
At least for now it doesn’t work. I’ll try to replicate this again next week and post the results.
On Mon, 10 Feb 2020, 23:00 Shriram Rajagopalan, notifications@github.com wrote:
Thanks for adding your feedback @AlbertasG, glad to know I’m not alone! I’m not sure about the public-IP scenario as I haven’t tested it (I will when able), but the task definitely results in an invalid cluster config (one that Envoy is unable to apply) for the client service when implemented with an in-cluster (but mesh-external) service as the destination, which is what’s used in the docs.
It definitely is that last destination rule that causes the cluster sync issue, as that’s what specifies the TLS config that the sleep pod is unable to apply.