istio: Mutual TLS Origination with Egress Gateway Task fails

Bug description Following the task for “Perform mututal TLS origination with an Egress Gateway” seems to result in an invalid configuration ~~that doesn’t achieve the desired outcome~~. I’m referencing this task: Perform Mutual TLS Origination with an Egress Gateway

The config documented in the task resulted in the cluster configs for the istio-ingressgateway and sleep pods to go into STALE status because Envoy believes the pods need TLS certificates, specified by a destination rule in the documentation, that they shouldn’t require and don’t have presented to them in their pod configs.

This seems to affect both 1.4.x and 1.3.x (using the archived task documentation, which is exactly the same outside of using Helm to deploy the egress gateway instead of istioctl).

[ ] Configuration Infrastructure [ X ] Docs [ ] Installation [ X ] Networking [ ] Performance and Scalability [ ] Policies and Telemetry [ ] Security [ ] Test and Release [ ] User Experience [ ] Developer Infrastructure

Expected behavior I’d expect istioctl proxy-status to show synced configuration for all sidecars in the mesh after the configuration in the task documentation is applied.

Steps to reproduce the bug Follow the procedure in the documentation for Perform Mutual TLS Origination with an Egress Gateway.

When you run the command in step 3 in the section titled “Configure mutual TLS origination for egress traffic”, the command returns a no healthy upstream error.

Additionally, when you run istioctl proxy-status you’ll see that the sleep and istio-ingressgateway pods are reporting stale cluster configurations. If you look at the Envoy logs for those pods, you find that the cluster config is failing to apply because of missing certs. Going a bit further, a istio proxy-status run on the affected pods shows the following config diffs between Pilot and Envoy:

+++ Envoy Clusters
@@ -3094,44 +3094,17 @@
             "circuitBreakers": {
                "thresholds": [
                   {
                      "maxRetries": 1024
                   }
                ]
             },
-            "tlsContext": {
-               "commonTlsContext": {
-                  "tlsCertificates": [
-                     {
-                        "certificateChain": {
-                           "filename": "/etc/nginx-client-certs/tls.crt"
-                        },
-                        "privateKey": {
-                           "filename": "/etc/nginx-client-certs/tls.key"
-                        }
-                     }
-                  ],
-                  "validationContext": {
-                     "trustedCa": {
-                        "filename": "/etc/nginx-ca-certs/ca-chain.cert.pem"
-                     }
-                  }
-               },
-               "sni": "nginx.example.com"
-            },
             "dnsRefreshRate": "300s",
             "respectDnsTtl": true,
-            "dnsLookupFamily": "V4_ONLY",
-            "metadata": {
-               "filterMetadata": {
-                  "istio": {
-                        "config": "/apis/networking/v1alpha3/namespaces/default/destination-rule/originate-mtls-for-nginx"
-                     }
-               }
-            }
+            "dnsLookupFamily": "V4_ONLY"
          }
       },
       {
          "cluster": {
             "name": "outbound|443||traffic-claim-enforcer-webhook.istio-system.svc.cluster.local",
             "type": "EDS",
             "edsClusterConfig": {

Listeners Match
Routes Match

NAME                                                  CDS        LDS        EDS        RDS          PILOT                            VERSION
istio-egressgateway-fd99fdc54-pvdnw.istio-system      SYNCED     SYNCED     SYNCED     SYNCED       istio-pilot-58f7cc886f-q7hj7     1.3.6
istio-ingressgateway-777d8db49-fsf9v.istio-system     STALE     SYNCED     SYNCED     NOT SENT     istio-pilot-58f7cc886f-q7hj7     1.3.6
sleep-78777bf478-j5qlv.default                        STALE     SYNCED     SYNCED     SYNCED       istio-pilot-58f7cc886f-q7hj7     1.3.6

Version (include the output of istioctl version --remote and kubectl version and helm version if you used Helm)

Istio:

citadel version: 1.3.6
egressgateway version: 1.3.6
galley version: 1.3.6
ingressgateway version: 1.3.6
pilot version: 1.3.6
policy version: 1.3.6
sidecar-injector version: 1.3.6
telemetry version: 1.3.6

Kubernetes:

Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.7", GitCommit:"6c143d35bb11d74970e7bc0b6c45b6bfdffc0bd4", GitTreeState:"clean", BuildDate:"2019-12-11T12:34:17Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}

Helm:

How was Istio installed? Installed using Helm 3

Environment where bug was observed (cloud vendor, OS, etc) kops cluster running on AWS EC2 instances

About this issue

Original URL
State: closed
Created 4 years ago
Comments: 18 (2 by maintainers)

Most upvoted comments

Does curl work for you when connecting to http:80 endpoint?

Can you share your full configuration? ServiceEntry, VirtualService, DestinationRule, Gateway?

I’ve seen similar issue on my side when traffic exited cluster on side-car rather than Egress Gateway due to broken config.

AlbertasG on Mar 12, 2020

Make sure you can curl external service from egress gateway pod when providing CA/Cert/Key directly.

This will confirm that certificates are loaded correctly. Next you could be dumping tcp traffic on egress gateway and istio-sidecar to check if traffic goes through.

On Tue, 10 Mar 2020, 22:29 Razeel Mohammed, notifications@github.com wrote:

Same issue! @AlbertasG https://github.com/AlbertasG . In my case I am calling an external service. I was also referring the same documentation.

below is my destination rule.

apiVersion: networking.istio.io/v1alpha3 kind: DestinationRule … trafficPolicy: portLevelSettings:

port: number: 443 loadBalancer: simple: ROUND_ROBIN tls: caCertificates: /etc/com-ca-certs/ clientCertificate: /etc/com-client-certs/tls.crt mode: MUTUAL privateKey: /etc/com-client-certs/tls.key sni: my domain here

even though the certificates are provided the result ended up as below (Please note: the certificates are mounted and available in the path)

ALPN, offering h2

ALPN, offering http/1.1

successfully set certificate verify locations:

CAfile: /etc/ssl/certs/ca-certificates.crt CApath: none

TLSv1.3 (OUT), TLS handshake, Client hello (1):

TLSv1.3 (IN), TLS handshake, Server hello (2):

TLSv1.2 (IN), TLS handshake, Certificate (11):

TLSv1.2 (OUT), TLS alert, unknown CA (560):

SSL certificate problem: unable to get local issuer certificate

Did anyone actually got this working??

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/istio/istio/issues/20856?email_source=notifications&email_token=AGRFSEHMVKWAOD5ET4YK2I3RG2PLXA5CNFSM4KP6V5T2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEONA5DI#issuecomment-597298829, or unsubscribe https://github.com/notifications/unsubscribe-auth/AGRFSEBUXPWDG7ITGWUSOULRG2PLXANCNFSM4KP6V5TQ .

AlbertasG on Mar 10, 2020

@rodalli

So previously when i tested the “outside of Cluster” nginx it was 3rd party service with mTLS. I’ve tried to follow the steps with nginx mTLS on GCP separate compute instance and it worked fine.

I still see that CDS are STALE, but RDS are SYNCED for sleep pods.

istio-egressgateway-5d5ffb4df-9mt8p.istio-system       SYNCED                         SYNCED     SYNCED (95%)     SYNCED       istio-pilot-5889bbb5c5-ns28v     1.2.10
istio-ingressgateway-7d6f7d85df-jrtnz.istio-system     STALE (Never Acknowledged)     SYNCED     SYNCED (95%)     NOT SENT     istio-pilot-5889bbb5c5-ns28v     1.2.10-gke.0
sleep-74f6459479-jt6qg.default                         STALE (Never Acknowledged)     SYNCED     SYNCED (49%)     SYNCED       istio-pilot-5889bbb5c5-ns28v     1.2.10-gke.0
sleep2-88d94fb74-zzs9m.default                         STALE (Never Acknowledged)     SYNCED     SYNCED (49%)     SYNCED       istio-pilot-5889bbb5c5-ns28v     1.2.10-gke.0

I tried to adjust the same config to 3rd party service again, and it did not work.

First hidden issue:

sleep container from example was using OpenSSL 1.0.1 and there was a problem with protocol negotiation from curl part. Upgrading to other container resolved issue that i was able to curl while providing certificates/keys manually sleep => istio-proxy

Second hidden issue:

Egress gateway mTLS origination was still not working, what resolved the issue for me was changing specific file path of CACert to absolute directory path in DestinationRule caCertificates

  name: originate-mtls-for-nginx
  ...
  trafficPolicy:
    loadBalancer:
      simple: ROUND_ROBIN
    portLevelSettings:
    - port:
        number: 443
      tls:
        caCertificates: /etc/nginx-ca-certs/
        clientCertificate: /etc/nginx-client-certs/tls.crt
        mode: MUTUAL
        privateKey: /etc/nginx-client-certs/tls.key
        sni: some.sni.example.com

I don’t know if this will be applicable to your setup, but hope it helps.

Another thing to mention i used example for my exact version from documentation archive. Check if example between latest 1.4 and your version differs.

As well, if you’ll try to debug from egress-gateway or istio-proxy container itself with curl note it uses GnuTLS/3.4.10

AlbertasG on Feb 20, 2020

At least for for now it doesn’t work. I’ll try to replicate this again next week and post the results.

On Mon, 10 Feb 2020, 23:00 Shriram Rajagopalan, notifications@github.com wrote:

Ignoring the error, does the communication work? If it does, then you can continue to ignore that error.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/istio/istio/issues/20856?email_source=notifications&email_token=AGRFSEEUOEBZLBHWP4IM5PTRCG55JA5CNFSM4KP6V5T2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOELKH47A#issuecomment-584351356, or unsubscribe https://github.com/notifications/unsubscribe-auth/AGRFSEENSKTP6V4HLHKF263RCG55JANCNFSM4KP6V5TQ .

AlbertasG on Feb 10, 2020

I have the exactly same issue, mTLS example with mesh-external.nginx works perfectly, but trying to replicate the same thing when nginx is outside the cluster (Public IP) it fails.

Thanks for adding your feedback @AlbertasG, glad to know I’m not alone! I’m not sure about the public IP scenario as I haven’t tested it (will do so when able), but it definitely results in an invalid cluster config (unable to be applied by Envoy) for the client service when implemented with an in-cluster (but mesh external) service as the destination, which is what’s used in the docs.

It definitely is that last destination rule that causes the cluster sync issue, as that’s what specifies the TLS config that the sleep pod is unable to apply.

medavisjr on Feb 7, 2020