istio: HTTPS outgoing requests blocked on fresh installation helm chart >= `1.3.0` because of outbound protocol sniffing
We used the default values from the helm chart, which assume the `ALLOW_ANY` outbound traffic policy and `enableProtocolSniffingForOutbound: true`.
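A quick way to confirm that default on a running cluster is to look at the mesh config stored in the `istio` configmap (a sketch; the exact field layout can differ between chart versions):

# Should show "mode: ALLOW_ANY" under outboundTrafficPolicy with default values
kubectl -n istio-system get configmap istio -o jsonpath='{.data.mesh}' | grep -A1 -i outboundTrafficPolicy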
Platform: AWS
Installation: helm chart version 1.3.0
Kubernetes: 1.14.7 with kops, calico networking
Affected product area (please put an X in all that apply)
[ ] Configuration Infrastructure
[ ] Docs
[x] Installation
[x] Networking
[ ] Performance and Scalability
[x] Policies and Telemetry
[ ] Security
[ ] Test and Release
[ ] User Experience
[ ] Developer Infrastructure
Expected behavior
Steps to reproduce the bug
kubectl create ns example
kubectl label namespace example istio-injection=enabled --overwrite=true
kubectl -n example run -i --tty alpine --rm --image=alpine --restart=Never -- sh
apk add curl
curl https://google.com
Expected
curl https://google.com
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="https://www.google.com/">here</A>.
</BODY></HTML>
Actual
curl https://google.com
curl: (35) error:1408F10B:SSL routines:ssl3_get_record:wrong version number
How to fix
Set enableProtocolSniffingForOutbound: false
helm install istio.io/istio --namespace istio-system --version 1.3.0 --set pilot.enableProtocolSniffingForOutbound=false
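For completeness, a minimal sketch of the same override expressed through a values file instead of `--set` (the `istio-overrides.yaml` filename is arbitrary), using the same Helm 2 style invocation as above:

# Write the override to a values file and pass it with -f
cat > istio-overrides.yaml <<'EOF'
pilot:
  enableProtocolSniffingForOutbound: false
EOF
helm install istio.io/istio --namespace istio-system --version 1.3.0 -f istio-overrides.yaml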
Version (include the output of istioctl version --remote and kubectl version)
istio 1.3.0
Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.7", GitCommit:"8fca2ec50a6133511b771a11559e24191b1aa2b4", GitTreeState:"clean", BuildDate:"2019-09-18T14:47:22Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.6", GitCommit:"96fac5cd13a5dc064f7d9f4f23030a6aeface6cc", GitTreeState:"clean", BuildDate:"2019-08-19T11:05:16Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}
How was Istio installed?
helm repo add istio.io https://storage.googleapis.com/istio-release/releases/1.3.0/charts
helm install istio.io/istio-init --namespace istio-system --version 1.3.0
helm install istio.io/istio --namespace istio-system --version 1.3.0
Environment where bug was observed (cloud vendor, OS, etc)
AWS with kops 1.14.0
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Reactions: 6
- Comments: 17 (5 by maintainers)
We got the same issue on 1.3.3. Downgrading to 1.2.6 works fine…
First, let me say thank you for the many responses and the recommendation for the narrowly scoped `networking.istio.io/v1alpha3:Sidecar`, @howardjohn; these have “resolved” the issue I’m about to describe, but I still think it’s worth noting.

I wasn’t sure if here, #22150 (since MySQL and `postgres` are siblings here with their own wire protocols), #16458 or #20703 were the right place. As far as I can tell, the core driver of all of these “protocol sniffing snafu” issues is the creation of Envoy listener(s) on the broadcast IP (`0.0.0.0`) for a given port; listeners that also happen to incorrectly match some protocol (e.g. assuming cleartext HTTP and missing `tls_inspector`, or getting raw TCP wire protocols like `postgres` or `mysql`).

I am currently in the process of rolling out Istio to an established set of clusters that my team manages. The “protocol sniffing snafu” is a bit concerning to us, and we are trying to mitigate it so that we can guarantee our mesh won’t become unusable if a new Kubernetes resource (that we didn’t account for) joins our cluster.
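For anyone checking their own mesh, a quick way to see whether a given port ended up on one of these wildcard listeners is to compare the bind address that `istioctl` reports. A small sketch, with placeholder pod and namespace names:

# Shows whether the listener for the port is bound to 0.0.0.0 or to a specific cluster IP
istioctl proxy-config listeners <pod-name>.<namespace> --port 5432
# Full listener definition, useful for inspecting filter chains
istioctl proxy-config listeners <pod-name>.<namespace> --port 5432 -o json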
When upgrading Istio recently, services that were already smoothly running in the mesh failed to start up. The culprit: an Envoy listener on `0.0.0.0:5432` (and these services refusing to start up if they couldn’t reach an RDS instance outside the cluster). The upgrade was from 1.4.3 to 1.5.0 (I know the protocol detection issue started occurring in the 1.3.x series), but I suspect something small may have changed between 1.4.3 and 1.5.0.

It seems the `0.0.0.0:5432` listener was due to a service port in the `default` namespace and not due to intrinsic naming of that service port (it was named `postgres`). As I mentioned, this issue did not occur in Istio 1.4.3 even though that service — `postgres.default` — was present when we were running 1.4.3.

UPDATE: As I took some time to reproduce this after writing, I realize now that the `default` namespace is not the real differentiator from the other services (which is a good thing). Instead it’s the fact that the service has no cluster IP. So it seems a Kubernetes bug (i.e. the absence of a CLUSTER-IP for this long since forgotten service of type `ClusterIP`) is causing a bug in the Envoy listener creation.

Some notes about how I concluded it was the port in the `default` namespace:

- After adding a `Sidecar` to limit hosts to `istio-system/*` and `./*`, the issue went away (and quite quickly; it’s very impressive how quickly updates propagate from Pilot / `istiod`).
- Every other service exposing the port only produced a `${CLUSTER_IP}:5432` listener, i.e. not `0.0.0.0`, verified via `istioctl proxy-config listeners --port 5432`.
- The namespace where the `Sidecar` was applied had (at debugging time) 31 service ports running 5432. 30 of those ports had no name, 1 port was named `pgsql`.
- After `kubectl delete`-ing the `Sidecar`, the issue immediately resurfaced.
- I used `kubectl edit` on the offending service (`postgres.default`) to change service port 5432’s name from `postgres` to `tcp-postgres` and it also immediately resolved the issue. Somewhat surprisingly, after the `0.0.0.0:5432` listener disappeared, no equivalent `${CLUSTER_IP}:5432` listener popped up for `postgres.default`.
- We also have a `postgres.acme-system` (replace `acme` with our company name) service that exposed 5432 named `postgres` (i.e. identical to the port name and number from the `postgres.default` service), but this only produced a `${CLUSTER_IP}:5432` listener.
- Other services in `istio-system/*` and `./*` exposed port 5432 with names `pgsql` and `postgresql`; both of these produced a `${CLUSTER_IP}:5432` listener.

As an additional mitigation to using a `networking.istio.io/v1alpha3:Sidecar`, we are going to require all service ports in our namespaces in the Istio service mesh to use manual protocol selection. Once all ports adhere to this rule, it will be enforced for new services with a service admission webhook.
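As an illustration of that manual protocol selection rule (a generic sketch with made-up names, not one of the services from our cluster), the protocol prefix on the port name is what keeps Pilot from relying on sniffing for that port:

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: postgres            # example name only
  namespace: example
spec:
  selector:
    app: postgres
  ports:
  - name: tcp-postgres      # "tcp-" prefix = manual protocol selection, no sniffing for this port
    port: 5432
    targetPort: 5432
EOF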
I did a bit more digging to get the `0.0.0.0:5432` listener. It is essentially identical to all of the other `5432` listeners, except for an extra filter chain at the beginning of `listeners[34]["filterChains"]` pointing at `BlackHoleCluster` and matching the pod IP `10.101.175.16` of the `${POD_NAME}` I was getting listeners from.

Additionally, in `listeners[34]["filterChains"][2]["filters"]`, the filter at index `0` (which is the `envoy.http_connection_manager` filter) has `typedConfig.rds.routeConfigName` equal to `5432`, though I expected it to be `postgres.default.svc.cluster.local:5432` (based on the other listeners).
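In case anyone wants to repeat that inspection, a rough sketch (`${POD_NAME}` is the same placeholder used above; the listener and filter-chain indices come from my dump and will differ on another cluster):

# Dump only the 5432 listeners for the pod, then pick fields out with jq
istioctl proxy-config listeners ${POD_NAME} --port 5432 -o json > listeners-5432.json
# Which addresses/ports each filter chain matches on (index 0 = first listener in the filtered dump)
jq '.[0].filterChains[].filterChainMatch' listeners-5432.json
# The route config name the http_connection_manager filter points at
jq '.[0].filterChains[2].filters[0].typedConfig.rds.routeConfigName' listeners-5432.json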
xandywere independent.)