istio: HTTPS outgoing requests blocked on fresh installation helm chart >= `1.3.0` because of outbound protocol sniffing
We used the default values from the Helm chart, which assume `ALLOW_ANY` outbound traffic policy and `enableProtocolSniffingForOutbound: true`.
Platform: AWS
Installation: helm chart version 1.3.0
Kubernetes: 1.14.7
with kops, Calico networking
Affected product area (please put an X in all that apply)
[ ] Configuration Infrastructure
[ ] Docs
[x] Installation
[x] Networking
[ ] Performance and Scalability
[x] Policies and Telemetry
[ ] Security
[ ] Test and Release
[ ] User Experience
[ ] Developer Infrastructure
Expected behavior: outgoing HTTPS requests from sidecar-injected pods succeed (see the Expected output under the reproduction steps).
Steps to reproduce the bug
kubectl create ns example
kubectl label namespace example istio-injection=enabled --overwrite=true
kubectl -n example run -i --tty alpine --rm --image=alpine --restart=Never -- sh
apk add curl
curl https://google.com
Expected
curl https://google.com
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="https://www.google.com/">here</A>.
</BODY></HTML>
Actual
curl https://google.com
curl: (35) error:1408F10B:SSL routines:ssl3_get_record:wrong version number
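For context, the same OpenSSL error can be reproduced locally by speaking TLS to a plaintext endpoint, which is effectively what happens when the mis-sniffed Envoy listener treats the TLS handshake as cleartext (a local sketch only; the throwaway server and port are stand-ins, not part of the Istio setup):

```shell
# Start a throwaway plaintext HTTP server (port 8000 is arbitrary),
# then ask curl to speak TLS to it: the TLS handshake bytes are answered
# with plaintext, so OpenSSL reports "wrong version number".
python3 -m http.server 8000 >/dev/null 2>&1 &
SERVER_PID=$!
sleep 1
curl -sS https://localhost:8000 2>curl_err.txt || true
cat curl_err.txt
kill "$SERVER_PID"
```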
How to fix
Set `pilot.enableProtocolSniffingForOutbound=false`:
helm install istio.io/istio --namespace istio-system --version 1.3.0 --set pilot.enableProtocolSniffingForOutbound=false
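If you prefer to keep the override out of the command line, the same setting can live in a values file (a sketch; the filename is arbitrary):

```shell
# Write the pilot override to a values file so repeated installs/upgrades
# keep outbound protocol sniffing disabled.
cat > istio-values.yaml <<'EOF'
pilot:
  enableProtocolSniffingForOutbound: false
EOF
```

Then pass `-f istio-values.yaml` to the `helm install` command in place of the `--set` flag.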
Version (include the output of `istioctl version --remote` and `kubectl version`)
istio 1.3.0
Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.7", GitCommit:"8fca2ec50a6133511b771a11559e24191b1aa2b4", GitTreeState:"clean", BuildDate:"2019-09-18T14:47:22Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.6", GitCommit:"96fac5cd13a5dc064f7d9f4f23030a6aeface6cc", GitTreeState:"clean", BuildDate:"2019-08-19T11:05:16Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}
How was Istio installed?
helm repo add istio.io https://storage.googleapis.com/istio-release/releases/1.3.0/charts
helm install istio.io/istio-init --namespace istio-system --version 1.3.0
helm install istio.io/istio --namespace istio-system --version 1.3.0
Environment where bug was observed (cloud vendor, OS, etc)
AWS
with kops 1.14.0
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Reactions: 6
- Comments: 17 (5 by maintainers)
We got the same issue on 1.3.3. Downgrading to 1.2.6 works fine…
First, let me say thank you for the many responses and the recommendation for the narrowly scoped `networking.istio.io/v1alpha3:Sidecar`, @howardjohn; these have "resolved" the issue I'm about to describe, but I still think it's worth noting.

I wasn't sure if here, #22150 (since MySQL and `postgres` are siblings here with their own wire protocols), #16458, or #20703 was the right place. As far as I can tell, the core driver of all of these "protocol sniffing snafu" issues is the creation of Envoy listener(s) on the broadcast IP (`0.0.0.0`) for a given port; listeners that also happen to incorrectly match some protocol (e.g. assuming cleartext HTTP and missing `tls_inspector`, or getting raw TCP wire protocols like `postgres` or `mysql`).

I am currently in the process of rolling out Istio to an established set of clusters that my team manages. The "protocol sniffing snafu" is a bit concerning to us, and we are trying to mitigate so that we can guarantee our mesh won't become unusable if a new Kubernetes resource (that we didn't account for) joins our cluster.
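For reference, a minimal sketch of the kind of narrowly scoped `Sidecar` described above (the namespace and resource name here are placeholders, not our actual manifest):

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: Sidecar
metadata:
  name: default
  namespace: example   # placeholder: applies to all workloads in this namespace
spec:
  egress:
  - hosts:
    - "istio-system/*"  # keep config for istio-system services
    - "./*"             # plus services in the Sidecar's own namespace
```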
When upgrading Istio recently, services that were already smoothly running in the mesh failed to start up. The culprit: an Envoy listener on `0.0.0.0:5432` (and these services refusing to start up if they couldn't reach an RDS instance outside the cluster). The upgrade was from 1.4.3 to 1.5.0 (I know the protocol detection issue started occurring in the 1.3.x series), but I suspect something small may have changed between 1.4.3 and 1.5.0.

It seems the `0.0.0.0:5432` listener was due to a service port in the `default` namespace and not due to intrinsic naming of that service port (it was named `postgres`). As I mentioned, this issue did not occur in Istio 1.4.3 even though that service, `postgres.default`, was present when we were running 1.4.3.

UPDATE: As I took some time to reproduce this after writing, I realize now that the `default` namespace is not the real differentiator from the other services (which is a good thing). Instead, it's the fact that the service has no cluster IP. So it seems a Kubernetes bug (i.e. the absence of a CLUSTER-IP for this long since forgotten service of type `ClusterIP`) is causing a bug in the Envoy listener creation.

Some notes about how I concluded it was the port in the `default` namespace:

- After applying a `Sidecar` to limit hosts to `istio-system/*` and `./*`, the issue went away (and quite quickly; it's very impressive how fast updates propagate from Pilot / `istiod`).
- With the `Sidecar` in place, only a `${CLUSTER_IP}:5432` listener remained, i.e. not `0.0.0.0`, verified via `istioctl proxy-config listeners --port 5432`.
- The cluster where the `Sidecar` was applied had (at debugging time) 31 service ports running 5432. 30 of those ports had no name, 1 port was named `pgsql`.
- After `kubectl delete`-ing the `Sidecar`, the issue immediately resurfaced.
- I used `kubectl edit` on the offending service (`postgres.default`) to change service port 5432's name from `postgres` to `tcp-postgres`, and it also immediately resolved the issue. Somewhat surprisingly, after the `0.0.0.0:5432` listener disappeared, no equivalent `${CLUSTER_IP}:5432` listener popped up for `postgres.default`.
- I created a `postgres.acme-system` (replace `acme` with our company name) service that exposed 5432 named `postgres` (i.e. identical to the port name and number from the `postgres.default` service), but this only produced a `${CLUSTER_IP}:5432` listener.
- I created services matched by `istio-system/*` and `./*` that exposed port 5432 with names `pgsql` and `postgresql`; both of these produced a `${CLUSTER_IP}:5432` listener.

As an additional mitigation to using a `networking.istio.io/v1alpha3:Sidecar`, we are going to require all service ports in our namespaces in the Istio service mesh to use manual protocol selection. Once all ports adhere to this rule, it will be enforced for new services with a service admission webhook.

I did a bit more digging to get the `0.0.0.0:5432` listener. It is essentially identical to all of the other `5432` listeners, except for an extra filter chain at the beginning of `listeners[34]["filterChains"]` pointing at `BlackHoleCluster` and matching the pod IP `10.101.175.16` of the `${POD_NAME}` I was getting listeners from.

Additionally, in `listeners[34]["filterChains"][2]["filters"]`, the filter at index `0` (which is the `envoy.http_connection_manager` filter) has `typedConfig.rds.routeConfigName` equal to `5432`, though I expected it to be `postgres.default.svc.cluster.local:5432` (based on the other listeners).
(based on the other listeners).(Also, small world, @louiscryan was on my interview panel at Google back in 2011. He stumped me with a 2D binary search where
x
andy
were independent.)