istio: Headless services - pod fails mTLS handshake with itself

Describe the bug With mTLS enabled, when using <dashed-pod-ip>.my-headless-service.my-namespace.svc.cluster.local endpoints (given a matching ServiceEntry), pods successfully make HTTP calls to all pods except themselves.

This appears to be the root cause behind some of our headless-service problems.
It is a recurring issue with many gossip-based clusters - e.g. Akka, Kafka, Cassandra.

The provided reproduction does not use a StatefulSet, to keep things simple, but the same principle applies to all workloads using headless services.

NOTE: there are some open issues around the headless-service topic, but they are either stale or broader than this exact issue.

Expected behavior Pods should be able to call themselves on their headless-service-generated endpoint.

Steps to reproduce the bug

  1. Install Istio with auth enabled (istio-demo-auth.yaml is fine for this) and run:
    kubectl label ns default istio-injection=enabled

  2. Create {ServiceAccount, Service[headless], ServiceEntry, Deployment}:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: curly
  labels:
    app: curly
automountServiceAccountToken: false
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: curly
  name: curly-headless
spec:
  ports:
  - name: http
    port: 8855
    protocol: TCP
  selector:
    app: curly
  clusterIP: None
  publishNotReadyAddresses: true
---
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: curly
spec:
  hosts:
  - "*.curly-headless.default.svc.cluster.local"
  location: MESH_INTERNAL
  ports:
  - number: 8855
    name: http
    protocol: HTTP
  resolution: NONE
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: curly
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: curly
    spec:
      serviceAccountName: curly
      containers:
      - name: curly
        image: tutum/curl
        command: ["python3", "-m", "http.server", "8855"]
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 8855
  3. Run kubectl get pods -o wide and wait for the pods to become ready:
NAME                     READY   STATUS    RESTARTS   AGE   IP            NODE       NOMINATED NODE
curly-75649cb6f8-n96tv   2/2     Running   0          1m    172.17.0.17   minikube   <none>
curly-75649cb6f8-qdh77   2/2     Running   0          1m    172.17.0.6    minikube   <none>
  4. Run curl from POD1 to POD2:
    In this example, POD1’s generated name would be 172-17-0-17.curly-headless.default.svc.cluster.local (more: https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#srv-records)
POD1=$(kubectl get pods -l app=curly -o jsonpath='{.items[0].metadata.name}') && \
POD2_DASHED_IP=$(kubectl get pods -l app=curly -o jsonpath='{.items[1].status.podIP}' | sed 's/\./-/g') && \
kubectl exec $POD1 -c curly -- curl -s -I ${POD2_DASHED_IP}.curly-headless:8855

Output should be something like:

HTTP/1.1 200 OK
server: envoy
date: Sun, 17 Mar 2019 07:58:20 GMT
content-type: text/html; charset=ascii
content-length: 987
x-envoy-upstream-service-time: 1
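As a side note, the dashed-IP hostname used throughout these steps can be derived from any pod IP in one line (a pure-shell sketch; the service name and namespace are the ones from this reproduction):

```shell
# Derive the dashed-IP headless hostname for a pod IP (per the Kubernetes
# DNS spec linked above). Substitute your own service/namespace as needed.
pod_ip=172.17.0.6
dns_name="$(echo "$pod_ip" | tr '.' '-').curly-headless.default.svc.cluster.local"
echo "$dns_name"   # 172-17-0-6.curly-headless.default.svc.cluster.local
```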
  5. Run curl from POD1 to POD1:
POD1=$(kubectl get pods -l app=curly -o jsonpath='{.items[0].metadata.name}') && \
POD1_DASHED_IP=$(kubectl get pods -l app=curly -o jsonpath='{.items[0].status.podIP}' | sed 's/\./-/g') && \
kubectl exec $POD1 -c curly -- curl -v -s -I ${POD1_DASHED_IP}.curly-headless:8855

Output:

* Rebuilt URL to: 172-17-0-17.curly-headless:8855/
* Hostname was NOT found in DNS cache
*   Trying 172.17.0.17...
* Connected to 172-17-0-17.curly-headless (172.17.0.17) port 8855 (#0)
> HEAD / HTTP/1.1
> User-Agent: curl/7.35.0
> Host: 172-17-0-17.curly-headless:8855
> Accept: */*
>
* Recv failure: Connection reset by peer
* Closing connection 0
command terminated with exit code 56
  6. Some attempt can be made to work around the problem with a Policy that makes this specific port mTLS-permissive, but then RBAC stops working (as there is no auth info in the call to itself).
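For completeness, such a permissive Policy on Istio 1.1 would look roughly like this (a sketch with a made-up name; as noted above, it is not a real fix, since RBAC then has no auth info for the self-call):

```yaml
# Sketch only: makes port 8855 of curly-headless accept both plaintext and
# mTLS. The name "curly-permissive" is invented for this example.
apiVersion: authentication.istio.io/v1alpha1
kind: Policy
metadata:
  name: curly-permissive
spec:
  targets:
  - name: curly-headless
    ports:
    - number: 8855
  peers:
  - mtls:
      mode: PERMISSIVE
```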

  7. When setting connection logging to debug, I see the following in my setup:

# this is pod1->pod2

[2019-03-17 07:32:01.755][24][debug][connection] [external/envoy/source/common/network/connection_impl.cc:639] [C41] connecting to 172.17.0.6:8855
[2019-03-17 07:32:01.755][24][debug][connection] [external/envoy/source/common/network/connection_impl.cc:648] [C41] connection in progress
[2019-03-17 07:32:01.756][24][debug][connection] [external/envoy/source/common/network/connection_impl.cc:517] [C41] connected
[2019-03-17 07:32:01.756][24][debug][connection] [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:138] [C41] handshake error: 2
[2019-03-17 07:32:01.760][24][debug][connection] [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:127] [C41] handshake complete
[2019-03-17 07:32:01.765][24][debug][connection] [external/envoy/source/common/network/connection_impl.cc:502] [C40] remote close
[2019-03-17 07:32:01.766][24][debug][connection] [external/envoy/source/common/network/connection_impl.cc:183] [C40] closing socket: 0
[2019-03-17T07:32:01.754Z] "HEAD / HTTP/1.1" 200 - "-" 0 0 10 9 "-" "curl/7.35.0" "0a079597-58fe-9be3-adbc-a0abe10d67cb" "172-17-0-6.curly-headless.alice-dev.svc.cluster.local:8855" "172.17.0.6:8855" outbound|8855||*.curly-headless.alice-dev.svc.cluster.local - 172.17.0.6:8855 172.17.0.17:42332 -

# this is pod1->pod1

[2019-03-17 07:32:10.567][24][debug][connection] [external/envoy/source/common/network/connection_impl.cc:101] [C41] closing data_to_write=0 type=1
[2019-03-17 07:32:10.567][24][debug][connection] [external/envoy/source/common/network/connection_impl.cc:183] [C41] closing socket: 1
[2019-03-17 07:32:10.568][24][debug][connection] [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:236] [C41] SSL shutdown: rc=0
[2019-03-17 07:32:10.673][25][debug][connection] [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:138] [C42] handshake error: 1
[2019-03-17 07:32:10.673][25][debug][connection] [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:166] [C42] SSL error: 268435612:SSL routines:OPENSSL_internal:HTTP_REQUEST
[2019-03-17 07:32:10.673][25][debug][connection] [external/envoy/source/common/network/connection_impl.cc:183] [C42] closing socket: 0

Version

[~]$ istioctl version --remote
client version: version.BuildInfo{Version:"1.1.0-rc.6", GitRevision:"82797c0c0649a3f73029b33957ae105260458c6e", User:"root", Host:"22373299-4805-11e9-8dad-0a580a2c0205", GolangVersion:"go1.10.4", DockerHub:"docker.io/istio", BuildStatus:"Clean", GitTag:"1.1.0-rc.5-5-g82797c0"}
citadel version: version.BuildInfo{Version:"1.1.0-rc.6", GitRevision:"82797c0c0649a3f73029b33957ae105260458c6e-dirty", User:"root", Host:"22373299-4805-11e9-8dad-0a580a2c0205", GolangVersion:"go1.10.4", DockerHub:"docker.io/istio", BuildStatus:"Modified", GitTag:"1.1.0-rc.5-5-g82797c0"}
egressgateway version: version.BuildInfo{Version:"1.1.0-rc.6", GitRevision:"82797c0c0649a3f73029b33957ae105260458c6e", User:"root", Host:"22373299-4805-11e9-8dad-0a580a2c0205", GolangVersion:"go1.10.4", DockerHub:"docker.io/istio", BuildStatus:"Clean", GitTag:"1.1.0-rc.5-5-g82797c0"}
galley version: version.BuildInfo{Version:"1.1.0-rc.6", GitRevision:"82797c0c0649a3f73029b33957ae105260458c6e-dirty", User:"root", Host:"22373299-4805-11e9-8dad-0a580a2c0205", GolangVersion:"go1.10.4", DockerHub:"docker.io/istio", BuildStatus:"Modified", GitTag:"1.1.0-rc.5-5-g82797c0"}
ingressgateway version: version.BuildInfo{Version:"1.1.0-rc.6", GitRevision:"82797c0c0649a3f73029b33957ae105260458c6e", User:"root", Host:"22373299-4805-11e9-8dad-0a580a2c0205", GolangVersion:"go1.10.4", DockerHub:"docker.io/istio", BuildStatus:"Clean", GitTag:"1.1.0-rc.5-5-g82797c0"}
pilot version: version.BuildInfo{Version:"1.1.0-rc.6", GitRevision:"82797c0c0649a3f73029b33957ae105260458c6e-dirty", User:"root", Host:"22373299-4805-11e9-8dad-0a580a2c0205", GolangVersion:"go1.10.4", DockerHub:"docker.io/istio", BuildStatus:"Modified", GitTag:"1.1.0-rc.5-5-g82797c0"}
policy version: version.BuildInfo{Version:"1.1.0-rc.6", GitRevision:"82797c0c0649a3f73029b33957ae105260458c6e-dirty", User:"root", Host:"22373299-4805-11e9-8dad-0a580a2c0205", GolangVersion:"go1.10.4", DockerHub:"docker.io/istio", BuildStatus:"Modified", GitTag:"1.1.0-rc.5-5-g82797c0"}
sidecar-injector version: version.BuildInfo{Version:"1.1.0-rc.6", GitRevision:"82797c0c0649a3f73029b33957ae105260458c6e-dirty", User:"root", Host:"22373299-4805-11e9-8dad-0a580a2c0205", GolangVersion:"go1.10.4", DockerHub:"docker.io/istio", BuildStatus:"Modified", GitTag:"1.1.0-rc.5-5-g82797c0"}
telemetry version: version.BuildInfo{Version:"1.1.0-rc.6", GitRevision:"82797c0c0649a3f73029b33957ae105260458c6e-dirty", User:"root", Host:"22373299-4805-11e9-8dad-0a580a2c0205", GolangVersion:"go1.10.4", DockerHub:"docker.io/istio", BuildStatus:"Modified", GitTag:"1.1.0-rc.5-5-g82797c0"}
[~]$ kubectl version
Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.4", GitCommit:"c27b913fddd1a6c480c229191a087698aa92f0b1", GitTreeState:"clean", BuildDate:"2019-02-28T13:37:52Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.8", GitCommit:"4e209c9383fa00631d124c8adcc011d617339b3c", GitTreeState:"clean", BuildDate:"2019-02-28T18:40:05Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}

Installation “Quick Start Evaluation Install” with istio-demo-auth.yaml and Minikube.
The same issue appears in a real deployment on AWS with kops (same versions).

Environment Minikube / AWS+kops

Cluster State

Some config dumps for POD1:

# relevant clusters

inbound|8855|http|curly-headless.default.svc.cluster.local::default_priority::max_connections::1024
inbound|8855|http|curly-headless.default.svc.cluster.local::default_priority::max_pending_requests::1024
inbound|8855|http|curly-headless.default.svc.cluster.local::default_priority::max_requests::1024
inbound|8855|http|curly-headless.default.svc.cluster.local::default_priority::max_retries::3
inbound|8855|http|curly-headless.default.svc.cluster.local::high_priority::max_connections::1024
inbound|8855|http|curly-headless.default.svc.cluster.local::high_priority::max_pending_requests::1024
inbound|8855|http|curly-headless.default.svc.cluster.local::high_priority::max_requests::1024
inbound|8855|http|curly-headless.default.svc.cluster.local::high_priority::max_retries::3
inbound|8855|http|curly-headless.default.svc.cluster.local::added_via_api::true
inbound|8855|http|curly-headless.default.svc.cluster.local::127.0.0.1:8855::cx_active::0
inbound|8855|http|curly-headless.default.svc.cluster.local::127.0.0.1:8855::cx_connect_fail::0
inbound|8855|http|curly-headless.default.svc.cluster.local::127.0.0.1:8855::cx_total::0
inbound|8855|http|curly-headless.default.svc.cluster.local::127.0.0.1:8855::rq_active::0
inbound|8855|http|curly-headless.default.svc.cluster.local::127.0.0.1:8855::rq_error::0
inbound|8855|http|curly-headless.default.svc.cluster.local::127.0.0.1:8855::rq_success::0
inbound|8855|http|curly-headless.default.svc.cluster.local::127.0.0.1:8855::rq_timeout::0
inbound|8855|http|curly-headless.default.svc.cluster.local::127.0.0.1:8855::rq_total::0
inbound|8855|http|curly-headless.default.svc.cluster.local::127.0.0.1:8855::health_flags::healthy
inbound|8855|http|curly-headless.default.svc.cluster.local::127.0.0.1:8855::weight::1
inbound|8855|http|curly-headless.default.svc.cluster.local::127.0.0.1:8855::region::
inbound|8855|http|curly-headless.default.svc.cluster.local::127.0.0.1:8855::zone::
inbound|8855|http|curly-headless.default.svc.cluster.local::127.0.0.1:8855::sub_zone::
inbound|8855|http|curly-headless.default.svc.cluster.local::127.0.0.1:8855::canary::false
inbound|8855|http|curly-headless.default.svc.cluster.local::127.0.0.1:8855::success_rate::-1

outbound|8855||*.curly-headless.default.svc.cluster.local::default_priority::max_connections::1024
outbound|8855||*.curly-headless.default.svc.cluster.local::default_priority::max_pending_requests::1024
outbound|8855||*.curly-headless.default.svc.cluster.local::default_priority::max_requests::1024
outbound|8855||*.curly-headless.default.svc.cluster.local::default_priority::max_retries::1024
outbound|8855||*.curly-headless.default.svc.cluster.local::high_priority::max_connections::1024
outbound|8855||*.curly-headless.default.svc.cluster.local::high_priority::max_pending_requests::1024
outbound|8855||*.curly-headless.default.svc.cluster.local::high_priority::max_requests::1024
outbound|8855||*.curly-headless.default.svc.cluster.local::high_priority::max_retries::3
outbound|8855||*.curly-headless.default.svc.cluster.local::added_via_api::true

outbound|8855||curly-headless.default.svc.cluster.local::default_priority::max_connections::1024
outbound|8855||curly-headless.default.svc.cluster.local::default_priority::max_pending_requests::1024
outbound|8855||curly-headless.default.svc.cluster.local::default_priority::max_requests::1024
outbound|8855||curly-headless.default.svc.cluster.local::default_priority::max_retries::1024
outbound|8855||curly-headless.default.svc.cluster.local::high_priority::max_connections::1024
outbound|8855||curly-headless.default.svc.cluster.local::high_priority::max_pending_requests::1024
outbound|8855||curly-headless.default.svc.cluster.local::high_priority::max_requests::1024
outbound|8855||curly-headless.default.svc.cluster.local::high_priority::max_retries::3
outbound|8855||curly-headless.default.svc.cluster.local::added_via_api::true

# all listeners

[
  "0.0.0.0:15090",
  "172.17.0.17:8855",
  "10.103.154.232:15029",
  "10.103.154.232:15030",
  "10.96.0.10:53",
  "10.97.245.106:14267",
  "10.98.146.122:16686",
  "10.106.130.238:443",
  "10.103.154.232:15031",
  "10.109.154.127:15011",
  "10.97.155.171:443",
  "10.103.154.232:15443",
  "10.107.110.66:42422",
  "10.96.0.1:443",
  "10.106.130.238:15443",
  "10.96.99.165:443",
  "10.103.154.232:15032",
  "10.103.154.232:443",
  "10.103.154.232:15020",
  "10.103.154.232:31400",
  "10.97.245.106:14268",
  "0.0.0.0:20001",
  "0.0.0.0:9411",
  "0.0.0.0:8855",
  "0.0.0.0:80",
  "0.0.0.0:9901",
  "0.0.0.0:8080",
  "0.0.0.0:15014",
  "0.0.0.0:15010",
  "0.0.0.0:15004",
  "0.0.0.0:9090",
  "0.0.0.0:3000",
  "0.0.0.0:8060",
  "0.0.0.0:9091",
  "172.17.0.17:15020",
  "0.0.0.0:15001"
]

Full config dump json from POD1: https://pastebin.com/raw/TrsjmKJD

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Reactions: 10
  • Comments: 47 (29 by maintainers)

Most upvoted comments

I am also seeing this issue. I noticed that POD1->POD2 traffic does get the Envoy access log showing the “outbound|8855||*.curly-headless.alice-dev.svc.cluster.local” cluster was matched, but POD1->POD1 traffic never does. I believe it works like this:

  1. The iptables rules redirect the POD1->POD1 traffic to Envoy’s listening port.
  2. Envoy accepts the connection and uses SO_ORIGINAL_DST to figure out the connection was destined for POD1 (172.17.0.17:8855 in your example)
  3. Envoy matches the 172.17.0.17:8855 listener, not the 0.0.0.0:8855 listener.
  4. 172.17.0.17:8855 is the “inbound” listener, not the outbound listener.
  5. Envoy applies the inbound policy requiring mTLS, but nothing applied the outbound policy (of being the mTLS client).
  6. The connection is dropped.

For POD1->POD2, the story is different:

  1. The iptables rules redirect the POD1->POD2 traffic to Envoy’s listening port.
  2. Envoy accepts the connection and uses SO_ORIGINAL_DST to figure out the connection was destined for POD2 (172.17.0.6:8855 in your example)
  3. Envoy matches the 0.0.0.0:8855 listener.
  4. 0.0.0.0:8855 is the “outbound” listener.
  5. Envoy applies the outbound policy of being the mTLS client and sends it over the network.
  6. The Envoy in POD2 catches the connection, maps to its inbound policy and does the server-side of the mTLS connection.
  7. Everyone is happy.

If you get the logs from the “proxy-init” sidecar you can see the IPTables rules installed and step through them.

Workaround: My hacky temporary workaround is to “sed” /etc/hosts so that the pod’s own name maps to 127.0.0.1 instead of 172.17.0.6; this causes loopback connections to skip Envoy/Istio entirely. This makes kafka work-for-me but is bad because now I’m not going through Istio for POD1->POD1, so all the Istio policy doesn’t apply. I’m still using Istio for POD1->POD2, etc. In this case, I was using it just for mTLS, so I’m ok with not using mTLS for literally loopback connections. But it’s still not a generally good solution.
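For anyone trying the /etc/hosts hack above, the rewrite looks roughly like this (shown against a demo copy of the file; the pod name "curly-0" and the IP are invented for illustration - inside the pod you would target /etc/hosts itself and use the pod's real hosts entry):

```shell
# Rewrite the pod's own hosts entry to loopback so self-calls bypass the
# sidecar entirely. Demonstrated on a copy; in the pod, HOSTS=/etc/hosts.
HOSTS=/tmp/hosts.demo
POD_IP=172.17.0.17
printf '%s curly-0.curly-headless.default.svc.cluster.local curly-0\n' "$POD_IP" > "$HOSTS"
sed -i "s/^${POD_IP} /127.0.0.1 /" "$HOSTS"
cat "$HOSTS"
```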

I think the root problem here is that “Pods that connect to their own IPs sometimes only go through inbound policy instead of outbound+inbound policy”
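The listener-selection asymmetry behind this can be illustrated with a toy sketch (this only models the logic described above - it is not Envoy code; the IPs are from the reproduction):

```shell
# Toy model: after SO_ORIGINAL_DST recovers the real destination, the most
# specific listener wins. The pod's own IP matches the inbound listener, so
# the outbound (mTLS-client) policy is never applied to self-calls.
OWN_IP=172.17.0.17
select_listener() {
  if [ "$1" = "$OWN_IP" ]; then
    echo "inbound|8855 via listener ${OWN_IP}:8855"
  else
    echo "outbound|8855 via listener 0.0.0.0:8855"
  fi
}
select_listener 172.17.0.17   # POD1 -> POD1: inbound only, handshake fails
select_listener 172.17.0.6    # POD1 -> POD2: outbound, mTLS client applied
```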

@adamglt Sorry, while we discussed some options at the networking meeting the challenge is that there’s a growing emphasis on quality and this bug happens to be in the middle of some stuff that everyone is hesitant to change.

@costinm - you suggested maybe there’s a way the Sidecar resource could fix this problem. I didn’t see any but please let us know if there is a way. (That would require upgrading to 1.1 as that resource isn’t available in 1.0.4 but could be a good workaround)

@andraxylia - what should we do next to make progress on this issue? I view the current behavior pretty strongly as purely a bug in 1.0 and 1.1.

I think we need a decision on what behavior we should have in this case (“pod talks to its own IP”) (even if in the interim we only make incremental progress towards it). I can write up my thoughts as well as the alternatives at least as a discussion point if that helps.

🚧 This issue or pull request has been closed due to not having had activity from an Istio team member since 2019-10-25. If you feel this issue or pull request deserves attention, please reopen the issue. Please see this wiki page for more information. Thank you for your contributions.

Created by the issue and PR lifecycle manager.

I had the same problem with kafka in K8S when I enabled mTLS.

The leader couldn’t call itself through its DNS name.

I fixed that locally by adding a container that adds the iptables rule described by @andrewjjenkins.

I added the snippet below in case anyone else needs it.

It would be nice to have an option in Istio that handles this case without changing Istio’s iptables rules directly.

    spec:
      containers:
      - name: netshoot
        image: nicolaka/netshoot
        # note: modifying iptables requires the NET_ADMIN capability
        securityContext:
          capabilities:
            add: ["NET_ADMIN"]
        env:
        - name: POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        command:
        - sh
        - -exc
        - |
          iptables -t nat -I ISTIO_OUTPUT -d ${POD_IP} -j RETURN && \
          sleep "100d"

I think it’s possible that #13666 addresses this by splitting inbound and outbound listeners, it looks like that is an implementation of my idea way back up here, that we talked about in Networking SIG but didn’t get clear consensus on: https://github.com/istio/istio/issues/12551#issuecomment-474087161 . That is slated to be an alpha feature in 1.2. I will try to get some time and test it and report back. Would getting a fix in 1.2 work for the various folks interested in this issue?

Hi @adamglt. Today I reinstalled my project and found a "0 NR filter_chain_not_found" error in the Envoy log. Then I tried adding a DestinationRule as you showed, and it works. Maybe there were some bugs in the system before. Thank you very much.

@yibaoren I’m guessing your Istio installation included a default mesh-level DestinationRule that enables mTLS:

$ kubectl get destinationrule -n istio-system default -o yaml | kubectl neat
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: default
  namespace: istio-system
spec:
  host: '*.local'
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL

Ack. The inbound split alone should not improve connectivity; the split unlocks the possibility, but I need to read through what occurs here before I propose the next step.

I added this bug to the Networking WG meeting agenda here, to discuss with the broader networking team on Thursday.