istio: Service without selector does not work with mTLS when switching the endpoint
Bug description
- When using services without selectors with mTLS, switching the endpoint results in
upstream connect error or disconnect/reset before headers. reset reason: connection failure
- This is a technique the Knative Serving project relies on: it switches the Service's endpoint for routing. Because of this bug, Knative Serving does not work with mTLS in STRICT mode.
Affected product area (please put an X in all that apply)
[ ] Configuration Infrastructure [ ] Docs [ ] Installation [x] Networking [ ] Performance and Scalability [ ] Policies and Telemetry [ ] Security [ ] Test and Release [ ] User Experience [ ] Developer Infrastructure
Expected behavior
- When switching the endpoint of services without selectors, mTLS should also work fine.
Steps to reproduce the bug
1. Create a testing namespace bug
$ kubectl create ns bug
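Note that the manifests below enable sidecar injection per pod via the sidecar.istio.io/inject: "true" annotation, so no namespace label is required. If your mesh relies on namespace-wide automatic injection instead (and the injection webhook is installed), labeling the namespace should be equivalent for this repro (an optional suggestion, not part of the original report):
$ kubectl label namespace bug istio-injection=enabled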
2. Add the mTLS Policy & DestinationRule
cat <<EOF | kubectl apply -f -
apiVersion: "authentication.istio.io/v1alpha1"
kind: "Policy"
metadata:
name: "default"
namespace: "bug"
spec:
peers:
- mtls:
mode: STRICT
---
apiVersion: "networking.istio.io/v1alpha3"
kind: "DestinationRule"
metadata:
name: "mtls-services"
namespace: "bug"
spec:
host: "*.local"
trafficPolicy:
tls:
mode: ISTIO_MUTUAL
EOF
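Once the workloads from steps 3-6 are running, you can optionally sanity-check that STRICT mTLS and the ISTIO_MUTUAL client policy are in effect (a suggestion, not part of the original report; the authn tls-check command ships with istioctl 1.2/1.3 but has since been removed, and the pod name comes from the listing in step 5):
$ istioctl authn tls-check sleep-67769569f9-knlsq.bug httpbin.bug.svc.cluster.local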
3. Create the sleep deployment for the test client
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
name: sleep
namespace: bug
---
apiVersion: v1
kind: Service
metadata:
name: sleep
namespace: bug
labels:
app: sleep
spec:
ports:
- port: 80
name: http
selector:
app: sleep
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: sleep
namespace: bug
spec:
replicas: 1
selector:
matchLabels:
app: sleep
template:
metadata:
labels:
app: sleep
annotations:
sidecar.istio.io/inject: "true"
spec:
serviceAccountName: sleep
containers:
- name: sleep
image: governmentpaas/curl-ssl
command: ["/bin/sleep", "3650d"]
imagePullPolicy: IfNotPresent
volumeMounts:
- mountPath: /etc/sleep/tls
name: secret-volume
volumes:
- name: secret-volume
secret:
secretName: sleep-secret
optional: true
EOF
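To confirm the sidecar was injected into the sleep pod (it should report 2/2 ready containers), a quick check:
$ kubectl get pod -n bug -l app=sleep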
4. Create the httpbin1 & httpbin2 deployments for the test servers (note that the httpbin Service below intentionally has no selector)
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
name: httpbin
namespace: bug
labels:
app: httpbin
spec:
ports:
- name: http
port: 8000
targetPort: 80
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: httpbin1
namespace: bug
spec:
replicas: 1
selector:
matchLabels:
app: httpbin1
version: v1
template:
metadata:
labels:
app: httpbin1
version: v1
annotations:
sidecar.istio.io/inject: "true"
spec:
containers:
- image: docker.io/kennethreitz/httpbin
imagePullPolicy: IfNotPresent
name: httpbin
ports:
- containerPort: 80
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: httpbin2
namespace: bug
spec:
replicas: 1
selector:
matchLabels:
app: httpbin2
version: v1
template:
metadata:
labels:
app: httpbin2
version: v1
annotations:
sidecar.istio.io/inject: "true"
spec:
containers:
- image: docker.io/kennethreitz/httpbin
imagePullPolicy: IfNotPresent
name: httpbin
ports:
- containerPort: 80
EOF
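Because the httpbin Service has no selector, Kubernetes does not create an Endpoints object for it automatically; that is done by hand in step 6. You can verify this (optional checks, not part of the original report):
$ kubectl get svc httpbin -n bug -o jsonpath='{.spec.selector}'   # prints nothing: no selector
$ kubectl get endpoints httpbin -n bug                            # NotFound until step 6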
5. Check the pod IPs and pod names
$ kubectl get pod -n bug -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
httpbin1-6fd7cccfd7-2tmmc 2/2 Running 0 4m15s 172.20.89.153 ip-172-20-88-161.ap-southeast-1.compute.internal <none> <none>
httpbin2-56f9bd7876-7q467 2/2 Running 0 4m15s 172.20.54.190 ip-172-20-49-92.ap-southeast-1.compute.internal <none> <none>
sleep-67769569f9-knlsq 2/2 Running 0 7m1s 172.20.68.201 ip-172-20-64-95.ap-southeast-1.compute.internal <none> <none>
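The pod names and IPs above are from this particular run; yours will differ. As a convenience for the later steps, they can be captured into shell variables (a sketch, not part of the original report; assumes one replica each):
$ HTTPBIN1_POD=$(kubectl get pod -n bug -l app=httpbin1 -o jsonpath='{.items[0].metadata.name}')
$ HTTPBIN1_IP=$(kubectl get pod -n bug -l app=httpbin1 -o jsonpath='{.items[0].status.podIP}')
$ HTTPBIN2_POD=$(kubectl get pod -n bug -l app=httpbin2 -o jsonpath='{.items[0].metadata.name}')
$ HTTPBIN2_IP=$(kubectl get pod -n bug -l app=httpbin2 -o jsonpath='{.items[0].status.podIP}')
$ SLEEP_POD=$(kubectl get pod -n bug -l app=sleep -o jsonpath='{.items[0].metadata.name}')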
6. Create an Endpoints object for httpbin1 (you need to replace the pod IP & name with yours)
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Endpoints
metadata:
name: httpbin
namespace: bug
subsets:
- addresses:
- ip: 172.20.89.153 ### Replace your httpbin1 pod's IP
targetRef:
kind: Pod
name: httpbin1-6fd7cccfd7-2tmmc ### Replace your httpbin1 pod's name
namespace: bug
ports:
- name: http
port: 80
protocol: TCP
EOF
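Before testing, you can confirm that the Endpoints object carries the httpbin1 address and that the sleep sidecar has received it (optional checks, not part of the original report; the exact istioctl proxy-config subcommand and its output format may vary between istioctl versions):
$ kubectl get endpoints httpbin -n bug
$ istioctl proxy-config endpoints sleep-67769569f9-knlsq.bug | grep 172.20.89.153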
7. Test access to the service (it works fine)
$ kubectl -n bug exec -it sleep-67769569f9-knlsq -- curl httpbin.bug.svc:8000
8. Switch the endpoint to httpbin2
$ kubectl edit ep -n bug httpbin
...
subsets:
- addresses:
- ip: 172.20.54.190 ### Replace your httpbin2 pod's IP
targetRef:
kind: Pod
name: httpbin2-56f9bd7876-7q467 ### Replace your httpbin2 pod's name
namespace: bug
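As a non-interactive alternative to kubectl edit, a merge patch along these lines should perform the same switch (a convenience sketch, not part of the original report; substitute your httpbin2 pod's IP and name):
$ kubectl patch endpoints httpbin -n bug --type merge -p \
  '{"subsets":[{"addresses":[{"ip":"172.20.54.190","targetRef":{"kind":"Pod","name":"httpbin2-56f9bd7876-7q467","namespace":"bug"}}],"ports":[{"name":"http","port":80,"protocol":"TCP"}]}]}'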
9. Now the request fails with the connection error:
$ kubectl -n bug exec -it sleep-67769569f9-knlsq -- curl httpbin.bug.svc:8000
upstream connect error or disconnect/reset before headers. reset reason: connection failure
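This error body is what the client-side Envoy returns, together with an HTTP 503, when it cannot establish the upstream connection. A quick way to confirm the status code (a diagnostic suggestion, not part of the original report):
$ kubectl -n bug exec -it sleep-67769569f9-knlsq -- curl -s -o /dev/null -w '%{http_code}\n' httpbin.bug.svc:8000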
Version (include the output of istioctl version --remote and kubectl version)
Istio 1.3.2 / k8s 1.14
Command output:
$ istioctl version --remote
client version: 1.2.4
cluster-local-gateway version:
cluster-local-gateway version:
citadel version: 1.3.2
galley version: 1.3.2
ingressgateway version: 1.3.2
ingressgateway version: 1.3.2
pilot version: 1.3.2
pilot version: 1.3.2
policy version: 1.3.2
sidecar-injector version: 1.3.2
telemetry version: 1.3.2
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"16+", GitVersion:"v1.16.0-alpha.0.1451+896c901684e774", GitCommit:"896c901684e774169dbd477aecd880df1be6bdd0", GitTreeState:"clean", BuildDate:"2019-06-25T04:30:19Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.6", GitCommit:"96fac5cd13a5dc064f7d9f4f23030a6aeface6cc", GitTreeState:"clean", BuildDate:"2019-08-19T11:05:16Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}
How was Istio installed?
Istio was installed from this template: https://github.com/knative/serving/blob/75f0f8775f99357d6a3d23bf478cc87c4d577987/third_party/istio-1.3.2/istio.yaml
Environment where bug was observed (cloud vendor, OS, etc)
- Kubernetes on AWS, but I’m sure it does not matter.
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Comments: 67 (38 by maintainers)
@nak3 yep. I figured that was the case, but I thought I would ask since that would be much easier to support 🙂
Originally it was due to the lack of status on the Istio resources, i.e. we were reprogramming virtual services during scale-to-zero to point at the Activator (a buffering proxy), and we never knew when that would actually be done, so we had random timeouts throughout the code. But afterwards it turned out to be an extremely useful tool for other things, like backend overload protection, ideal load balancing (e.g. when you want a single request in flight to each pod), etc.