istio: upgrade Istio v1.7.5 --> v1.8.1 + partial STRICT mTLS: killed StatefulSet headless service access
Bug description
I have Vault running in my cluster (https://github.com/hashicorp/vault-helm/tree/master/templates). Each Vault pod exposes two ports: 8200 and 8201. Vault is installed as a StatefulSet, so the pod names are fixed in advance and each pod can reach the others via https://podName.headlessService:port.
https-internal (port 8201) is used to coordinate the leader among the 3 running Vault pods; public requests come in on the public service at port 8200. BTW: Istio's discovery does complain that the same ports and app are exposed by two services, but that never stopped things from working fine.
This whole project was working fine under Istio 1.7.6. To secure the mesh while still allowing internal pod-to-pod communication, a semi-strict mTLS PeerAuthentication was added:
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: x-mtls
  namespace: x
spec:
  mtls:
    mode: STRICT
  portLevelMtls:
    "8201":
      mode: DISABLE
  selector:
    matchLabels:
      app: vault
Port 8201 must be excluded because the Vault pods present their own HTTPS certificates on it.
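To confirm which PeerAuthentication policy actually applies to a Vault pod, istioctl's experimental describe subcommand can help (the pod name vault-0 is an assumption):

$ istioctl experimental describe pod vault-0 -n x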
I've defined a VirtualService and Gateway as required, plus a DestinationRule for the Vault service:
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: x-dr
  namespace: x
spec:
  host: '*.x.svc.cluster.local'
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL
(the namespace x is where all the pods are)
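Note that this wildcard DestinationRule makes every client sidecar originate Istio mTLS toward all services in the namespace, including port 8201 of the headless service, while the PeerAuthentication above disables mTLS on that port server-side. A minimal sketch of a client-side carve-out for 8201 (the resource name is an assumption; the host is the headless service defined below):

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: x-vault-int-dr  # hypothetical name
  namespace: x
spec:
  host: x-vault-int.x.svc.cluster.local
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL
    portLevelSettings:
      # let Vault's own HTTPS on 8201 pass through without Istio mTLS
      - port:
          number: 8201
        tls:
          mode: DISABLE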
The public service (exposed via the VirtualService):
kind: Service
apiVersion: v1
metadata:
  name: x-vault
  namespace: x
spec:
  ports:
    - name: https-vaultha
      protocol: TCP
      port: 8201
      targetPort: 8201
    - name: http-vault
      protocol: TCP
      port: 8200
      targetPort: 8200
  selector:
    app: vault
  clusterIP: 172.21.119.206
  type: ClusterIP
  sessionAffinity: None
  publishNotReadyAddresses: true
The headless service:
kind: Service
apiVersion: v1
metadata:
  name: x-vault-int
  namespace: x
spec:
  ports:
    - name: https-vaultha
      protocol: TCP
      port: 8201
      targetPort: 8201
    - name: http-vault
      protocol: TCP
      port: 8200
      targetPort: 8200
  selector:
    app: vault
  clusterIP: None
  type: ClusterIP
  sessionAffinity: None
  publishNotReadyAddresses: true
Once the Istio 1.8.1 operator was deployed and the Vault pods were restarted, Vault inter-pod communication failed.
The error log is full of:
core: error during forwarded RPC request: error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing read tcp 172.30.223.244:39576->172.30.209.131:8201: read: connection reset by peer""
showing that a pod tried to reach the leader on port 8201 and failed.
The URL to the leader is: https://leaderHostname.vault-internal:8201
Any additional logs needed?
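One way to see how the sidecar treats port 8201 is to dump the matching Envoy listeners (the pod name is an assumption):

$ istioctl proxy-config listeners vault-0 -n x --port 8201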
[ ] Docs
[ ] Installation
[x] Networking
[ ] Performance and Scalability
[ ] Extensions and Telemetry
[ ] Security
[ ] Test and Release
[ ] User Experience
[ ] Developer Infrastructure
[x] Upgrade
Expected behavior
Steps to reproduce the bug
At first I thought it was the mTLS. Even after deleting the mTLS rule, it didn't work.
Version
$ istioctl version --remote
client version: 1.8.1
control plane version: 1.8.1
data plane version: 1.7.6 (213 proxies), 1.8.1 (24 proxies)
$ kubectl version --short
Client Version: v1.20.1
Server Version: v1.17.14+IKS
How was Istio installed? IKS + managed Istio
Environment where the bug was observed (cloud vendor, OS, etc.): IBM Kubernetes Service (IKS), Ubuntu 18.04 workers.
IKS internal operator configmap:
$ kubectl get cm -n ibm-operators managed-istio-custom -o yaml
...
istio-ingressgateway-public-1-enabled: "true"
istio-ingressgateway-public-2-enabled: "true"
istio-ingressgateway-public-3-enabled: "true"
istio-ingressgateway-zone-1: dal12
istio-ingressgateway-zone-2: dal13
istio-ingressgateway-zone-3: dal10
istio-monitoring: "false"
istio-pilot-traceSampling: "50"
@GregHanson found the bypass: inject these Istio annotations into the pod's spec:
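The exact annotations aren't quoted in this excerpt; a minimal sketch, assuming the bypass excludes Vault's HA port from sidecar interception (StatefulSet pod template fragment, values assumed):

template:
  metadata:
    annotations:
      # assumption: keep 8201 out of the sidecar's iptables redirection
      traffic.sidecar.istio.io/excludeInboundPorts: "8201"
      traffic.sidecar.istio.io/excludeOutboundPorts: "8201"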