istio: upgrade Istio v1.7.5 --> v1.8.1 + partial STRICT mTLS: killed StatefulSet headless service access
Bug description
I have Vault running in my cluster (https://github.com/hashicorp/vault-helm/tree/master/templates). Each Vault pod exposes two ports: 8200 and 8201. Vault is installed as a StatefulSet, so the pod names are fixed in advance and each pod can reach the others via https://podName.headlessService:port.
https-internal (port 8201) is used to coordinate the leader among the 3 running Vault pods; public requests come in on the public service at port 8200. BTW: Istio's discovery does complain that the same ports and app are exposed by two services, but that never stopped things from working fine.
This whole project was working fine under Istio 1.7.6. To secure the mesh while still allowing internal pod-to-pod communication, a semi-strict mTLS PeerAuthentication was added:
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: x-mtls
  namespace: x
spec:
  mtls:
    mode: STRICT
  portLevelMtls:
    "8201":
      mode: DISABLE
  selector:
    matchLabels:
      app: vault
Port 8201 must be excluded because the Vault pods present their own HTTPS certificates on it.
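To confirm which PeerAuthentication policy actually applies to a Vault pod, istioctl's experimental describe subcommand can help (the pod name vault-0 is an assumption):

$ istioctl experimental describe pod vault-0 -n x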
I've defined a VirtualService and Gateway as required, plus a DestinationRule for the Vault service:
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: x-dr
  namespace: x
spec:
  host: '*.x.svc.cluster.local'
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL
(the namespace x is where all the pods are)
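Note that this wildcard DestinationRule makes every client sidecar originate Istio mTLS toward all services in the namespace, including port 8201 of the headless service, while the PeerAuthentication above disables mTLS on that port server-side. A minimal sketch of a client-side carve-out for 8201 (the resource name is an assumption; the host is the headless service defined below):

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: x-vault-int-dr  # hypothetical name
  namespace: x
spec:
  host: x-vault-int.x.svc.cluster.local
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL
    portLevelSettings:
      # let Vault's own HTTPS on 8201 pass through without Istio mTLS
      - port:
          number: 8201
        tls:
          mode: DISABLE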
The public service (exposed via the VirtualService):
kind: Service
apiVersion: v1
metadata:
  name: x-vault
  namespace: x
spec:
  ports:
    - name: https-vaultha
      protocol: TCP
      port: 8201
      targetPort: 8201
    - name: http-vault
      protocol: TCP
      port: 8200
      targetPort: 8200
  selector:
    app: vault
  clusterIP: 172.21.119.206
  type: ClusterIP
  sessionAffinity: None
  publishNotReadyAddresses: true
The headless service:
kind: Service
apiVersion: v1
metadata:
  name: x-vault-int
  namespace: x
spec:
  ports:
    - name: https-vaultha
      protocol: TCP
      port: 8201
      targetPort: 8201
    - name: http-vault
      protocol: TCP
      port: 8200
      targetPort: 8200
  selector:
    app: vault
  clusterIP: None
  type: ClusterIP
  sessionAffinity: None
  publishNotReadyAddresses: true
Once the Istio 1.8.1 operator was deployed and the Vault pods were restarted, Vault inter-pod communication failed.
The error log is full of:
core: error during forwarded RPC request: error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing read tcp 172.30.223.244:39576->172.30.209.131:8201: read: connection reset by peer""
showing that a pod tried to reach the leader on port 8201 and failed.
The URL to the leader is: https://leaderHostname.vault-internal:8201
Any additional logs needed?
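One way to see how the sidecar treats port 8201 is to dump the matching Envoy listeners (the pod name is an assumption):

$ istioctl proxy-config listeners vault-0 -n x --port 8201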
[ ] Docs
[ ] Installation
[x] Networking
[ ] Performance and Scalability
[ ] Extensions and Telemetry
[ ] Security
[ ] Test and Release
[ ] User Experience
[ ] Developer Infrastructure
[x] Upgrade
Expected behavior
Steps to reproduce the bug
At first I thought it was the mTLS. Even after deleting the mTLS rule, it didn't work.
Version
$ istioctl version --remote
client version: 1.8.1
control plane version: 1.8.1
data plane version: 1.7.6 (213 proxies), 1.8.1 (24 proxies)
$ kubectl version --short
Client Version: v1.20.1
Server Version: v1.17.14+IKS
How was Istio installed? IKS + managed Istio
Environment where the bug was observed (cloud vendor, OS, etc.): IBM Kubernetes Service (IKS), Ubuntu 18.04 workers.
IKS internal operator configmap:
$ kubectl get cm -n ibm-operators managed-istio-custom -o yaml
...
istio-ingressgateway-public-1-enabled: "true"
istio-ingressgateway-public-2-enabled: "true"
istio-ingressgateway-public-3-enabled: "true"
istio-ingressgateway-zone-1: dal12
istio-ingressgateway-zone-2: dal13
istio-ingressgateway-zone-3: dal10
istio-monitoring: "false"
istio-pilot-traceSampling: "50"
@GregHanson found the bypass: inject these Istio annotations into the pod's spec:
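The exact annotations aren't quoted in this excerpt; a minimal sketch, assuming the bypass excludes Vault's HA port from sidecar interception (StatefulSet pod template fragment, values assumed):

template:
  metadata:
    annotations:
      # assumption: keep 8201 out of the sidecar's iptables redirection
      traffic.sidecar.istio.io/excludeInboundPorts: "8201"
      traffic.sidecar.istio.io/excludeOutboundPorts: "8201"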