istio: Ingress SDS missing event to push key/cert pair to proxy
Bug description In rare cases we’re seeing that SDS is missing an event entirely that’s causing only the root cert to get pushed but not the key/cert pair. Looking through the logs we can see that the root cert gets pushed out once the secret is eventually found:
2019-11-12T22:05:53.335991Z warn secretFetcherLog Cannot find secret 975cfa2b-9e3f-4c4b-8eb6-2c810aad271b-cacert, searching for fallback secret gateway-fallback
2019-11-12T22:05:53.336191Z error secretFetcherLog cannot find secret 975cfa2b-9e3f-4c4b-8eb6-2c810aad271b-cacert and cannot find fallback secret gateway-fallback
2019-11-12T22:05:53.336244Z warn cacheLog CONNECTION ID: router~192.168.54.237~istio-ingressgateway-66b78f978b-njr66.istio-system~istio-system.svc.cluster.local-112, RESOURCE NAME: 975cfa2b-9e3f-4c4b-8eb6-2c810aad271b-cacert, EVENT: SecretFetcher cannot find secret 975cfa2b-9e3f-4c4b-8eb6-2c810aad271b-cacert from cache
2019-11-12T22:05:53.336308Z warn sdsServiceLog CONNECTION ID: router~192.168.54.237~istio-ingressgateway-66b78f978b-njr66.istio-system~istio-system.svc.cluster.local-112, RESOURCE NAME: 975cfa2b-9e3f-4c4b-8eb6-2c810aad271b-cacert, EVENT: waiting for ingress gateway secret for proxy "router~192.168.54.237~istio-ingressgateway-66b78f978b-njr66.istio-system~istio-system.svc.cluster.local"
2019-11-12T22:05:53.336165Z warn secretFetcherLog Cannot find secret 975cfa2b-9e3f-4c4b-8eb6-2c810aad271b, searching for fallback secret gateway-fallback
2019-11-12T22:05:53.336362Z error secretFetcherLog cannot find secret 975cfa2b-9e3f-4c4b-8eb6-2c810aad271b and cannot find fallback secret gateway-fallback
2019-11-12T22:05:53.336382Z warn cacheLog CONNECTION ID: router~192.168.54.237~istio-ingressgateway-66b78f978b-njr66.istio-system~istio-system.svc.cluster.local-113, RESOURCE NAME: 975cfa2b-9e3f-4c4b-8eb6-2c810aad271b, EVENT: SecretFetcher cannot find secret 975cfa2b-9e3f-4c4b-8eb6-2c810aad271b from cache
2019-11-12T22:05:53.336391Z warn sdsServiceLog CONNECTION ID: router~192.168.54.237~istio-ingressgateway-66b78f978b-njr66.istio-system~istio-system.svc.cluster.local-113, RESOURCE NAME: 975cfa2b-9e3f-4c4b-8eb6-2c810aad271b, EVENT: waiting for ingress gateway secret for proxy "router~192.168.54.237~istio-ingressgateway-66b78f978b-njr66.istio-system~istio-system.svc.cluster.local"
2019-11-12T22:05:59.991770Z error sdsServiceLog CONNECTION ID: router~192.168.54.237~istio-ingressgateway-66b78f978b-njr66.istio-system~istio-system.svc.cluster.local-111, RESOURCE NAME: 975cfa2b-9e3f-4c4b-8eb6-2c810aad271b, EVENT: NotifyProxy failed. No connection with id "router~192.168.54.237~istio-ingressgateway-66b78f978b-njr66.istio-system~istio-system.svc.cluster.local-111" can be found
2019-11-12T22:05:59.991797Z error cacheLog CONNECTION ID: router~192.168.54.237~istio-ingressgateway-66b78f978b-njr66.istio-system~istio-system.svc.cluster.local-111, RESOURCE NAME: 975cfa2b-9e3f-4c4b-8eb6-2c810aad271b, EVENT: failed to notify secret change for proxy: no connection with id "router~192.168.54.237~istio-ingressgateway-66b78f978b-njr66.istio-system~istio-system.svc.cluster.local-111" can be found
2019-11-12T22:05:59.991894Z info sdsServiceLog CONNECTION ID: router~192.168.54.237~istio-ingressgateway-66b78f978b-njr66.istio-system~istio-system.svc.cluster.local-112, RESOURCE NAME: 975cfa2b-9e3f-4c4b-8eb6-2c810aad271b-cacert, EVENT: pushed root cert to proxy
You’ll see that no key/cert pair gets pushed. This causes TLS/mTLS gateways to break completely.
The way to recover from this is to delete the secret and add it back again. This results in the following logs:
2019-11-12T23:09:37.570981Z info secretFetcherLog secret 975cfa2b-9e3f-4c4b-8eb6-2c810aad271b is deleted
2019-11-12T23:09:37.571063Z error sdsServiceLog CONNECTION ID: router~192.168.54.237~istio-ingressgateway-66b78f978b-njr66.istio-system~istio-system.svc.cluster.local-111, RESOURCE NAME: 975cfa2b-9e3f-4c4b-8eb6-2c810aad271b, EVENT: NotifyProxy failed. No connection with id "router~192.168.54.237~istio-ingressgateway-66b78f978b-njr66.istio-system~istio-system.svc.cluster.local-111" can be found
2019-11-12T23:09:37.571075Z error cacheLog CONNECTION ID: router~192.168.54.237~istio-ingressgateway-66b78f978b-njr66.istio-system~istio-system.svc.cluster.local-111, RESOURCE NAME: 975cfa2b-9e3f-4c4b-8eb6-2c810aad271b, EVENT: failed to notify secret change for proxy: no connection with id "router~192.168.54.237~istio-ingressgateway-66b78f978b-njr66.istio-system~istio-system.svc.cluster.local-111" can be found
2019-11-12T23:09:37.571085Z info secretFetcherLog secret 975cfa2b-9e3f-4c4b-8eb6-2c810aad271b-cacert is deleted
2019-11-12T23:09:37.571104Z error sdsServiceLog CONNECTION ID: router~192.168.54.237~istio-ingressgateway-66b78f978b-njr66.istio-system~istio-system.svc.cluster.local-110, RESOURCE NAME: 975cfa2b-9e3f-4c4b-8eb6-2c810aad271b-cacert, EVENT: NotifyProxy failed. No connection with id "router~192.168.54.237~istio-ingressgateway-66b78f978b-njr66.istio-system~istio-system.svc.cluster.local-110" can be found
2019-11-12T23:09:37.571115Z error cacheLog CONNECTION ID: router~192.168.54.237~istio-ingressgateway-66b78f978b-njr66.istio-system~istio-system.svc.cluster.local-110, RESOURCE NAME: 975cfa2b-9e3f-4c4b-8eb6-2c810aad271b-cacert, EVENT: failed to notify secret change for proxy: no connection with id "router~192.168.54.237~istio-ingressgateway-66b78f978b-njr66.istio-system~istio-system.svc.cluster.local-110" can be found
2019-11-12T23:10:14.200253Z info sdsServiceLog CONNECTION ID: router~192.168.54.237~istio-ingressgateway-66b78f978b-njr66.istio-system~istio-system.svc.cluster.local-112, RESOURCE NAME: 975cfa2b-9e3f-4c4b-8eb6-2c810aad271b-cacert, EVENT: pushed root cert to proxy
2019-11-12T23:10:14.200251Z info sdsServiceLog CONNECTION ID: router~192.168.54.237~istio-ingressgateway-66b78f978b-njr66.istio-system~istio-system.svc.cluster.local-113, RESOURCE NAME: 975cfa2b-9e3f-4c4b-8eb6-2c810aad271b, EVENT: pushed key/cert pair to proxy
Expected behavior SDS always pushing the required crypto to the proxy.
Steps to reproduce the bug Unknown, I’ve found this very hard to reproduce. It has only affected one user of ours AFAIK.
Version (include the output of istioctl version --remote
and kubectl version
)
client version: unknown
control plane version: 1.3.0
Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.2", GitCommit:"c97fe5036ef3df2967d086711e6c0c405941e14b", GitTreeState:"clean", BuildDate:"2019-10-15T23:41:55Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"13+", GitVersion:"v1.13.11-eks-5876d6", GitCommit:"5876d6b7429820450950ade17fe7b4bf5ccada7f", GitTreeState:"clean", BuildDate:"2019-09-24T20:54:25Z", GoVersion:"go1.11.13", Compiler:"gc", Platform:"linux/amd64"}
How was Istio installed?
Imagine having a Makefile that does this where $ISTIO_VERSION
is 1.3.0
:
@curl -s -L https://git.io/getLatestIstio | ISTIO_VERSION=$(ISTIO_VERSION) sh -
@cat istio/istio-namespace.yaml > istio/istio.yaml
@cat istio/istio-namespace.yaml > istio/istio-crds.yaml
@helm template istio-$(ISTIO_VERSION)/install/kubernetes/helm/istio-init --name istio-init --namespace istio-system >> istio/istio-crds.yaml
@helm template istio-$(ISTIO_VERSION)/install/kubernetes/helm/istio --name istio --namespace istio-system \
--set sidecarInjectorWebhook.rewriteAppHTTPProbe=true \
--set servicegraph.enabled=true \
--set gateways.istio-ingressgateway.sds.enabled=true \
--set tracing.enabled=true \
--set kiali.enabled=true \
--set "kiali.dashboard.jaegerURL=http://jaeger-query:16686" \
--set "kiali.dashboard.grafanaURL=http://grafana:3000" \
--set grafana.enabled=true >> istio/istio.yaml
@cat istio-$(ISTIO_VERSION)/install/kubernetes/helm/istio-init/files/crd-1* > istio/istio-delete-crds.yaml
@rm -rf istio-$(ISTIO_VERSION)
A kubectl apply is later done on the istio-crds.yaml
and istio.yaml
files.
Environment where bug was observed (cloud vendor, OS, etc) EKS
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Comments: 23 (17 by maintainers)
I still see this issue with 1.3.5. Can we reopen this? @incfly I don’t have perms to reopen the issue. Here’s how I’m able to repro:
Create a gateway that uses a secret like so:
Create the secret that the gateway uses:
Now delete the secret and gateway and create them again using the exact same values. This causes SDS to fail to push things over to Envoy. In 1.3.5 I see that even the root cert doesn’t make it to Envoy so the TLS endpoint is completely broken:
It will be shipped with release-1.7. Which will be announced on August 11th (https://github.com/istio/istio/wiki/Istio-Release-1.7) @williamaronli could you cherrypick this fix into 1.5 and 1.6 as well?
https://github.com/istio/istio/pull/24817 is the fix for the issue.
@aaronjwood
I vaguely recall this being a known bug that has been fixed in one of the 1.3 patch releases. I highly recommend you upgrade to 1.3.5 which has a number of high severity fixed.
@JimmyCYJ can verify if this was fixed