istio: Proxy cannot get SDS push if the SDS server is not ready when the proxy sends its SDS requests

Bug description

The SDS agent prints a log when it starts the SDS server, but we also need a log indicating when the SDS server is ready.

If the SDS server is not ready when Envoy sends its SDS requests, no secrets are pushed to Envoy. We can reproduce this by adding a delay before starting the SDS server here.

Once the local SDS integration test is ready, we can add more tests covering negative cases and corner cases for SDS.

We may need a debug endpoint on the SDS server, and a readiness probe against that endpoint before starting Envoy.

Expected behavior

Envoy gets key/cert via SDS.

Steps to reproduce the bug

Add a delay before starting the SDS server here, then check the Envoy log, config, and traffic.

Version (include the output of istioctl version --remote and kubectl version and helm version if you used Helm)

1.5

How was Istio installed?

Environment where bug was observed (cloud vendor, OS, etc)

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 5
  • Comments: 54 (45 by maintainers)

Most upvoted comments

Nice to see this is being resolved. We were seeing unexplained 503s and noticed that secrets in the sidecars were not initialized. Is this going to be in release-1.5?

1.5.2 + envoy from 1.6 is broken: gcr.io/howardjohn-istio/proxyv2:mismatch
1.6 + envoy from 1.5 is working: gcr.io/howardjohn-istio/proxyv2:mismatch-16-agent

Conclusion: we broke this purely in pilot-agent.

On some versions, just the ROOTCA is broken:

$ cat /tmp/root-missing | rg -i Secret
[Envoy (Epoch 0)] [2020-04-23 22:36:43.287][23][info][config] [external/envoy/source/server/configuration_impl.cc:62] loading 0 static secret(s)
[Envoy (Epoch 0)] [2020-04-23 22:36:43.323][23][debug][config] [external/envoy/source/common/config/grpc_mux_impl.cc:83] gRPC mux addWatch for type.googleapis.com/envoy.api.v2.auth.Secret
[Envoy (Epoch 0)] [2020-04-23 22:36:43.323][23][debug][config] [external/envoy/source/common/config/grpc_mux_impl.cc:40] No stream available to sendDiscoveryRequest for type.googleapis.com/envoy.api.v2.auth.Secret
[Envoy (Epoch 0)] [2020-04-23 22:36:43.323][23][debug][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:47] Establishing new gRPC bidi stream for rpc StreamSecrets(stream .envoy.api.v2.DiscoveryRequest) returns (stream .envoy.api.v2.DiscoveryResponse);
2020-04-23T22:36:43.514833Z     info    cache   GenerateSecret default
[Envoy (Epoch 0)] [2020-04-23 22:36:43.515][23][debug][config] [external/envoy/source/common/config/grpc_mux_impl.cc:137] Received gRPC message for type.googleapis.com/envoy.api.v2.auth.Secret at version 04-23 22:36:43.325
[Envoy (Epoch 0)] [2020-04-23 22:36:43.515][23][debug][config] [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:631] Secret is updated.
[Envoy (Epoch 0)] [2020-04-23 22:36:43.518][23][debug][config] [external/envoy/source/common/config/grpc_subscription_impl.cc:67] gRPC config for type.googleapis.com/envoy.api.v2.auth.Secret accepted with 1 resources with version 04-23 22:36:43.325

full log: root-missing.txt

How can I work around this? I hit this 503 TLS error: Secret is not supplied by SDS

$ kubectl get pod -n istio-system
NAME                                    READY   STATUS    RESTARTS   AGE
istio-ingressgateway-768db54bdb-v6598   1/1     Running   0          6h53m
istio-tracing-d497b9c9b-qmnqr           1/1     Running   0          6h59m
istiod-94755f59d-7x4qp                  1/1     Running   0          5h41m
kiali-6bc5c6f578-wvmdx                  1/1     Running   0          6h59m
prometheus-7d99cfd4fb-8h6xt             1/2     Running   0          6h59m
$ istioctl pc s istio-ingressgateway-768db54bdb-v6598.istio-system
RESOURCE NAME       TYPE           STATUS      VALID CERT     SERIAL NUMBER                     NOT AFTER                NOT BEFORE
default                            WARMING     false
ROOTCA                             WARMING     false
https-certs-pre     Cert Chain     ACTIVE      true           34544361368060865268342148204     2020-12-26T12:11:01Z     2019-12-26T12:11:01Z

I found some logs in istio-ingressgateway pod:

[Envoy (Epoch 0)] [2020-04-23 02:56:07.262][40][debug][init] [external/envoy/source/common/init/manager_impl.cc:20] added target SdsApi default to init manager Cluster outbound|443||kubernetes.default.svc.cluster.local
[Envoy (Epoch 0)] [2020-04-23 02:56:07.262][40][debug][init] [external/envoy/source/common/init/manager_impl.cc:20] added target SdsApi ROOTCA to init manager Cluster outbound|443||kubernetes.default.svc.cluster.local

Here is a new one… the default secret is fetched but ROOTCA is not:

2020-04-22T14:37:09.858051Z     info    sds     resource:default new connection
2020-04-22T14:37:10.581549Z     info    cache   Root cert has changed, start rotating root cert for SDS clients
2020-04-22T14:37:10.581611Z     info    cache   GenerateSecret default
2020-04-22T14:37:10.581770Z     info    sds     resource:default pushed key/cert pair to proxy

Running 1.6-alpha.81322fa1cba9fe98047bfcc275b0adeb82465fdd