istio: 1.6 istio-proxy SDS not updating certificates SSLV3_ALERT_CERTIFICATE_EXPIRED

Bug description In our 1.6 qualifying environment we have an application that is rejecting requests with upstream connect error or disconnect/reset before headers. reset reason: connection failure.

We have three (almost) identical test apps:

  • istio-test-app-1
  • istio-test-app-2
  • istio-test-app-3 (this is the broken one)

We continually send load that goes 1 -> 2 -> 3.

We rolling restart istio-test-app-1 every 10 minutes to create config churn

Screenshot 2020-10-17 at 16 09 18

The pod is 25 hours old, however as you can see here something happened at 14:35 which broke istio-test-app-3 (1 and 2 remain fine).

The istio proxy logs contain lots of sds noise:

{"level":"info","time":"2020-10-17T14:39:40.757245Z","scope":"sds","msg":"resource:default new connection"}
{"level":"info","time":"2020-10-17T14:39:40.757366Z","scope":"sds","msg":"Skipping waiting for ingress gateway secret"}
{"level":"info","time":"2020-10-17T14:39:40.757625Z","scope":"cache","msg":"GenerateSecret from file default"}
{"level":"info","time":"2020-10-17T14:39:40.757772Z","scope":"sds","msg":"resource:default pushed key/cert pair to proxy"}
{"level":"info","time":"2020-10-17T14:39:40.875744Z","scope":"sds","msg":"resource:default new connection"}
{"level":"info","time":"2020-10-17T14:39:40.875865Z","scope":"sds","msg":"Skipping waiting for ingress gateway secret"}
{"level":"info","time":"2020-10-17T14:39:40.876142Z","scope":"cache","msg":"GenerateSecret from file default"}
{"level":"info","time":"2020-10-17T14:39:40.876315Z","scope":"sds","msg":"resource:default pushed key/cert pair to proxy"}
{"level":"info","time":"2020-10-17T14:44:40.680854Z","scope":"sds","msg":"resource:default connection is terminated: rpc error: code = Canceled desc = context canceled"}
{"level":"info","time":"2020-10-17T14:44:40.680939Z","scope":"sds","msg":"resource:default connection is terminated: rpc error: code = Canceled desc = context canceled"}
2020-10-17T14:44:40.681104Z	warning	envoy config	[bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:92] StreamSecrets gRPC config stream closed: 2, resource:default Close connection to proxy "sidecar~10.206.4.33~istio-test-app-3-69d695bf8b-b256m.istio-test-app-3~istio-test-app-3.svc.cluster.local-44"
2020-10-17T14:44:40.681167Z	warning	envoy config	[bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:92] StreamSecrets gRPC config stream closed: 2, resource:default Close connection to proxy "sidecar~10.206.4.33~istio-test-app-3-69d695bf8b-b256m.istio-test-app-3~istio-test-app-3.svc.cluster.local-45"
{"level":"info","time":"2020-10-17T14:44:40.989664Z","scope":"sds","msg":"resource:default new connection"}
{"level":"info","time":"2020-10-17T14:44:40.989763Z","scope":"sds","msg":"Skipping waiting for ingress gateway secret"}
{"level":"info","time":"2020-10-17T14:44:40.989996Z","scope":"cache","msg":"GenerateSecret from file default"}
{"level":"info","time":"2020-10-17T14:44:40.990130Z","scope":"sds","msg":"resource:default pushed key/cert pair to proxy"}
{"level":"info","time":"2020-10-17T14:44:41.101797Z","scope":"sds","msg":"resource:default new connection"}
{"level":"info","time":"2020-10-17T14:44:41.101917Z","scope":"sds","msg":"Skipping waiting for ingress gateway secret"}
{"level":"info","time":"2020-10-17T14:44:41.102142Z","scope":"cache","msg":"GenerateSecret from file default"}
{"level":"info","time":"2020-10-17T14:44:41.102384Z","scope":"sds","msg":"resource:default pushed key/cert pair to proxy"}
{"level":"info","time":"2020-10-17T14:49:40.680826Z","scope":"sds","msg":"resource:default connection is terminated: rpc error: code = Canceled desc = context canceled"}
{"level":"info","time":"2020-10-17T14:49:40.680922Z","scope":"sds","msg":"resource:default connection is terminated: rpc error: code = Canceled desc = context canceled"}
2020-10-17T14:49:40.681068Z	warning	envoy config	[bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:92] StreamSecrets gRPC config stream closed: 2, resource:default Close connection to proxy "sidecar~10.206.4.33~istio-test-app-3-69d695bf8b-b256m.istio-test-app-3~istio-test-app-3.svc.cluster.local-46"
2020-10-17T14:49:40.681109Z	warning	envoy config	[bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:92] StreamSecrets gRPC config stream closed: 2, resource:default Close connection to proxy "sidecar~10.206.4.33~istio-test-app-3-69d695bf8b-b256m.istio-test-app-3~istio-test-app-3.svc.cluster.local-47"
{"level":"info","time":"2020-10-17T14:49:40.853145Z","scope":"sds","msg":"resource:default new connection"}
{"level":"info","time":"2020-10-17T14:49:40.853301Z","scope":"sds","msg":"Skipping waiting for ingress gateway secret"}
{"level":"info","time":"2020-10-17T14:49:40.853587Z","scope":"cache","msg":"GenerateSecret from file default"}
{"level":"info","time":"2020-10-17T14:49:40.853768Z","scope":"sds","msg":"resource:default pushed key/cert pair to proxy"}
{"level":"info","time":"2020-10-17T14:49:40.917526Z","scope":"sds","msg":"resource:default new connection"}
{"level":"info","time":"2020-10-17T14:49:40.917640Z","scope":"sds","msg":"Skipping waiting for ingress gateway secret"}
{"level":"info","time":"2020-10-17T14:49:40.917876Z","scope":"cache","msg":"GenerateSecret from file default"}
{"level":"info","time":"2020-10-17T14:49:40.918032Z","scope":"sds","msg":"resource:default pushed key/cert pair to proxy"}
{"level":"info","time":"2020-10-17T14:54:40.680797Z","scope":"sds","msg":"resource:default connection is terminated: rpc error: code = Canceled desc = context canceled"}
{"level":"info","time":"2020-10-17T14:54:40.680866Z","scope":"sds","msg":"resource:default connection is terminated: rpc error: code = Canceled desc = context canceled"}
2020-10-17T14:54:40.680941Z	warning	envoy config	[bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:92] StreamSecrets gRPC config stream closed: 2, resource:default Close connection to proxy "sidecar~10.206.4.33~istio-test-app-3-69d695bf8b-b256m.istio-test-app-3~istio-test-app-3.svc.cluster.local-48"
2020-10-17T14:54:40.681040Z	warning	envoy config	[bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:92] StreamSecrets gRPC config stream closed: 2, resource:default Close connection to proxy "sidecar~10.206.4.33~istio-test-app-3-69d695bf8b-b256m.istio-test-app-3~istio-test-app-3.svc.cluster.local-49"
{"level":"info","time":"2020-10-17T14:54:40.758052Z","scope":"sds","msg":"resource:default new connection"}
{"level":"info","time":"2020-10-17T14:54:40.758157Z","scope":"sds","msg":"Skipping waiting for ingress gateway secret"}
{"level":"info","time":"2020-10-17T14:54:40.758426Z","scope":"cache","msg":"GenerateSecret from file default"}
{"level":"info","time":"2020-10-17T14:54:40.758584Z","scope":"sds","msg":"resource:default pushed key/cert pair to proxy"}
{"level":"info","time":"2020-10-17T14:54:41.127609Z","scope":"sds","msg":"resource:default new connection"}
{"level":"info","time":"2020-10-17T14:54:41.127721Z","scope":"sds","msg":"Skipping waiting for ingress gateway secret"}
{"level":"info","time":"2020-10-17T14:54:41.127964Z","scope":"cache","msg":"GenerateSecret from file default"}
{"level":"info","time":"2020-10-17T14:54:41.128115Z","scope":"sds","msg":"resource:default pushed key/cert pair to proxy"}
{"level":"info","time":"2020-10-17T14:59:40.680900Z","scope":"sds","msg":"resource:default connection is terminated: rpc error: code = Canceled desc = context canceled"}
{"level":"info","time":"2020-10-17T14:59:40.680941Z","scope":"sds","msg":"resource:default connection is terminated: rpc error: code = Canceled desc = context canceled"}
2020-10-17T14:59:40.681091Z	warning	envoy config	[bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:92] StreamSecrets gRPC config stream closed: 2, resource:default Close connection to proxy "sidecar~10.206.4.33~istio-test-app-3-69d695bf8b-b256m.istio-test-app-3~istio-test-app-3.svc.cluster.local-50"
2020-10-17T14:59:40.681147Z	warning	envoy config	[bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:92] StreamSecrets gRPC config stream closed: 2, resource:default Close connection to proxy "sidecar~10.206.4.33~istio-test-app-3-69d695bf8b-b256m.istio-test-app-3~istio-test-app-3.svc.cluster.local-51"
{"level":"info","time":"2020-10-17T14:59:40.795179Z","scope":"sds","msg":"resource:default new connection"}
{"level":"info","time":"2020-10-17T14:59:40.795280Z","scope":"sds","msg":"Skipping waiting for ingress gateway secret"}
{"level":"info","time":"2020-10-17T14:59:40.795539Z","scope":"cache","msg":"GenerateSecret from file default"}
{"level":"info","time":"2020-10-17T14:59:40.795689Z","scope":"sds","msg":"resource:default pushed key/cert pair to proxy"}
{"level":"info","time":"2020-10-17T14:59:41.114830Z","scope":"sds","msg":"resource:default new connection"}
{"level":"info","time":"2020-10-17T14:59:41.114952Z","scope":"sds","msg":"Skipping waiting for ingress gateway secret"}
{"level":"info","time":"2020-10-17T14:59:41.115164Z","scope":"cache","msg":"GenerateSecret from file default"}
{"level":"info","time":"2020-10-17T14:59:41.115318Z","scope":"sds","msg":"resource:default pushed key/cert pair to proxy"}
{"level":"info","time":"2020-10-17T15:04:40.680881Z","scope":"sds","msg":"resource:default connection is terminated: rpc error: code = Canceled desc = context canceled"}
{"level":"info","time":"2020-10-17T15:04:40.680970Z","scope":"sds","msg":"resource:default connection is terminated: rpc error: code = Canceled desc = context canceled"}
2020-10-17T15:04:40.681071Z	warning	envoy config	[bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:92] StreamSecrets gRPC config stream closed: 2, resource:default Close connection to proxy "sidecar~10.206.4.33~istio-test-app-3-69d695bf8b-b256m.istio-test-app-3~istio-test-app-3.svc.cluster.local-52"
2020-10-17T15:04:40.681126Z	warning	envoy config	[bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:92] StreamSecrets gRPC config stream closed: 2, resource:default Close connection to proxy "sidecar~10.206.4.33~istio-test-app-3-69d695bf8b-b256m.istio-test-app-3~istio-test-app-3.svc.cluster.local-53"
{"level":"info","time":"2020-10-17T15:04:40.980455Z","scope":"sds","msg":"resource:default new connection"}
{"level":"info","time":"2020-10-17T15:04:40.980561Z","scope":"sds","msg":"Skipping waiting for ingress gateway secret"}
{"level":"info","time":"2020-10-17T15:04:40.980783Z","scope":"cache","msg":"GenerateSecret from file default"}
{"level":"info","time":"2020-10-17T15:04:40.980906Z","scope":"sds","msg":"resource:default pushed key/cert pair to proxy"}
{"level":"info","time":"2020-10-17T15:04:41.180954Z","scope":"sds","msg":"resource:default new connection"}
{"level":"info","time":"2020-10-17T15:04:41.181074Z","scope":"sds","msg":"Skipping waiting for ingress gateway secret"}
{"level":"info","time":"2020-10-17T15:04:41.181312Z","scope":"cache","msg":"GenerateSecret from file default"}
{"level":"info","time":"2020-10-17T15:04:41.181612Z","scope":"sds","msg":"resource:default pushed key/cert pair to proxy"}
{"level":"info","time":"2020-10-17T15:09:40.680961Z","scope":"sds","msg":"resource:default connection is terminated: rpc error: code = Canceled desc = context canceled"}
2020-10-17T15:09:40.681152Z	warning	envoy config	[bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:92] StreamSecrets gRPC config stream closed: 2, resource:default Close connection to proxy "sidecar~10.206.4.33~istio-test-app-3-69d695bf8b-b256m.istio-test-app-3~istio-test-app-3.svc.cluster.local-54"
2020-10-17T15:09:40.681253Z	warning	envoy config	[bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:92] StreamSecrets gRPC config stream closed: 2, resource:default Close connection to proxy "sidecar~10.206.4.33~istio-test-app-3-69d695bf8b-b256m.istio-test-app-3~istio-test-app-3.svc.cluster.local-55"
{"level":"info","time":"2020-10-17T15:09:40.681058Z","scope":"sds","msg":"resource:default connection is terminated: rpc error: code = Canceled desc = context canceled"}
{"level":"info","time":"2020-10-17T15:09:40.842526Z","scope":"sds","msg":"resource:default new connection"}
{"level":"info","time":"2020-10-17T15:09:40.842666Z","scope":"sds","msg":"Skipping waiting for ingress gateway secret"}
{"level":"info","time":"2020-10-17T15:09:40.842911Z","scope":"cache","msg":"GenerateSecret from file default"}
{"level":"info","time":"2020-10-17T15:09:40.843082Z","scope":"sds","msg":"resource:default pushed key/cert pair to proxy"}
{"level":"info","time":"2020-10-17T15:09:40.893759Z","scope":"sds","msg":"resource:default new connection"}
{"level":"info","time":"2020-10-17T15:09:40.893894Z","scope":"sds","msg":"Skipping waiting for ingress gateway secret"}
{"level":"info","time":"2020-10-17T15:09:40.894106Z","scope":"cache","msg":"GenerateSecret from file default"}
{"level":"info","time":"2020-10-17T15:09:40.894256Z","scope":"sds","msg":"resource:default pushed key/cert pair to proxy"}

The only difference that istio-test-app-3 has vs 1 and 2 is that it uses:

    - name: OUTPUT_CERTS
      value: /etc/certs

As per this documentation.

To write certs to this volumeMount:

  - emptyDir:
      medium: Memory
    name: istio-certs

In order to write the certs to disk (as I was experimenting with for prometheus)

Looking at the logs I see:

2020-10-17T15:15:40.611175Z     debug   envoy conn_handler      [external/envoy/source/server/connection_handler_impl.cc:411] [C59921] new connection
2020-10-17T15:15:40.612284Z     debug   envoy connection        [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:191] [C59921] handshake expecting read
2020-10-17T15:15:40.612317Z     debug   envoy connection        [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:191] [C59921] handshake expecting read
2020-10-17T15:15:40.612704Z     debug   envoy connection        [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:198] [C59921] handshake error: 1
2020-10-17T15:15:40.612717Z     debug   envoy connection        [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:226] [C59921] TLS error: 268436501:SSL routines:OPENSSL_internal:SSLV3_ALERT_CERTIFICATE_EXPIRED
2020-10-17T15:15:40.612723Z     debug   envoy connection        [external/envoy/source/common/network/connection_impl.cc:200] [C59921] closing socket: 0
2020-10-17T15:15:40.612752Z     debug   envoy conn_handler      [external/envoy/source/server/connection_handler_impl.cc:111] [C59921] adding to cleanup list
2020-10-17T15:15:40.679479Z     debug   envoy filter    [external/envoy/source/extensions/filters/listener/original_dst/original_dst.cc:18] original_dst: New connection accepted
2020-10-17T15:15:40.679512Z     debug   envoy filter    [external/envoy/source/extensions/filters/listener/tls_inspector/tls_inspector.cc:78] tls inspector: new connection accepted
2020-10-17T15:15:40.679541Z     debug   envoy filter    [external/envoy/source/extensions/filters/listener/tls_inspector/tls_inspector.cc:148] tls:onServerName(), requestedServerName: outbound_.80_._.app.istio-test-app-3.svc.cluster.local
2020-10-17T15:15:40.679607Z     debug   envoy conn_handler      [external/envoy/source/server/connection_handler_impl.cc:411] [C59922] new connection
2020-10-17T15:15:40.680782Z     debug   envoy connection        [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:191] [C59922] handshake expecting read
2020-10-17T15:15:40.680811Z     debug   envoy connection        [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:191] [C59922] handshake expecting read
2020-10-17T15:15:40.681162Z     debug   envoy connection        [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:198] [C59922] handshake error: 1
2020-10-17T15:15:40.681181Z     debug   envoy connection        [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:226] [C59922] TLS error: 268436501:SSL routines:OPENSSL_internal:SSLV3_ALERT_CERTIFICATE_EXPIRED

Which makes me check:

❯ ./src/1.6.12/bin/istioctl pc secret istio-test-app-3-69d695bf8b-b256m.istio-test-app-3
RESOURCE NAME     TYPE           STATUS     VALID CERT     SERIAL NUMBER                               NOT AFTER                NOT BEFORE
default           Cert Chain     ACTIVE     true           331844299171782252960640586806792969930     2020-10-17T13:34:41Z     2020-10-16T13:34:41Z
default           Cert Chain     ACTIVE     true           331844299171782252960640586806792969930     2020-10-17T13:34:41Z     2020-10-16T13:34:41Z
ROOTCA            CA             ACTIVE     true           2623683368870835277565802163895395946       2030-10-07T07:55:53Z     2020-10-09T07:55:53Z

Which shows the certificates are indeed out of date - and expired exactly when we started seeing issues.

This made me check another app where we mount certificates in this way (prometheus) and sure enough I see its certs are out of date too:

❯ ./src/1.6.12/bin/istioctl pc secret prometheus-0.app-metrics
RESOURCE NAME     TYPE           STATUS     VALID CERT     SERIAL NUMBER                              NOT AFTER                NOT BEFORE
default           Cert Chain     ACTIVE     true           88593072017091452019065727082748990393     2020-10-16T18:43:59Z     2020-10-15T18:43:59Z
default           Cert Chain     ACTIVE     true           88593072017091452019065727082748990393     2020-10-16T18:43:59Z     2020-10-15T18:43:59Z
ROOTCA            CA             ACTIVE     true           2623683368870835277565802163895395946      2030-10-07T07:55:53Z     2020-10-09T07:55:53Z

Restarting the pod renewed the certificates:

❯ ./src/1.6.12/bin/istioctl pc secret istio-test-app-3-69d695bf8b-87x6j.istio-test-app-3
RESOURCE NAME     TYPE           STATUS     VALID CERT     SERIAL NUMBER                               NOT AFTER                NOT BEFORE
default           Cert Chain     ACTIVE     true           168785524709398884220511848353340100457     2020-10-18T15:21:26Z     2020-10-17T15:21:26Z
default           Cert Chain     ACTIVE     true           168785524709398884220511848353340100457     2020-10-18T15:21:26Z     2020-10-17T15:21:26Z
ROOTCA            CA             ACTIVE     true           2623683368870835277565802163895395946       2030-10-07T07:55:53Z     2020-10-09T07:55:53Z

[ ] Docs [ ] Installation [ ] Networking [ ] Performance and Scalability [ ] Extensions and Telemetry [x] Security [ ] Test and Release [x] User Experience [ ] Developer Infrastructure

Expected behavior SDS certs to be updated correctly when using OUTPUT_CERTS

Steps to reproduce the bug No idea

Version (include the output of istioctl version --remote and kubectl version --short and helm version if you used Helm) 1.6.12

How was Istio installed? Helm

Environment where bug was observed (cloud vendor, OS, etc) GKE

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Comments: 15 (15 by maintainers)

Most upvoted comments

Definitely not! If we can’t do sequential version updates (1.5 -> 1.6 -> 1.7) which is the approved upgrade route maintaining backwards compatibility, I have no confidence in an untrodden and untested path of 1.5 -> 1.7 working.

We decided to stay at least one minor version behind because of the painful track record of issues and breaking changes with new releases. After the nightmare experience of getting onto 1.5 (telemetry v2 etc), we decided to leave it a little longer for 1.6 to stabilise.

I really believe Istio are going to alienate any serious users with this overly rapid release process and T-1 support policy. There needs to be a focus on stability where you are right now and ensuring people are actually able to upgrade, not releasing more enhancements. 12 patch versions into 1.6 and I wouldn’t describe it as anything other than broken and for us, actually unusable:

Two weeks of testing it and we we’ve not been able to get it past our most basic qualifying environments:

I’ve always been a massive advocate of Istio. But I’m finding it harder and harder to continue to recommend it.

1.5 -> 1.6 has easily been the worst experience yet.

@Stono you can do it with:

      annotations:
        proxy.istio.io/config: |
          proxyMetadata:
            OUTPUT_CERTS: /tmp/certs

I think we need to update some of the mesh config docs for this annotation though, it was new