istio: got error 503 - CERTIFICATE_VERIFY_FAILED - randomly after new deployments
We are running Istio 1.4.7 on EKS v1.15.
Suddenly we started getting 503 errors with CERTIFICATE_VERIFY_FAILED
on the client side. After restarting Pilot or Citadel the problem resolves and the service becomes reachable again, but as soon as we run a new deployment it reproduces!
We verified the xDS and everything looks fine; the EDS looks OK and is fully in sync. Has anyone faced this problem before? Why does it happen, and how can we make sure the issue does not reproduce? From the Kubernetes API the deployment looks fine (the active health check passes), but once we send requests through Istio we get 503.
Please help, this is really blocking us…
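For context, the xDS sync check mentioned above was done roughly like this (a sketch; `<pod>.<namespace>` is a placeholder for one of our workload pods):

```shell
# Check that every sidecar is in sync with Pilot
# (all columns should report SYNCED).
istioctl proxy-status

# Dump the endpoints (EDS) a given sidecar has received,
# to confirm they match the actual pod IPs.
istioctl proxy-config endpoint <pod>.<namespace>
```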
istio sidecar log:
[Envoy (Epoch 0)] [2020-08-10 07:51:42.368][34][debug][pool] [external/envoy/source/common/http/http1/conn_pool.cc:95] creating a new connection
[Envoy (Epoch 0)] [2020-08-10 07:51:42.368][34][debug][client] [external/envoy/source/common/http/codec_client.cc:31] [C588988] connecting
[Envoy (Epoch 0)] [2020-08-10 07:51:42.368][34][debug][connection] [external/envoy/source/common/network/connection_impl.cc:718] [C588988] connecting to 10.28.10.212:8080
[Envoy (Epoch 0)] [2020-08-10 07:51:42.368][34][debug][connection] [external/envoy/source/common/network/connection_impl.cc:727] [C588988] connection in progress
[Envoy (Epoch 0)] [2020-08-10 07:51:42.368][34][debug][pool] [external/envoy/source/common/http/conn_pool_base.cc:20] queueing request due to no available connections
[Envoy (Epoch 0)] [2020-08-10 07:51:42.368][34][debug][http2] [external/envoy/source/common/http/http2/codec_impl.cc:742] [C6] stream closed: 0
[Envoy (Epoch 0)] [2020-08-10 07:51:42.368][34][debug][connection] [external/envoy/source/common/network/connection_impl.cc:566] [C588988] connected
[Envoy (Epoch 0)] [2020-08-10 07:51:42.368][34][debug][connection] [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:191] [C588988] handshake expecting read
[Envoy (Epoch 0)] [2020-08-10 07:51:42.370][34][debug][connection] [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:198] [C588988] handshake error: 1
[Envoy (Epoch 0)] [2020-08-10 07:51:42.370][34][debug][connection] [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:226] [C588988] TLS error: 268435581:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED
[Envoy (Epoch 0)] [2020-08-10 07:51:42.370][34][debug][connection] [external/envoy/source/common/network/connection_impl.cc:193] [C588988] closing socket: 0
[Envoy (Epoch 0)] [2020-08-10 07:51:42.370][34][debug][client] [external/envoy/source/common/http/codec_client.cc:88] [C588988] disconnect. resetting 0 pending requests
[Envoy (Epoch 0)] [2020-08-10 07:51:42.370][34][debug][pool] [external/envoy/source/common/http/http1/conn_pool.cc:136] [C588988] client disconnected, failure reason: TLS error: 268435581:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED
[Envoy (Epoch 0)] [2020-08-10 07:51:42.370][34][debug][pool] [external/envoy/source/common/http/http1/conn_pool.cc:167] [C588988] purge pending, failure reason: TLS error: 268435581:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED
[Envoy (Epoch 0)] [2020-08-10 07:51:42.370][34][debug][router] [external/envoy/source/common/router/router.cc:911] [C588982][S7458161559789993786] upstream reset: reset reason connection failure
[Envoy (Epoch 0)] [2020-08-10 07:51:42.370][34][debug][http] [external/envoy/source/common/http/conn_manager_impl.cc:1354] [C588982][S7458161559789993786] Sending local reply with details upstream_reset_before_response_started{connection failure,TLS error: 268435581:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED}
[Envoy (Epoch 0)] [2020-08-10 07:51:42.370][34][debug][filter] [src/envoy/http/mixer/filter.cc:135] Called Mixer::Filter : encodeHeaders 2
[Envoy (Epoch 0)] [2020-08-10 07:51:42.370][34][debug][http] [external/envoy/source/common/http/conn_manager_impl.cc:1552] [C588982][S7458161559789993786] encoding headers via codec (end_stream=false):
':status', '503'
'content-length', '91'
'content-type', 'text/plain'
'date', 'Mon, 10 Aug 2020 07:51:41 GMT'
'server', 'istio-envoy'
Pilot logs look OK, nothing special there.
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 18 (5 by maintainers)
We found the main reason why this happens. This is the scenario to reproduce it:
- Pre: create a Service and connect it to a pod.
- Step 1: take a pod running with service account SA_1.
- Step 2: redeploy it with a different service account, SA_2.
When we change the service account (to SA_2) without changing the Service name itself, the updated secure naming is not pushed to the Envoys even though it has changed.
In the cluster discovery service (CDS) we saw that the SPIFFE value (secure naming) still held the old value (SA_1), which causes the TLS handshake error and the 503. The interesting thing is that if we redeploy the pods with yet another service account (SA_3), the Envoys get the second SPIFFE value (SA_2). This means Pilot already has the updated data but does not push it to the Envoys as expected, leaving them with stale values.
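To see the stale secure-naming value, we dumped the CDS output of an affected client sidecar and looked at the SPIFFE identity it expects for the upstream (pod name and namespace below are placeholders):

```shell
# Dump the client sidecar's cluster config (CDS) and show the SPIFFE
# identities it will accept for the upstream's certificate; with the bug
# present this still shows the old service account (SA_1), e.g.
# spiffe://cluster.local/ns/<namespace>/sa/SA_1
istioctl proxy-config cluster <client-pod>.<namespace> -o json | grep spiffe
```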
The best workaround we found is to create a new Service; this triggers Pilot to push the xDS tables to all the Envoys with the correct values.
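The workaround amounts to applying a new Service object that selects the same pods. A minimal sketch, with hypothetical names and ports:

```shell
# Create a *new* Service (name, selector, and port here are illustrative)
# pointing at the same pods; adding it triggers Pilot to push fresh xDS
# config, including the updated SPIFFE secure-naming value, to the Envoys.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: my-service-v2   # new name forces Pilot to recompute and push
spec:
  selector:
    app: my-app         # same selector as the old Service
  ports:
  - port: 8080
    targetPort: 8080
EOF
```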