istio: Prod outage: Multiple services affected (upstream connect error or disconnect/reset before headers)
Hi, we had a production outage this morning, which was resolved by restarting Pilot.
We run two Pilots (1.0.4, Kubernetes 1.10.7) in production, and whilst trying to capture debug logs for you I noticed that one of them was not responding to 127.0.0.1:8080/debug/adsz. I was also unable to run istioctl proxy-status.
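For anyone who wants to check whether a Pilot is still answering on its debug port, here is a minimal sketch of a probe with a timeout. It assumes the debug port is reachable from where the script runs (for example on 127.0.0.1:8080 from inside the pod or through a port-forward). The function name `probe_debug_endpoint` and the 5 second default are illustrative choices, not part of Istio.

```python
import http.client
import urllib.error
import urllib.request


def probe_debug_endpoint(base_url, path="/debug/adsz", timeout=5.0):
    """Send one GET to a Pilot debug endpoint and report whether it answered.

    Returns (responded, detail). The helper name and the default timeout
    are hypothetical; only the /debug/adsz path comes from the report.
    """
    url = base_url.rstrip("/") + path
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read()
            return True, f"HTTP {resp.status}, {len(body)} bytes"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status code.
        return True, f"HTTP {exc.code}"
    except (urllib.error.URLError, http.client.HTTPException, OSError) as exc:
        # Refused, reset, or timed out: treat the endpoint as not responding.
        return False, f"no response: {exc}"
```

Calling `probe_debug_endpoint("http://127.0.0.1:8080")` against each Pilot in turn would show which one has stopped responding. The timeout matters here, because a hung Pilot may accept the connection and never reply.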
I have captured:
- discovery logs from both pilots from before, and after the restart
- /configz, /adsz, and /endpointz from both pilots, before and after the restart
- endpoints, nodes, pods, and services for the clusters
I don’t want to share these files on GitHub, so could one of you please reach out to me on Slack? I will send you a tar.gz containing them.
About this issue
- State: closed
- Created 6 years ago
- Comments: 22 (22 by maintainers)
Workaround: don’t use istioctl proxy-status in 1.0.4