istio: Prod outage: Multiple services affected (upstream connect error or disconnect/reset before headers)

Hi, we had a production outage this morning, which was resolved by restarting pilot.

We run two pilots (Istio 1.0.4, Kubernetes 1.10.7) in production, and whilst trying to capture debug logs for you I noticed that one of them was not responding on 127.0.0.1:8080/debug/adsz. I was also unable to run istioctl proxy-status.

I have captured:

  • discovery logs from both pilots, from before and after the restart
  • /configz, /adsz, and /endpointz from both pilots, before and after
  • endpoints, nodes, pods, and services for the clusters
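
For anyone wanting to collect the same debug output, here is a rough sketch of how the endpoint capture might be scripted. It is not the script used for this report. The function names `fetch` and `capture` are made up for illustration, and the `/debug/` prefix for configz and endpointz is an assumption based on the `/debug/adsz` URL quoted above. The short timeout is there so that an unresponsive pilot shows up as an error instead of hanging the capture.

```python
import urllib.error
import urllib.request

# Paths named in the report. The /debug/ prefix on configz and endpointz is
# assumed by analogy with the /debug/adsz URL mentioned above.
DEBUG_PATHS = ["/debug/configz", "/debug/adsz", "/debug/endpointz"]


def fetch(url, timeout):
    """Return the response body as bytes; raises on HTTP or network errors."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def capture(base_url, paths=DEBUG_PATHS, timeout=5.0, fetcher=fetch):
    """Fetch each debug path once.

    Returns a dict mapping path -> bytes on success, or an "ERROR: ..."
    string when the request fails or times out.
    """
    results = {}
    for path in paths:
        try:
            results[path] = fetcher(base_url + path, timeout)
        except (urllib.error.URLError, OSError) as exc:
            # A hung pilot typically surfaces here as a timeout,
            # a stopped one as a refused connection.
            results[path] = "ERROR: {}".format(exc)
    return results
```

Calling `capture("http://127.0.0.1:8080")` against each pilot in turn (assuming its debug port is reachable locally) gives one dict per pilot, and any path whose value starts with `ERROR:` marks an endpoint that did not answer within the timeout.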

I don’t want to share these files on GitHub, so could one of you please reach out to me on Slack? I will send you a tar.gz containing them.

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 22 (22 by maintainers)

Most upvoted comments

Workaround: don’t use istioctl proxy-status in 1.0.4