istio: upstream connect error or disconnect/reset before headers. reset reason: connection termination
Bug description
After deploying a service, at some point I start getting the error
upstream connect error or disconnect/reset before headers. reset reason: connection termination. It is random, but I have found that whenever the error occurs, it is always preceded by the following log line, after which the connection is gone.
kubectl logs -n namespace $POD -c istio-proxy
2020-09-24T17:12:35.507004Z warning envoy config [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:92] StreamAggregatedResources gRPC config stream closed: 13,
After getting this error I can't connect to my Istio endpoint/service; roughly 30 minutes later I get this line:
2020-09-24T17:43:49.473767Z warning envoy config [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:92] StreamAggregatedResources gRPC config stream closed: 13,
and the connection is reinstated, so I can reach the service again. It does not happen all the time, but when it does, it takes exactly 30 minutes for the connection to come back.
I ran a curl test every 30 seconds; the outage starts at 2020-09-24-17-12 and ends at 2020-09-24-17-43:
tracing-rest-checker-pg8hp tracing-rest-checker 2020-09-24-17-11
tracing-rest-checker-pg8hp tracing-rest-checker {"data":{"names":["proba"],"ndarray":[[0.39731466202150834]]},"meta":{}}
tracing-rest-checker-pg8hp tracing-rest-checker
tracing-rest-checker-pg8hp tracing-rest-checker ################################################################################
tracing-rest-checker-pg8hp tracing-rest-checker ################################################################################
tracing-rest-checker-pg8hp tracing-rest-checker 2020-09-24-17-12
tracing-rest-checker-pg8hp tracing-rest-checker {"data":{"names":["proba"],"ndarray":[[0.39731466202150834]]},"meta":{}}
tracing-rest-checker-pg8hp tracing-rest-checker
tracing-rest-checker-pg8hp tracing-rest-checker ################################################################################
tracing-rest-checker-pg8hp tracing-rest-checker ################################################################################
tracing-rest-checker-pg8hp tracing-rest-checker 2020-09-24-17-12
tracing-rest-checker-pg8hp tracing-rest-checker upstream connect error or disconnect/reset before headers. reset reason: connection termination
tracing-rest-checker-pg8hp tracing-rest-checker ################################################################################
tracing-rest-checker-pg8hp tracing-rest-checker ################################################################################
tracing-rest-checker-pg8hp tracing-rest-checker 2020-09-24-17-13
tracing-rest-checker-pg8hp tracing-rest-checker upstream connect error or disconnect/reset before headers. reset reason: connection termination
tracing-rest-checker-pg8hp tracing-rest-checker ################################################################################
tracing-rest-checker-pg8hp tracing-rest-checker ################################################################################
...
...
...
tracing-rest-checker-pg8hp tracing-rest-checker ################################################################################
tracing-rest-checker-pg8hp tracing-rest-checker 2020-09-24-17-42
tracing-rest-checker-pg8hp tracing-rest-checker upstream connect error or disconnect/reset before headers. reset reason: connection termination
tracing-rest-checker-pg8hp tracing-rest-checker ################################################################################
tracing-rest-checker-pg8hp tracing-rest-checker ################################################################################
tracing-rest-checker-pg8hp tracing-rest-checker 2020-09-24-17-43
tracing-rest-checker-pg8hp tracing-rest-checker upstream connect error or disconnect/reset before headers. reset reason: connection termination
tracing-rest-checker-pg8hp tracing-rest-checker ################################################################################
tracing-rest-checker-pg8hp tracing-rest-checker ################################################################################
tracing-rest-checker-pg8hp tracing-rest-checker 2020-09-24-17-43
tracing-rest-checker-pg8hp tracing-rest-checker upstream connect error or disconnect/reset before headers. reset reason: connection termination
tracing-rest-checker-pg8hp tracing-rest-checker ################################################################################
tracing-rest-checker-pg8hp tracing-rest-checker ################################################################################
tracing-rest-checker-pg8hp tracing-rest-checker 2020-09-24-17-44
tracing-rest-checker-pg8hp tracing-rest-checker {"data":{"names":["proba"],"ndarray":[[0.39731466202150834]]},"meta":{}}
tracing-rest-checker-pg8hp tracing-rest-checker
tracing-rest-checker-pg8hp tracing-rest-checker ################################################################################
tracing-rest-checker-pg8hp tracing-rest-checker ################################################################################
tracing-rest-checker-pg8hp tracing-rest-checker 2020-09-24-17-44
tracing-rest-checker-pg8hp tracing-rest-checker {"data":{"names":["proba"],"ndarray":[[0.39731466202150834]]},"meta":{}}
tracing-rest-checker-pg8hp tracing-rest-checker
tracing-rest-checker-pg8hp tracing-rest-checker ################################################################################
Based on the suggestion, I also captured the proxy config dump before and after the outage with:
while true ; do kubectl exec -n istio-system $(kubectl get pods -n istio-system -l app=istiod -o name) -- curl 'localhost:8080/debug/config_dump?proxyID=tracing-and-logging-rest-tracing-0-model1-865b5596fd-jl2vp.emtech' > after/after_config_dump.txt-$(date -d '1 hour ago' +%Y-%m-%d-%H-%M); sleep 300; done
Detailed logs are attached.
[ ] Docs
[ ] Installation
[x] Networking
[ ] Performance and Scalability
[ ] Extensions and Telemetry
[ ] Security
[ ] Test and Release
[x] User Experience
[ ] Developer Infrastructure
Expected behavior To be able to connect to the endpoint all the time.
Steps to reproduce the bug
Version (include the output of istioctl version --remote, kubectl version --short, and helm version if you used Helm)
$ istioctl version --remote
client version: 1.6.8
control plane version: 1.6.8
data plane version: 1.6.8 (7 proxies)
$ kubectl version --short
Client Version: v1.18.6
Server Version: v1.16.13
How was Istio installed? istioctl install
Environment where bug was observed (cloud vendor, OS, etc) I have seen this bug on:
- AWS EKS 1.17
- Kind 1.17 cluster (on-prem)
- Azure AKS 1.16
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Reactions: 17
- Comments: 39 (5 by maintainers)
I just restarted my computer and it fixed it
We have faced this issue too, and I've found the root cause in our case. Hope this helps you guys. TL;DR: the root cause in our case was that the request time exceeded the socket timeout of the HTTP server, specifically a Node.js HTTP server. Increasing the socket timeout of the Node.js HTTP server solves it. You can find a similar setting on your HTTP server, or use an event-based flow or something similar instead of a long HTTP request.
Context: we have two services, service-a and service-b. They are NestJS projects running on Node 12. service-a makes HTTP requests to service-b. service-a got a 503 with the error upstream connect error or disconnect/reset before headers. reset reason: connection termination, but service-b kept processing. It happened randomly. After tracing the logs, we saw that the error occurs when the request from service-a to service-b takes too long. I tried configuring the Istio timeout, but it didn't help.
I then created two new sample services locally, without Istio, and set service-b to respond slowly. After 2 minutes I got Error: socket hang up (code ECONNRESET). That made me think the root cause is the HTTP server, not Istio. Digging into the Node.js error, I found that the Node HTTP server disconnects the socket after a fixed amount of time, and that amount of time can be set via the timeout option. I increased the timeout value of service-b and it works. You can find a similar setting on your HTTP server.
REMEMBER to increase the socket timeout of service-a as well if the flow is like the image below, because the HTTP request to service-a can be terminated too.
@howardjohn @ramaraochavali Can you please re-open this issue?
It has now been 2 years and many people are hitting this issue in production. Can you please prioritize it, as it has bad implications for connections?
There were many reports and if further info needed, I am sure we all can provide infos.
Thanks!
Hey folks - I understand the concern when seeing errors like this. However, this issue is not going to be a helpful path forward. This is a very generic error message that basically means "the request did not work". Without specific context (in the form of
istioctl bug-report
, logs, configs, etc), it's impossible to make forward progress. So I am not trying to dismiss this entirely - but please open new issues with details about your case so we can help.
Also see https://istio.io/latest/blog/2021/upcoming-networking-changes/ since that is a fairly basic one that could trigger this.
Thanks!
@puneetloya @a8j8i8t8 this should be the right issue, as I saw you observed this too