istio: Almost every app gets UC errors, 0.012% of all requests in 24h period
I’ve noticed pretty much every application periodically gets this: a 503 with the Envoy response flag UC.

It’s retried, as you can see, but it’s bothering me that I cannot get to the bottom of why it is happening. My DestinationRule should deal with any connection timeouts, etc.:
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: sauron-graphql-server
spec:
  host: app
  trafficPolicy:
    connectionPool:
      http:
        http1MaxPendingRequests: 1024
        http2MaxRequests: 10240
        idleTimeout: 5m
        maxRequestsPerConnection: 1024
      tcp:
        maxConnections: 10240
        tcpKeepalive:
          interval: 1m
          time: 3m
    loadBalancer:
      simple: LEAST_CONN
    outlierDetection:
      baseEjectionTime: 5s
      consecutiveErrors: 5
      interval: 5s
      maxEjectionPercent: 50
      minHealthPercent: 50
    tls:
      mode: ISTIO_MUTUAL
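For reference, the retry behaviour mentioned above comes from Istio’s HTTP retry policy on the VirtualService rather than from the DestinationRule. A minimal sketch of an explicit retry policy, assuming a VirtualService for the same app host; the resource name and values are illustrative, not the actual config from this issue:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: app
spec:
  hosts:
  - app
  http:
  - route:
    - destination:
        host: app
    retries:
      attempts: 3
      perTryTimeout: 2s
      # gateway-error retries 502/503/504 responses; connect-failure and
      # refused-stream cover requests that never reached the upstream.
      retryOn: gateway-error,connect-failure,refused-stream

Whether a given UC is actually retried depends on which retryOn conditions are configured.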
The frequency of these bothers me, because across a distributed service graph, we can get spikes in latency on client requests as a result.
Along with @silentdai’s PR, we also discovered that applications with a low socket timeout value terminate connections to Envoy more frequently, exacerbating the issue. I’ve written about it here: https://karlstoney.com/2019/05/31/istio-503s-ucs-and-tcp-fun-times
The TL;DR here is that @silentdai’s update (hopefully in 1.1.8), combined with an awareness and adjustment of applications with low TCP socket timeouts, resolves 99.99% of our 503s.
I’m going to close this issue now as it’s become a bit of a novel. I’m thankful to the whole Istio team for their efforts in getting to the bottom of this one, and I know there are more improvements to come from them which should make this less of a problem.
Do you have a Node.js application behind Envoy? Node.js has a 5-second default connection idle timeout, and Envoy has the same one. Occasionally this produces the error: Envoy sends a request at the exact moment Node.js is closing the connection. We resolved this with a DestinationRule:
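A minimal sketch of this kind of DestinationRule, assuming the same app host as above; the resource name and the 3s value are illustrative rather than the commenter’s actual config, and in practice this would be merged into the existing DestinationRule for the host. The idea is to set Envoy’s upstream idle timeout below the application’s keep-alive timeout, so Envoy rather than Node.js closes idle connections first:

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: app-idle-timeout
spec:
  host: app
  trafficPolicy:
    connectionPool:
      http:
        # Close idle upstream connections before the application's 5s
        # keep-alive timeout fires, so Envoy never writes a request to a
        # connection the application is about to tear down.
        idleTimeout: 3s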
I’ve sent you the config dumps on Slack.
As an FYI: as I mentioned earlier on, we had issues with clients that were doing their own connection pooling; they would reach their connection pool limit and then die. My theory was that those clients were holding onto connections that had otherwise been UC’d, and those connections were never being released.
In one of those clients we implemented https://hc.apache.org/httpcomponents-core-ga/httpcore/apidocs/org/apache/http/impl/NoConnectionReuseStrategy.html, and we haven’t had any problems with that application since.
This was an application which had been migrated to Istio, and it works fine on the old infra.
I’m not sure what this tells us, but it does all feel related.
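On the mesh side, a rough analogue of the client-side NoConnectionReuseStrategy above is to cap connection reuse in a DestinationRule. A minimal sketch, again assuming the app host; the resource name is illustrative, and disabling reuse trades the connection race for extra connection-setup cost:

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: app-no-reuse
spec:
  host: app
  trafficPolicy:
    connectionPool:
      http:
        # One request per upstream connection: connections are never
        # reused, so a pooled connection cannot be half-closed by the
        # other side while the next request is being written to it.
        maxRequestsPerConnection: 1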