ingress-nginx: Request hangs when upstream service loses all endpoints

NGINX Ingress controller version: 0.31.1

Kubernetes version (use kubectl version): Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.2", GitCommit:"59603c6e503c87169aea6106f57b9f242f64df89", GitTreeState:"clean", BuildDate:"2020-01-18T23:30:10Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"} Server Version: version.Info{Major:"1", Minor:"14+", GitVersion:"v1.14.10-gke.27", GitCommit:"145f9e21a4515947d6fb10819e5a336aff1b6959", GitTreeState:"clean", BuildDate:"2020-02-21T18:01:40Z", GoVersion:"go1.12.12b4", Compiler:"gc", Platform:"linux/amd64"}

Environment: GKE

What happened: When using a client-streaming RPC to a gRPC backend, the request backing the RPC hangs when the upstream gRPC service is scaled down to 0 pods. Even when the upstream service is scaled back up, nginx never tries connecting to the new endpoint; it just keeps retrying the old endpoint over and over.

Example logs:

2020/05/08 19:56:31 [error] 45#45: *5354 upstream timed out (110: Operation timed out) while connecting to upstream, client: <client-ip>, server: <server-name>, request: "POST <client-grpc-stream-endpoint-path> HTTP/2.0", upstream: "grpcs://10.13.181.11:8443", host: "<host>"

This log is printed repeatedly; 10.13.181.11 is the IP of the removed pod. The loop does end after a long time (at least 10 minutes), but I'm not sure whether it is a client timeout, a server timeout, or a network interruption between the client and nginx that ends it.

What you expected to happen: When the upstream loses all endpoints, I expect nginx to give up attempting to deliver the request upstream and return an error to the client. It should not repeatedly attempt to connect to an upstream IP that will never be available again.

What do you think went wrong?: I spent a few hours today trying to understand the root cause of this issue. The problem occurs in the Lua code that chooses which upstream node to connect to when nginx retries forwarding a request. An upstream is chosen by calling a function on a balancer object (balancer.balance()), which returns an IP (and port) to connect to. If the connection fails, the selection is simply done again. This balancer is obtained from a global map of balancers on first use, but is then cached in the request's context for easy access later (see the first sketch below). Every second, all existing balancers in the global map are updated with the latest endpoint configuration from k8s. This mutates the existing balancers, so even a balancer cached for request handling receives these updates, as long as it is ALSO in the global map.

However, if a backend has no endpoints during the once-per-second sync, the corresponding balancer is not updated: instead it is deleted from the global map altogether. Updates to that backend therefore no longer reach this balancer, and if endpoints return for that backend, a new balancer is created and stored in the global map. The old balancer, however, is still cached in the request context and is still being used for that request's retries. Since this balancer is now outdated, the request will never make it to the correct backend, but it will retry over and over again.
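To make the caching step concrete, here is a minimal Lua sketch of the pattern as I understand it. It is not the actual ingress-nginx source; the names (balancers, get_balancer, proxy_upstream_name) are approximations of the real ones.

    -- Simplified sketch of the per-request balancer lookup (approximate names).
    local balancers = {}  -- global map: backend name -> balancer object,
                          -- refreshed by the periodic sync (second sketch below)

    local function get_balancer()
      -- a retry of the same request returns the cached object immediately
      if ngx.ctx.balancer then
        return ngx.ctx.balancer
      end

      local backend_name = ngx.var.proxy_upstream_name
      local balancer = balancers[backend_name]
      if not balancer then
        return nil
      end

      -- cache for the lifetime of this request, including every retry
      ngx.ctx.balancer = balancer
      return balancer
    end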

Short version:

  1. Make a long-lived request
  2. Balancer for the targeted backend is looked up and cached in the request context
  3. Remove all endpoints from the backend targeted by the request
  4. Backend balancer is deleted from the global map (see the sketch after this list)
  5. Failed request to backend is retried, using info from cached, outdated balancer
  6. Go back to step 5
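To illustrate steps 3 to 6, here is a sketch of the sync side, continuing the Lua example above. Again this is an approximation, not the actual source; sync_backend and the hypothetical new_balancer constructor stand in for the real update logic. The asymmetry is the point: balancers with endpoints are mutated in place, but a backend with zero endpoints has its balancer dropped from the global map, orphaning the copy still cached in ngx.ctx by an in-flight request.

    -- Simplified sketch of the once-per-second sync (approximate names).
    local function sync_backend(backend)
      if not backend.endpoints or #backend.endpoints == 0 then
        -- the balancer object is not updated; it just stops being tracked,
        -- so a request that cached it keeps retrying its stale endpoints
        balancers[backend.name] = nil
        return
      end

      local balancer = balancers[backend.name]
      if not balancer then
        -- endpoints came back: a brand-new balancer is created, but the old
        -- object cached in a request's ngx.ctx is not this one
        balancers[backend.name] = new_balancer(backend)  -- hypothetical constructor
        return
      end

      -- in-place update: cached references in ngx.ctx do see this change
      balancer:sync(backend)
    end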

How to reproduce it: A gRPC client-streaming RPC to a gRPC server will reproduce this issue, as long as the request lasts long enough for the backend pods to be removed while the request is in flight. I know this isn't the most precise description of how to reproduce, but I hope the description above gives you enough information.

Anything else we need to know: I am happy to make a PR for this if the solution is as simple as removing the balancer caching. If I am wrong about the root cause, or the desired solution is more complicated, I am still happy to help solve it, but I might require assistance.

/kind bug

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 1
  • Comments: 21 (14 by maintainers)

Most upvoted comments

Disappointed to see this closed; I think the default should be aligned for both gRPC and plain proxying. Currently gRPC retries indefinitely until a timeout, which is both terrible for debugging and also floods the nginx log. I found a workaround, and if anyone is looking for an answer, this works (similar to the timeout setup): "nginx.ingress.kubernetes.io/server-snippet": "grpc_next_upstream off;"