ingress-nginx: gRPC error handling
NGINX currently does not take into account the backend protocol when issuing errors.
We’re running a gRPC-based Ingress using nginx.ingress.kubernetes.io/backend-protocol: "GRPC". I was wondering how to get a proper UNAVAILABLE error if, e.g., the upstream pod is not there. Currently, when this happens, NGINX returns an HTTP 503 with the HTML body shown below, which of course cannot be understood by a gRPC client, even though e.g. the Go client has a fallback mapping in place (https://github.com/grpc/grpc-go/blob/40ed2eb467471df2bd3c59e66cc5357159062d48/internal/transport/http_util.go#L304):
<html>
<head><title>503 Service Temporarily Unavailable</title></head>
<body>
<center><h1>503 Service Temporarily Unavailable</h1></center>
<hr><center>nginx</center>
</body>
</html>
There is an article about NGINX Plus that discusses this issue in its "Responding to Errors" section: https://dzone.com/articles/deploying-nginx-plus-as-an-api-gateway-part-3-publ. I also found the corresponding gist: https://gist.github.com/nginx-gists/87ed942d4ee9f7e7ebb2ccf757ed90be.
After adding this error handling to our gRPC Ingress via nginx.ingress.kubernetes.io/server-snippet, the issue is gone. I’ve tested it with the upstream pods scaled to 0:
Before
curl -i --http2 -H "Content-Type: application/grpc" https://******
HTTP/2 503
date: Wed, 27 May 2020 16:03:51 GMT
content-type: text/html
content-length: 190
strict-transport-security: max-age=15724800; includeSubDomains
<html>
<head><title>503 Service Temporarily Unavailable</title></head>
<body>
<center><h1>503 Service Temporarily Unavailable</h1></center>
<hr><center>nginx</center>
</body>
</html>
After
curl -i --http2 -H "Content-Type: application/grpc" https://******
HTTP/2 204
date: Wed, 27 May 2020 16:02:09 GMT
grpc-status: 14
grpc-message: unavailable
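For reference, a minimal sketch of what such a server-snippet might look like, reconstructed from the "After" response above (HTTP 204 with grpc-status: 14 and grpc-message: unavailable) rather than copied from the gist. The Ingress name, host, service name, port, and API version are placeholders I made up for illustration, and only the 503 case is handled here. Depending on the controller version, snippet annotations may also have to be enabled on the ingress-nginx controller.

```yaml
# Hypothetical Ingress: name, host, service, and port are placeholders.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grpc-example
  annotations:
    nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
    nginx.ingress.kubernetes.io/server-snippet: |
      # Send NGINX-generated 503s (e.g. no upstream pod) to a named location
      # instead of the default HTML error page.
      error_page 503 = @grpc_unavailable;

      # Answer with gRPC status 14 (UNAVAILABLE) in headers and an empty body.
      location @grpc_unavailable {
        add_header grpc-status 14;
        add_header grpc-message 'unavailable';
        return 204;
      }
spec:
  ingressClassName: nginx
  rules:
    - host: grpc.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: grpc-backend
                port:
                  number: 50051
```

Other HTTP statuses (502, 504, and so on) could presumably be mapped the same way with additional error_page lines and named locations, which is what the gist appears to do for the standard HTTP-to-gRPC mappings.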
Is there a reason for not adding this kind of error handling to the server section if the backend protocol is set to GRPC?
/triage support
About this issue
- State: closed
- Created 4 years ago
- Reactions: 3
- Comments: 25 (9 by maintainers)
For me it worked using the parts from the gist:
I also noticed that gRPC status codes are not reflected in the Prometheus metrics. I guess they should either be exposed separately or be mapped onto the existing status label, because requests currently always show an HTTP 200 regardless of whether they carry a gRPC error status code.