ingress-nginx: gRPC error handling

NGINX currently does not take into account the backend protocol when issuing errors.

We’re running a gRPC-based Ingress using nginx.ingress.kubernetes.io/backend-protocol: "GRPC". I was wondering how to get a proper UNAVAILABLE error when, e.g., the upstream pod is not there. Currently, when this happens, NGINX returns an HTTP 503 with an HTML body, which of course cannot be understood by a gRPC client, even though e.g. the Go client has some fallback mapping in place (https://github.com/grpc/grpc-go/blob/40ed2eb467471df2bd3c59e66cc5357159062d48/internal/transport/http_util.go#L304):

<html>
<head><title>503 Service Temporarily Unavailable</title></head>
<body>
<center><h1>503 Service Temporarily Unavailable</h1></center>
<hr><center>nginx</center>
</body>
</html>
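
To illustrate the fallback mentioned above: the sketch below follows the table in gRPC's http-grpc-status-mapping.md, which tells a client what code to report when a response carries no grpc-status. It is not copied from grpc-go; the names GrpcCode, HTTP_TO_GRPC and fallback_code are made up for this example.

```python
from enum import IntEnum


class GrpcCode(IntEnum):
    """Subset of gRPC status codes used by the fallback table (hypothetical helper)."""
    UNKNOWN = 2
    PERMISSION_DENIED = 7
    UNIMPLEMENTED = 12
    INTERNAL = 13
    UNAVAILABLE = 14
    UNAUTHENTICATED = 16


# HTTP status -> gRPC code, per grpc/doc/http-grpc-status-mapping.md.
HTTP_TO_GRPC = {
    400: GrpcCode.INTERNAL,
    401: GrpcCode.UNAUTHENTICATED,
    403: GrpcCode.PERMISSION_DENIED,
    404: GrpcCode.UNIMPLEMENTED,
    429: GrpcCode.UNAVAILABLE,
    502: GrpcCode.UNAVAILABLE,
    503: GrpcCode.UNAVAILABLE,
    504: GrpcCode.UNAVAILABLE,
}


def fallback_code(http_status: int) -> GrpcCode:
    """Return the gRPC code a client would assume for a non-gRPC HTTP response."""
    # Anything not listed in the table is reported as UNKNOWN.
    return HTTP_TO_GRPC.get(http_status, GrpcCode.UNKNOWN)


if __name__ == "__main__":
    print(fallback_code(503).name)
```

Such a fallback only recovers a status code. The HTML body is still not a gRPC message, so the client has nothing meaningful to show as the error detail.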

There is an article about NGINX Plus that discusses this issue: https://dzone.com/articles/deploying-nginx-plus-as-an-api-gateway-part-3-publ (see "Responding to Errors"). I also found the corresponding gist: https://gist.github.com/nginx-gists/87ed942d4ee9f7e7ebb2ccf757ed90be.

After adding this error handling to our gRPC Ingress via nginx.ingress.kubernetes.io/server-snippet, the issue is gone. I’ve tested it with the upstream pods scaled to 0:

Before

curl -i --http2 -H "Content-Type: application/grpc" https://******
HTTP/2 503
date: Wed, 27 May 2020 16:03:51 GMT
content-type: text/html
content-length: 190
strict-transport-security: max-age=15724800; includeSubDomains

<html>
<head><title>503 Service Temporarily Unavailable</title></head>
<body>
<center><h1>503 Service Temporarily Unavailable</h1></center>
<hr><center>nginx</center>
</body>
</html>

After

curl -i --http2 -H "Content-Type: application/grpc" https://******
HTTP/2 204
date: Wed, 27 May 2020 16:02:09 GMT
grpc-status: 14
grpc-message: unavailable
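
As a rough way to see the difference between the two responses from a client's point of view, here is a minimal sketch that pulls grpc-status and grpc-message out of a raw header block like the curl output above. The helper parse_grpc_status is hypothetical and only does plain string parsing; a real gRPC client reads these values from the HTTP/2 headers or trailers itself.

```python
from typing import Tuple


def parse_grpc_status(raw_headers: str) -> Tuple[int, str]:
    """Extract (grpc-status, grpc-message) from a raw response header block."""
    headers = {}
    for line in raw_headers.splitlines():
        name, sep, value = line.partition(":")
        if sep:  # skips the status line ("HTTP/2 204") and blank lines
            headers[name.strip().lower()] = value.strip()
    if "grpc-status" not in headers:
        raise ValueError("response carries no grpc-status header")
    return int(headers["grpc-status"]), headers.get("grpc-message", "")


# Header blocks taken from the curl output above.
AFTER = """HTTP/2 204
date: Wed, 27 May 2020 16:02:09 GMT
grpc-status: 14
grpc-message: unavailable
"""

BEFORE = """HTTP/2 503
date: Wed, 27 May 2020 16:03:51 GMT
content-type: text/html
content-length: 190
"""

if __name__ == "__main__":
    print(parse_grpc_status(AFTER))
    try:
        parse_grpc_status(BEFORE)
    except ValueError as exc:
        print("before:", exc)
```

With the snippet in place the status (14, UNAVAILABLE) is available directly, while the original 503 response leaves the client to guess from the HTTP status alone.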

Is there a reason for not adding this kind of error handling to the server section if the backend protocol is set to GRPC?

/triage support

About this issue

State: closed
Created 4 years ago
Reactions: 3
Comments: 25 (9 by maintainers)

Most upvoted comments

For me it worked using the parts from the gist:

nginx.ingress.kubernetes.io/server-snippet: |
    # Standard HTTP-to-gRPC status code mappings
    # Ref: https://github.com/grpc/grpc/blob/master/doc/http-grpc-status-mapping.md
    #
    error_page 400 = @grpc_internal;
    error_page 401 = @grpc_unauthenticated;
    error_page 403 = @grpc_permission_denied;
    error_page 404 = @grpc_unimplemented;
    error_page 429 = @grpc_unavailable;
    error_page 502 = @grpc_unavailable;
    error_page 503 = @grpc_unavailable;
    error_page 504 = @grpc_unavailable;
    # NGINX-to-gRPC status code mappings
    # Ref: https://github.com/grpc/grpc/blob/master/doc/statuscodes.md
    #
    error_page 405 = @grpc_internal; # Method not allowed
    error_page 408 = @grpc_deadline_exceeded; # Request timeout
    error_page 413 = @grpc_resource_exhausted; # Payload too large
    error_page 414 = @grpc_resource_exhausted; # Request URI too large
    error_page 415 = @grpc_internal; # Unsupported media type
    error_page 426 = @grpc_internal; # HTTP request was sent to HTTPS port
    error_page 495 = @grpc_unauthenticated; # Client certificate authentication error
    error_page 496 = @grpc_unauthenticated; # Client certificate not presented
    error_page 497 = @grpc_internal; # HTTP request was sent to mutual TLS port
    error_page 500 = @grpc_internal; # Server error
    error_page 501 = @grpc_internal; # Not implemented
    # gRPC error responses
    # Ref: https://github.com/grpc/grpc-go/blob/master/codes/codes.go
    #
    location @grpc_deadline_exceeded {
        add_header grpc-status 4;
        add_header grpc-message 'deadline exceeded';
        return 204;
    }
    location @grpc_permission_denied {
        add_header grpc-status 7;
        add_header grpc-message 'permission denied';
        return 204;
    }
    location @grpc_resource_exhausted {
        add_header grpc-status 8;
        add_header grpc-message 'resource exhausted';
        return 204;
    }
    location @grpc_unimplemented {
        add_header grpc-status 12;
        add_header grpc-message unimplemented;
        return 204;
    }
    location @grpc_internal {
        add_header grpc-status 13;
        add_header grpc-message 'internal error';
        return 204;
    }
    location @grpc_unavailable {
        add_header grpc-status 14;
        add_header grpc-message unavailable;
        return 204;
    }
    location @grpc_unauthenticated {
        add_header grpc-status 16;
        add_header grpc-message unauthenticated;
        return 204;
    }
    default_type application/grpc;   # Ensure gRPC for all error responses
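
One thing that is easy to get wrong when adapting the snippet is an error_page line pointing at a named location that is not defined. The following is a minimal sketch of a consistency check, assuming the snippet keeps the simple shape shown above. It uses crude regular expressions, not a real NGINX config parser, and the names resolve and EXCERPT are invented for this example.

```python
import re

# Patterns for the three constructs used in the snippet above.
ERROR_PAGE = re.compile(r"error_page\s+(\d+)\s*=\s*(@\w+)\s*;")
LOCATION = re.compile(r"location\s+(@\w+)\s*\{(.*?)\}", re.S)
ADD_HEADER = re.compile(r"add_header\s+(\S+)\s+('[^']*'|[^;\s]+)\s*;")


def resolve(snippet: str) -> dict:
    """Map each HTTP status in the snippet to (grpc-status, grpc-message).

    Raises KeyError if an error_page targets an undefined named location
    or a location does not set grpc-status.
    """
    locations = {}
    for name, body in LOCATION.findall(snippet):
        headers = {k: v.strip("'") for k, v in ADD_HEADER.findall(body)}
        locations[name] = (int(headers["grpc-status"]),
                           headers.get("grpc-message", ""))
    return {int(status): locations[target]
            for status, target in ERROR_PAGE.findall(snippet)}


# Shortened excerpt of the snippet above, used only to exercise the check.
EXCERPT = """
error_page 408 = @grpc_deadline_exceeded; # Request timeout
error_page 502 = @grpc_unavailable;
error_page 503 = @grpc_unavailable;
location @grpc_deadline_exceeded {
    add_header grpc-status 4;
    add_header grpc-message 'deadline exceeded';
    return 204;
}
location @grpc_unavailable {
    add_header grpc-status 14;
    add_header grpc-message unavailable;
    return 204;
}
"""

if __name__ == "__main__":
    for status, (code, message) in sorted(resolve(EXCERPT).items()):
        print(status, "->", code, message)
```

Running this against the full snippet before applying the annotation gives a quick table of which HTTP status ends up as which gRPC status.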

ecktom on May 27, 2020

I also noticed that gRPC status codes are not reflected in the Prometheus metrics. I guess they should either be exposed separately or be mapped to HTTP status codes, since requests currently always show up as HTTP 200, no matter whether they carry a gRPC error status code.

ecktom on May 28, 2020