envoy: gRPC streaming keepAlive ping never fails when proxied through Envoy
Envoy: envoyproxy/envoy-alpine:cd514cc3f1ad82bfd57b6b832b379eb9a2888891
gRPC: grpc-go 1.7.2
Description: I have a Docker setup where Envoy and a gRPC service run in a single container. Envoy proxies port 80 to port 8000, where the service is listening. The gRPC service has a server->client unidirectional streaming endpoint with keepAlive enabled, so that a client that disconnects ungracefully doesn’t leave a hanging connection. When I connect to my service directly and Ctrl-Z my test client, within ~30 seconds the server notices that a keepAlive HTTP/2 PING has failed and closes the connection. When I connect to my service through Envoy and Ctrl-Z my test client, the connection hangs forever.
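For reference, the test client is essentially just a dial plus a blocking receive loop. A minimal sketch of it looks like the following (the pb import path, service, and message names are placeholders, not the actual generated API):

package main

import (
    "context"
    "log"

    "google.golang.org/grpc"

    pb "example.com/myservice/proto" // placeholder import path
)

func main() {
    // Dial the service directly on 8000, or "localhost:80" to go through Envoy.
    conn, err := grpc.Dial("localhost:8000", grpc.WithInsecure())
    if err != nil {
        log.Fatalf("dial: %v", err)
    }
    defer conn.Close()

    client := pb.NewStreamingClient(conn)
    stream, err := client.Endpoint(context.Background(), &pb.Request{})
    if err != nil {
        log.Fatalf("open stream: %v", err)
    }

    // Block receiving server-pushed messages until the stream ends.
    for {
        msg, err := stream.Recv()
        if err != nil {
            log.Fatalf("stream closed: %v", err)
        }
        log.Printf("received: %v", msg)
    }
}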
I test this locally by running my docker container, and then from my local machine I first point my gRPC test client to port 8000 to bypass Envoy. I get the following results on Wireshark on the docker0 interface:

At the end, there are 3 groups of 3 TCP frames at 55, 85, and 115 seconds on port 8000. These are obviously the keepAlive HTTP/2 PINGs.
Here is what happens when I go through Envoy on port 80:

Here I see the actual HTTP/2 traffic, but only on the initial connection. No matter how long I listen, I never see any keepAlive frames. I assume my service is still sending the keepAlive PINGs to Envoy over the Docker container’s loopback interface, but I don’t know an easy way to capture that.
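(One possible way to check that, assuming tcpdump is available in the image, would be something like docker exec <container> tcpdump -i lo -n port 8000, but I haven’t captured the loopback side.)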
gRPC KeepAlive Go config:
keepAliveOpt := grpc.KeepaliveParams(keepalive.ServerParameters{
    MaxConnectionIdle:     infinity,
    MaxConnectionAge:      infinity,
    MaxConnectionAgeGrace: infinity,
    Time:                  25 * time.Second,
    Timeout:               5 * time.Second,
})
keepAliveEnforcementPolicyOpt := grpc.KeepaliveEnforcementPolicy(keepalive.EnforcementPolicy{
    MinTime:             5 * time.Minute,
    PermitWithoutStream: false,
})
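For completeness, these two options are simply passed to the server constructor. A minimal sketch of the wiring (the infinity definition, service type, and registration call below are assumptions, not my exact code):

// Assumption: infinity is the usual "disabled" sentinel for these limits.
const infinity = time.Duration(math.MaxInt64)

lis, err := net.Listen("tcp", ":8000")
if err != nil {
    log.Fatalf("listen: %v", err)
}
srv := grpc.NewServer(keepAliveOpt, keepAliveEnforcementPolicyOpt)
pb.RegisterStreamingServer(srv, &streamingService{}) // placeholder generated registration
log.Fatal(srv.Serve(lis))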
Envoy config:
Notice that I have a separate route for my streaming endpoint, because I needed to set timeout_ms to 0 for it.
{
  "listeners": [
    {
      "address": "tcp://0.0.0.0:80",
      "filters": [
        {
          "type": "read",
          "name": "http_connection_manager",
          "config": {
            "codec_type": "auto",
            "stat_prefix": "ingress_http",
            "route_config": {
              "virtual_hosts": [
                {
                  "name": "local_service",
                  "domains": ["*"],
                  "routes": [
                    {
                      "timeout_ms": 0,
                      "prefix": "/gprc.prefix.to.my.streaming/Endpoint",
                      "headers": [
                        {"name": "content-type", "value": "application/grpc"}
                      ],
                      "cluster": "local_service_grpc",
                      "retry_policy": {
                        "retry_on": "5xx",
                        "num_retries": 3
                      }
                    },
                    {
                      "timeout_ms": 10000,
                      "prefix": "/",
                      "headers": [
                        {"name": "content-type", "value": "application/grpc"}
                      ],
                      "cluster": "local_service_grpc",
                      "retry_policy": {
                        "retry_on": "5xx",
                        "num_retries": 3
                      }
                    },
                    {
                      "timeout_ms": 10000,
                      "prefix": "/",
                      "cluster": "local_service_http"
                    }
                  ]
                }
              ]
            },
            "filters": [
              {
                "type": "decoder",
                "name": "router",
                "config": {}
              },
              {
                "type": "both",
                "name": "health_check",
                "config": {
                  "pass_through_mode": true,
                  "endpoint": "/healthcheck"
                }
              }
            ]
          }
        }
      ]
    }
  ],
  "admin": {
    "access_log_path": "/dev/null",
    "address": "tcp://0.0.0.0:8001"
  },
  "cluster_manager": {
    "clusters": [
      {
        "name": "local_service_grpc",
        "connect_timeout_ms": 10000,
        "type": "strict_dns",
        "lb_type": "round_robin",
        "features": "http2",
        "hosts": [
          {
            "url": "tcp://127.0.0.1:8000"
          }
        ]
      },
      {
        "name": "local_service_http",
        "connect_timeout_ms": 10000,
        "type": "strict_dns",
        "lb_type": "round_robin",
        "hosts": [
          {
            "url": "tcp://127.0.0.1:8000"
          }
        ]
      }
    ]
  }
}
The problem here, I think, is that from my service’s point of view the peer on the connection is Envoy, so the keepAlive PING goes to Envoy, even though it is pretty clearly intended for the client. Am I getting that wrong?