go-control-plane: Envoy xDS not updating after 15-20 minutes.

I created an xDS gRPC v2 API for Envoy as suggested in go-control-plane/pkg/test/main/main.go. The callbacks, management server and gateway are the same; the AccessLogServer is removed. It works perfectly, and CDS, RDS and LDS are updated successfully.

The problem is with the envoyproxy (Docker) service. At first, when the snapshot changes, the gRPC API sends a StreamResponse, then envoyproxy makes a StreamRequest and updates xDS. After ~15 minutes, when the snapshot changes, the gRPC API still sends a StreamResponse, but envoyproxy makes no StreamRequest and hence no xDS is updated. If I then restart envoyproxy, StreamRequest is called and xDS is updated. The problem reappears ~15 minutes after each envoyproxy restart.

Something I noticed:

  • In func (cb *callbacks) OnStreamResponse(id int64, req *v2.DiscoveryRequest, res *v2.DiscoveryResponse) {}, the id changes only when I restart envoyproxy. Within one envoyproxy session it stays the same across any number of StreamResponse and StreamRequest calls. Is it meant to stay the same, or should it change for each request?

My envoy_config_yml:

admin:
  access_log_path: /dev/null
  address:
    socket_address:
      address: 0.0.0.0
      port_value: 9901
dynamic_resources:
  ads_config:
    api_type: GRPC
    refresh_delay: 30s
    cluster_names:
    - xds_cluster
  cds_config:
    ads: {}
  lds_config:
    ads: {}
node:
  cluster: my-cluster
  id: mystack
static_resources:
  clusters:
  - connect_timeout: 1s
    hosts:
    - socket_address:
        address: envoy-discovery-service # docker service name
        port_value: 18000
    http2_protocol_options: {}
    name: xds_cluster
    type: logical_dns
    dns_lookup_family: V4_ONLY
    dns_refresh_rate: 10s

Note: I am using one docker stack with envoyproxy and envoy-discovery-service as 2 different services.

About this issue

  • State: closed
  • Created 6 years ago
  • Reactions: 1
  • Comments: 21 (6 by maintainers)

Most upvoted comments

@dfjones It seems adding keepalive parameters worked. Will be testing in production. If any problem occurs, will reply back.
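The comment does not say where the keepalive parameters were added or which values were used. As one possible illustration only, the fragment below sketches TCP keepalive on the Envoy side for the xds_cluster from the config above, using the v2 cluster field upstream_connection_options. The three values are assumptions chosen for the example, not the settings from this thread; keepalive can also be configured on the gRPC server in the management service instead.

```yaml
static_resources:
  clusters:
  - name: xds_cluster
    # other fields (connect_timeout, hosts, http2_protocol_options, type, DNS settings)
    # stay as in the config above
    upstream_connection_options:
      tcp_keepalive:
        keepalive_time: 30      # assumed value: seconds of idle time before the first probe
        keepalive_interval: 10  # assumed value: seconds between probes
        keepalive_probes: 3     # assumed value: unanswered probes before the connection is closed
```

The idea is that periodic probes keep the long-lived ADS connection from looking idle to anything between the two Docker services, and let Envoy notice a silently dropped connection and reconnect rather than wait on a dead stream.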