go-control-plane: [BUG] EDS sent before CDS, CDS will not resubscribe

The SotW server states that xDS responses can be sent in any order. Most of the time CDS is requested first, so its response is sent first, but sometimes EDS is sent before CDS:

		// config watcher can send the requested resources types in any order
		case resp, more := <-values.endpoints:
			if !more {
				return status.Errorf(codes.Unavailable, "endpoints watch failed")
			}
			nonce, err := send(resp)
			if err != nil {
				return err
			}
			values.endpointNonce = nonce

		case resp, more := <-values.clusters:
			if !more {
				return status.Errorf(codes.Unavailable, "clusters watch failed")
			}
			nonce, err := send(resp)
			if err != nil {
				return err
			}
			values.clusterNonce = nonce

However, the Envoy proxy docs (https://www.envoyproxy.io/docs/envoy/latest/api-docs/xds_protocol#resource-warming) say:

EDS updates (if any) must arrive after CDS updates for the respective clusters.

If EDS is pushed before CDS, the pushed CDS config does not take effect. We run a cluster of 40 nodes (Envoy proxy v1.16.4) that subscribe over xDS to a single management server. On about 7.5% of the nodes, EDS is pushed before CDS when the Snapshot changes; EDS then subscribes twice and CDS does not subscribe. If we subsequently change the cluster's resources, CDS is not pushed anymore.
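To check how often this happens on a given stream, one option is to record the type URL of each response as it is sent and scan that sequence afterwards. The sketch below assumes such a recording exists; `firstOrderViolation` is a hypothetical helper, not part of go-control-plane, and it only looks at the responses up to the first CDS response in the sequence.

```go
package main

import "fmt"

// v3 type URLs for CDS and EDS resources.
const (
	clusterType  = "type.googleapis.com/envoy.config.cluster.v3.Cluster"
	endpointType = "type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment"
)

// firstOrderViolation (hypothetical helper) takes the type URLs of the
// responses sent for one snapshot change, in send order. It returns the
// index of the first EDS response that was sent ahead of the first CDS
// response, or -1 if CDS came first or no CDS response was sent at all.
func firstOrderViolation(sent []string) int {
	pendingEDS := -1
	for i, typeURL := range sent {
		switch typeURL {
		case endpointType:
			if pendingEDS < 0 {
				pendingEDS = i
			}
		case clusterType:
			// First CDS response: report an earlier EDS response, if any.
			return pendingEDS
		}
	}
	return -1
}

func main() {
	good := []string{clusterType, endpointType}
	bad := []string{endpointType, clusterType}
	fmt.Println(firstOrderViolation(good), firstOrderViolation(bad))
}
```

An EDS-only update is not reported, since a snapshot change that touches only endpoints has no CDS response to wait for.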

You can add this code to the SotW server to reproduce EDS being sent before CDS:

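	// Note: the error returned by this goroutine is discarded by the go statement.
	go func() error {
		tick := time.NewTicker(time.Second * 3)
		tick2 := time.NewTicker(time.Second * 1)
		for {
			// First select: wait for an endpoints response or the 3s tick,
			// so a pending EDS response is sent before any CDS response.
			select {
			case resp, more := <-values.endpoints:
				if !more {
					return status.Errorf(codes.Unavailable, "endpoints watch failed")
				}
				nonce, err := send(resp)
				if err != nil {
					return err
				}
				values.endpointNonce = nonce
			case <-tick.C:

			}
			// Second select: wait for a clusters response or the 1s tick.
			select {
			case resp, more := <-values.clusters:
				if !more {
					return status.Errorf(codes.Unavailable, "clusters watch failed")
				}
				nonce, err := send(resp)
				if err != nil {
					return err
				}
				values.clusterNonce = nonce
			case <-tick2.C:
				// No clusters response within 1s: restart the 3s endpoints wait.
				tick.Reset(time.Second * 3)
			}
		}
	}()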
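For context on why the order varies: when several cases of a Go `select` are ready, one is chosen pseudo-randomly, so a loop with one case per resource type can send EDS first whenever both responses are queued. A minimal sketch of preferring clusters is shown below. It uses plain string channels in place of the server's response channels, and `nextResponse` is a hypothetical helper, not the fix from any go-control-plane PR.

```go
package main

import "fmt"

// nextResponse (hypothetical helper) returns the next response to send.
// If a clusters response is already queued it is returned first;
// otherwise the call blocks until either channel yields a value.
// ok is false when the channel that was read has been closed.
func nextResponse(clusters, endpoints <-chan string) (resp string, ok bool) {
	// Non-blocking check so a queued CDS response always wins.
	select {
	case resp, ok = <-clusters:
		return resp, ok
	default:
	}
	// Nothing queued on clusters: wait for whichever arrives first.
	select {
	case resp, ok = <-clusters:
	case resp, ok = <-endpoints:
	}
	return resp, ok
}

func main() {
	clusters := make(chan string, 1)
	endpoints := make(chan string, 1)
	// Queue EDS first, then CDS, to mimic the problematic case.
	endpoints <- "EDS v2"
	clusters <- "CDS v2"
	for i := 0; i < 2; i++ {
		resp, _ := nextResponse(clusters, endpoints)
		fmt.Println(resp) // CDS v2, then EDS v2
	}
}
```

This only narrows the window. If the endpoints response is queued and read before the clusters response has been produced at all, EDS still goes out first, so a complete fix would need the server to hold back EDS until the matching CDS response has been sent.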

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 33 (30 by maintainers)

Most upvoted comments

+1 hoping to see this get fixed because we run into the same issue. It looks like #544 hasn’t been updated for a long time. Is the review in a state that I can pull and test with our application?