envoy: CDS Updates with many clusters often fail
This is a performance issue, not a bug per se.
When doing CDS updates with many clusters, Envoy will often get “stuck” evaluating the CDS update. This manifests as EDS failing, and in more extreme cases Envoy ceases to receive any xDS updates at all. When this happens, Envoy needs to be restarted to get it updating again.
In our case we’re seeing issues with the current implementation of `void CdsApiImpl::onConfigUpdate` with cluster counts in the 3000-7000 range. If Envoy could speedily evaluate a CDS update with 10000 clusters, that would represent a HUGE improvement in Envoy’s behavior for us. Right now, only around 2500 clusters in a CDS update seems to evaluate in a reasonable amount of time.
Because `void CdsApiImpl::onConfigUpdate` pauses EDS while doing the CDS evaluation, Envoy’s config will drift. With many clusters in CDS, this can mean Envoy is hundreds of seconds behind what is current, which results in 503s.
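To make that failure mode easier to follow, here is a rough illustration of the pause/resume pattern; this is not the real `CdsApiImpl` code, and `XdsMux`, `ScopedPause`, and `onCdsConfigUpdate` are made-up names:

```cpp
// Illustrative only: a made-up XdsMux with an RAII pause guard, showing why a
// slow CDS evaluation holds back every EDS update queued behind it.
#include <vector>

struct ClusterProto {};  // Stand-in for envoy::config::cluster::v3::Cluster.

class XdsMux {
public:
  // RAII guard: EDS stays paused until the guard is destroyed.
  class ScopedPause {
  public:
    explicit ScopedPause(XdsMux& mux) : mux_(mux) { mux_.eds_paused_ = true; }
    ~ScopedPause() { mux_.eds_paused_ = false; }

  private:
    XdsMux& mux_;
  };

  bool edsPaused() const { return eds_paused_; }

private:
  bool eds_paused_{false};
};

void onCdsConfigUpdate(XdsMux& mux, const std::vector<ClusterProto>& clusters) {
  // EDS is paused for the entire CDS evaluation. With thousands of clusters the
  // work in this scope can take hundreds of seconds, during which endpoint
  // assignments go stale; that is the drift and 503s described above.
  XdsMux::ScopedPause pause(mux);
  for (const auto& cluster : clusters) {
    (void)cluster;  // Per-cluster add/update/warming work would happen here.
  }
}  // EDS resumes only when the guard goes out of scope.
```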
Some context:
- Envoy in my test environment is being run without constraints on 8-core VMs that are running at a max of 30% CPU utilization and a max of 40% memory (of 32 GB):

  ```yaml
  resources:
    limits:
      memory: "32212254720"
    requests:
      cpu: 100m
      memory: 256M
  ```
- We’re using Project Contour as our Ingress controller in K8s. Contour currently doesn’t use Envoy’s incremental (delta) xDS APIs, so when K8s Services change in the cluster it sends ALL of the current config to Envoy again. This means a small change, like adding or removing a K8s Service that maps to an Envoy cluster, results in Envoy having to re-evaluate ALL clusters (see the sketch after this list). With enough K8s Services behind an Ingress (7000+), Envoy can spontaneously cease to receive any new updates indefinitely, and will fail to do EDS because it gets stuck in the CDS evaluation.
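To make the scaling concrete, here is a small standalone sketch; nothing in it is Contour or Envoy code, and all names and sizes are made up for illustration. It shows why a state-of-the-world update does work proportional to the total number of clusters even when only one Service changed, whereas a delta update would only touch the changed entries:

```cpp
// Hypothetical illustration (not Envoy/Contour code): with state-of-the-world
// xDS every update carries all clusters, so the receiver re-checks all of
// them; a delta update would only carry the changed ones.
#include <chrono>
#include <cstdint>
#include <functional>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

struct ClusterConfig {
  std::string name;
  std::string serialized;  // Stand-in for a serialized Cluster proto.
};

// Stand-in for the per-cluster "did anything change?" check.
uint64_t hashConfig(const ClusterConfig& c) { return std::hash<std::string>{}(c.serialized); }

int main() {
  constexpr size_t kClusters = 7000;  // Roughly the size reported in this issue.
  std::vector<ClusterConfig> snapshot;
  for (size_t i = 0; i < kClusters; ++i) {
    snapshot.push_back({"cluster-" + std::to_string(i), std::string(2048, 'x')});
  }

  // Warm state: the receiver already knows every cluster.
  std::unordered_map<std::string, uint64_t> known;
  for (const auto& c : snapshot) { known[c.name] = hashConfig(c); }

  // A single Service changes...
  snapshot[42].serialized.back() = 'y';

  // ...but with SotW the full snapshot is re-sent, so every cluster is
  // re-hashed and compared even though only one entry differs.
  const auto start = std::chrono::steady_clock::now();
  size_t changed = 0;
  for (const auto& c : snapshot) {
    const uint64_t h = hashConfig(c);
    if (known[c.name] != h) {
      known[c.name] = h;
      ++changed;
    }
  }
  const auto us = std::chrono::duration_cast<std::chrono::microseconds>(
      std::chrono::steady_clock::now() - start);
  std::cout << "SotW pass: re-checked " << kClusters << " clusters to apply "
            << changed << " change(s) in " << us.count() << " us\n";
  // A delta xDS update would carry only the changed cluster, so the work
  // would be proportional to the change set, not to the total cluster count.
  return 0;
}
```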
Given a high enough number of clusters this would be a Very Hard Problem, but given that we’re in the low thousands, I’m hoping there are some things that could be done to improve performance without resorting to exotic methods.
Please let me know if there’s any information I can provide that could help!
About this issue
- State: open
- Created 4 years ago
- Comments: 29 (23 by maintainers)
Commits related to this issue
- upstream: avoid copies of all cluster endpoints for every resolve target (#15013) Currently Envoy::Upstream::StrictDnsClusterImpl::ResolveTarget when instantiated for every endpoint also creates a fu... — committed to envoyproxy/envoy by rojkov 3 years ago
- upstream: avoid double hashing of protos in CDS init (#15241) Commit Message: upstream: avoid double hashing of protos in CDS init Additional Description: Currently Cluster messages are hashed unco... — committed to envoyproxy/envoy by rojkov 3 years ago
New findings:
- Here `secondary_init_clusters_` is a `std::list`. When Envoy loads 10k clusters, this `secondary_init_clusters_.remove_if()` line takes 4 seconds; with 30k clusters it takes about 70 seconds. A hash map would probably be a better choice here (see the sketch below).
- Hashes of `Cluster` messages are calculated twice: the first time is here and the second time is here. The performance impact is not that huge, though: the initial hashing of 30k messages takes about 400 ms, and when an update arrives it takes about 90 ms for the same 30k messages.
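To make the first suggestion concrete, here is a minimal sketch of the container change; `PendingClusters` and its methods are hypothetical names, not Envoy's real init-manager API. The point is only the container choice: removing via `std::list::remove_if()` scans the whole list, while erasing by cluster name from a hash map is amortized O(1).

```cpp
// Sketch only (hypothetical names, not Envoy code). remove_if() on std::list
// is O(n) per removal and O(n^2) across n clusters; a keyed erase is O(1).
#include <string>
#include <unordered_map>

class Cluster;  // Stand-in for Envoy's cluster type.

class PendingClusters {
public:
  void add(const std::string& name, Cluster* cluster) { pending_[name] = cluster; }

  // Called as each cluster finishes initialization; replaces the
  // secondary_init_clusters_.remove_if(...) scan with a keyed erase.
  void onClusterInitialized(const std::string& name) { pending_.erase(name); }

  bool empty() const { return pending_.empty(); }

private:
  // Keyed by cluster name instead of std::list<Cluster*>.
  std::unordered_map<std::string, Cluster*> pending_;
};
```

The double-hashing finding points the same way: compute the hash of each received `Cluster` proto once and reuse it at both comparison sites, which is what the #15241 commit listed above is about.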
@rojkov this is already supported for SotW (but not delta) xDS, see https://github.com/envoyproxy/envoy/blob/07c4c17be61c77d87d2c108b0775f2e606a7ae12/api/envoy/config/core/v3/config_source.proto#L107. I’m thinking this is something else we would prefer to default to true (but we need to do a deprecation dance to move to `BoolValue`). @adisuissa this might be another potential cause of buffer bloat in the issue you are looking at.
@ramaraochavali Alright, I’ll drop the close tag from the PR’s description to keep this issue open for now.