kubernetes: etcd3 watcher doesn't scale
I did some scalability tests with etcd3 enabled and it seems that watcher.go pretty much doesn’t scale.
I think the problem is with the transform function:
https://github.com/kubernetes/kubernetes/blob/master/pkg/storage/etcd3/watcher.go#L234
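For context, transform roughly turns each raw etcd key/value event into a typed watch event (ADDED, MODIFIED, DELETED) that carries a decoded object. The sketch below is a simplified, hypothetical model and not the watcher.go code. `rawEvent`, `watchEvent`, `object` and this `transform` are illustrative names. Objects are assumed to be JSON, and the previous value is assumed to already be on the event. It only shows that every event pays for at least one decode, and that modifications carrying a previous value pay for two.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rawEvent is a hypothetical stand-in for an etcd watch event.
type rawEvent struct {
	IsCreate  bool
	IsDelete  bool
	Value     []byte // current value (empty for deletes)
	PrevValue []byte // previous value, assumed to be available here
}

type eventType string

const (
	Added    eventType = "ADDED"
	Modified eventType = "MODIFIED"
	Deleted  eventType = "DELETED"
)

// object is a hypothetical decoded API object.
type object struct {
	Name string `json:"name"`
}

type watchEvent struct {
	Type   eventType
	Object object
	Prev   *object // decoded previous object, only set for modifications
}

// transform classifies the event and decodes the bytes it needs.
func transform(e rawEvent) (watchEvent, error) {
	var out watchEvent
	switch {
	case e.IsDelete:
		// A deleted key has no current value, so decode the previous one.
		out.Type = Deleted
		err := json.Unmarshal(e.PrevValue, &out.Object)
		return out, err
	case e.IsCreate:
		out.Type = Added
	default:
		out.Type = Modified
	}
	if err := json.Unmarshal(e.Value, &out.Object); err != nil {
		return out, err
	}
	// Modifications decode a second object: the previous value.
	if out.Type == Modified && len(e.PrevValue) > 0 {
		var prev object
		if err := json.Unmarshal(e.PrevValue, &prev); err != nil {
			return out, err
		}
		out.Prev = &prev
	}
	return out, nil
}

func main() {
	ev, err := transform(rawEvent{
		Value:     []byte(`{"name":"node-2"}`),
		PrevValue: []byte(`{"name":"node-1"}`),
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(ev.Type, ev.Object.Name, ev.Prev.Name) // prints: MODIFIED node-2 node-1
}
```

With thousands of nodes sending status updates, this per-event decoding runs on every event. The real function may also have to obtain the previous value some other way.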
So, to give you some numbers: in a 2000-node kubemark run, I started the cluster at:
I0928 09:48:36.043709 3547 config.go:404] Will report 10.240.0.24 as public IP address.
I0928 09:48:36.045790 3547 server.go:328] Initalizing cache sizes based on 120000MB limit
and 12 seconds later I started getting:
W0928 09:48:48.398584 3547 watcher.go:319] Fast watcher, slow processing. Number of buffered events: 100.Probably caused by slow decoding, user not receiving fast, or other processing logic
I have millions of such lines in my logs.
Note that this was an empty cluster: there weren't any pods in the system at that time, so it was mostly traffic from nodes.
This is pretty much a blocker for launching etcd3: we can't launch with worse performance than etcd2.
@kubernetes/sig-scalability @xiang90 @hongchaodeng @timothysc
About this issue
- State: closed
- Created 8 years ago
- Comments: 45 (45 by maintainers)
Commits related to this issue
- Merge pull request #34089 from wojtek-t/with_serializable Automatic merge from submit-queue Make gets for previous value in watch serializable Ref #33653 — committed to kubernetes/kubernetes by deleted user 8 years ago
- Merge pull request #34246 from hongchaodeng/etcddep Automatic merge from submit-queue etcd3: use PrevKV to remove additional get ref: #https://github.com/kubernetes/kubernetes/issues/33653 We ar... — committed to kubernetes/kubernetes by deleted user 8 years ago
- Merge pull request #34435 from wojtek-t/avoid_unnecessary_decoding Automatic merge from submit-queue Avoid unnecessary decoding in etcd3 client Ref https://github.com/kubernetes/kubernetes/issues/3... — committed to kubernetes/kubernetes by deleted user 8 years ago
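The first two commits target the extra read of the previous value. #34089 makes that read serializable, so it is served by the local etcd member without a quorum round. #34246 removes it by using PrevKV, where etcd attaches the previous key/value to the watch event itself (requested in the Go client via `clientv3.WithPrevKV()`). The toy model below does not use the etcd client. `kv`, `event`, `fakeStore`, `prevWithGet` and `prevFromEvent` are hypothetical names. It assumes, for illustration, that the previous value is read at the event's revision minus one. It only counts round trips, to show that the two approaches return the same previous value but only one of them costs a read per event.

```go
package main

import "fmt"

// kv is a simplified key/value pair at a given revision.
type kv struct {
	Key   string
	Value string
	Rev   int64
}

// event is a hypothetical watch event; PrevKV is set only when the watcher
// asked for it (modelled on etcd's PrevKV watch option).
type event struct {
	KV     kv
	PrevKV *kv
}

// fakeStore keeps every revision of each key and counts reads against it.
type fakeStore struct {
	history map[string][]kv // revisions per key, oldest first
	reads   int
}

// getAt returns the value of key as of revision rev; each call is one round trip.
func (s *fakeStore) getAt(key string, rev int64) *kv {
	s.reads++
	var found *kv
	for i := range s.history[key] {
		if s.history[key][i].Rev <= rev {
			found = &s.history[key][i]
		}
	}
	return found
}

// prevWithGet fetches the previous value with an extra read per event.
func prevWithGet(s *fakeStore, e event) *kv {
	return s.getAt(e.KV.Key, e.KV.Rev-1)
}

// prevFromEvent uses the previous value already attached to the event.
func prevFromEvent(e event) *kv {
	return e.PrevKV
}

func main() {
	s := &fakeStore{history: map[string][]kv{
		"/nodes/n1": {
			{Key: "/nodes/n1", Value: "v1", Rev: 1},
			{Key: "/nodes/n1", Value: "v2", Rev: 2},
			{Key: "/nodes/n1", Value: "v3", Rev: 3},
		},
	}}
	h := s.history["/nodes/n1"]
	events := []event{
		{KV: h[1], PrevKV: &h[0]},
		{KV: h[2], PrevKV: &h[1]},
	}
	for _, e := range events {
		a, b := prevWithGet(s, e), prevFromEvent(e)
		fmt.Println(a.Value == b.Value, a.Value)
	}
	fmt.Println("extra reads with Get:", s.reads)
	// prints:
	// true v1
	// true v2
	// extra reads with Get: 2
}
```

In a 2000-node cluster, every node heartbeat becomes one such event, so removing one round trip per event matters far more than in this two-event example.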
@xiang90 @hongchaodeng I ran the experiment with your PR #34246 and it is waaaaaaaaaaaaaaay better. And I'm talking about both the metrics and the logs that I mentioned before:
vs this one with your changes:
What's more, I forgot to switch on protobufs, which makes things even faster.
So backporting this feature to etcd 3.0.x would really help.