kubernetes: etcd3 watcher doesn't scale

I did some scalability tests with etcd3 enabled and it seems that watcher.go pretty much doesn’t scale.

I think the problem we have is with the transform function: https://github.com/kubernetes/kubernetes/blob/master/pkg/storage/etcd3/watcher.go#L234

To give you some numbers: in a 2000-node kubemark, I started the cluster at:

I0928 09:48:36.043709    3547 config.go:404] Will report 10.240.0.24 as public IP address.
I0928 09:48:36.045790    3547 server.go:328] Initalizing cache sizes based on 120000MB limit

and 12 seconds later I started getting:

W0928 09:48:48.398584    3547 watcher.go:319] Fast watcher, slow processing. Number of buffered events: 100.Probably caused by slow decoding, user not receiving fast, or other processing logic

I have millions of such lines in my logs.

Note that this was an empty cluster - there weren’t any pods in the system at that time - so it was mostly traffic from nodes.

This is pretty much a blocker for launching etcd3 - we can’t launch with worse performance than etcd2.

@kubernetes/sig-scalability @xiang90 @hongchaodeng @timothysc

About this issue

State: closed
Created 8 years ago
Comments: 45 (45 by maintainers)

Commits related to this issue

Merge pull request #34089 from wojtek-t/with_serializable Automatic merge from submit-queue Make gets for previous value in watch serializable Ref #33653 — committed to kubernetes/kubernetes by deleted user 8 years ago
Merge pull request #34246 from hongchaodeng/etcddep Automatic merge from submit-queue etcd3: use PrevKV to remove additional get ref: #https://github.com/kubernetes/kubernetes/issues/33653 We ar... — committed to kubernetes/kubernetes by deleted user 8 years ago
Merge pull request #34435 from wojtek-t/avoid_unnecessary_decoding Automatic merge from submit-queue Avoid unnecessary decoding in etcd3 client Ref https://github.com/kubernetes/kubernetes/issues/3... — committed to kubernetes/kubernetes by deleted user 8 years ago

Most upvoted comments

@xiang90 @hongchaodeng I did the experiment with your PR: #34246 and it is waaaaaaaaaaaaaaay better. I’m talking about both the metrics and the logs that I mentioned before:

wojtekt@wojtekt-work:~/Downloads$ cat apiserver.txt | grep "watcher.go:319" | wc -l
120681

vs this one with your changes:

wojtekt@kubernetes-kubemark-master:~$ cat /var/log/kube-apiserver.log | grep "watcher.go:327" | wc -l
3478

What is more, I forgot to switch on protobufs, which would make things even faster.

So backporting this feature to 3.0.x would really help.

wojtek-t on Oct 7, 2016