k3s: Unable to restart HA cluster after etcd size limit reached

Please see logs and additional context in https://github.com/k3s-io/k3s/issues/4787#issuecomment-1071901452

__Originally posted by @ssmall in https://github.com/k3s-io/k3s/issues/4787#issuecomment-1077880193__

To summarize the history and the remaining issue, since the previous issue seems to have been conflated with a couple of different problems:

1. I have a 3-master, 4-worker HA k3s cluster that went down and began failing to start up on 12 Feb with the error `panic: etcdserver: mvcc: database space exceeded` (https://github.com/k3s-io/k3s/issues/4787#issuecomment-1037652836).
2. Following the advice of @brandond, I added `--etcd-arg=quota-backend-bytes=$((8*1024*1024*1024))` to the startup args for my master nodes (https://github.com/k3s-io/k3s/issues/4787#issuecomment-1039493992).
3. I also tried pointing a stand-alone etcd at the db directory; however, that did not resolve the issue with k3s startup (https://github.com/k3s-io/k3s/issues/4787#issuecomment-1066218285).
4. I tried a couple of fixes provided by @brandond and got a bit further; however, as soon as I bring up the second master node and the two nodes start talking to each other, it’s back to the original error (https://github.com/k3s-io/k3s/issues/4787#issuecomment-1071083112).
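
For readers unsure what value the `quota-backend-bytes` argument above actually carries: the `$((...))` expression is expanded by the shell before k3s ever sees it, so etcd receives a plain byte count. The sketch below only reproduces that arithmetic; the variable names `quota_bytes` and `etcd_arg` are illustrative and not part of k3s or etcd. As far as I know, etcd's documentation describes a default backend quota of 2 GB and suggests 8 GB as a practical maximum, which is why 8 GiB is used here.

```shell
#!/bin/sh
# Illustrative only: variable names are assumptions, not k3s or etcd settings.

# 8 GiB expressed in bytes, expanded by the shell exactly as in the flag above.
quota_bytes=$((8*1024*1024*1024))
echo "$quota_bytes"

# The literal argument string the k3s server process would receive.
etcd_arg="--etcd-arg=quota-backend-bytes=${quota_bytes}"
echo "$etcd_arg"
```

If the flag is placed somewhere that is not processed by a shell (a config file or a quoted systemd unit line, for example), the arithmetic would presumably not be expanded, so writing the literal number may be the safer choice.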

About this issue

State: closed
Created 2 years ago
Comments: 27 (12 by maintainers)

Most upvoted comments

Aha! Adding the --secrets-encryption flag back was indeed the final missing piece. The cluster appears to be restored to working order now. Thanks for all your patience and responsiveness. It is great to have everything back as it was.

ssmall on Mar 31, 2022