k3s: Embedded etcd server does not account for exceeding database space

Describe the bug: When you run k3s long enough with the embedded etcd datastore, you will probably see this:

Flag --insecure-port has been deprecated, This flag has no effect now and will be removed in v1.24.
{"level":"warn","ts":"2021-12-19T17:34:48.509+0800","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc01a9ac000/#initially=[https://127.0.0.1:2379]","attempt":0,"error":"rpc error: code = ResourceExhausted desc = etcdserver: mvcc: database space exceeded"}
panic: etcdserver: mvcc: database space exceeded
goroutine 417 [running]:
github.com/rancher/k3s/pkg/cluster.(*Cluster).Start.func1(0xc0336094a0, 0x5959b78, 0xc000936900, 0xc00159cc80)
        /go/src/github.com/rancher/k3s/pkg/cluster/cluster.go:103 +0x1e5
created by github.com/rancher/k3s/pkg/cluster.(*Cluster).Start
        /go/src/github.com/rancher/k3s/pkg/cluster/cluster.go:98 +0x6bf

Steps To Reproduce: Run a k3s server until its etcd datastore grows large enough to exceed the etcd backend space quota

Expected behavior: k3s should automatically compact the etcd datastore and continue as usual; failing that, it should start an emergency etcd server that allows operators to do some rescue work.

Actual behavior: It just crashes, so I can’t even perform a manual compaction myself.
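For reference, the manual recovery mentioned above would normally follow the compact, defragment, and alarm-disarm sequence from etcd's space-quota maintenance docs. The sketch below assumes a separately installed etcdctl (v3 API), that the embedded etcd member is actually up and reachable on 127.0.0.1:2379 (which is exactly what the crash prevents here), and the usual k3s TLS paths under /var/lib/rancher/k3s/server/tls/etcd; verify those paths on your node. The function name etcd_reclaim_space is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: reclaim etcd space after a NOSPACE alarm.
# ASSUMPTIONS: etcdctl (v3) is installed separately, the k3s etcd member is
# running on 127.0.0.1:2379, and the TLS paths below match your node.
set -euo pipefail

ETCDCTL="${ETCDCTL:-etcdctl}"
ETCD_FLAGS=(
  --endpoints=https://127.0.0.1:2379
  --cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt
  --cert=/var/lib/rancher/k3s/server/tls/etcd/server-client.crt
  --key=/var/lib/rancher/k3s/server/tls/etcd/server-client.key
)

# Hypothetical helper name; runs the documented compact/defrag/disarm steps.
etcd_reclaim_space() {
  local rev
  # 1. Read the current revision from the JSON endpoint status.
  rev=$("$ETCDCTL" "${ETCD_FLAGS[@]}" endpoint status --write-out=json \
    | grep -o '"revision":[0-9]*' | head -n1 | cut -d: -f2)
  [ -n "$rev" ] || { echo "could not read revision" >&2; return 1; }
  # 2. Discard all key history older than that revision.
  "$ETCDCTL" "${ETCD_FLAGS[@]}" compact "$rev"
  # 3. Defragment so the backend file actually shrinks.
  "$ETCDCTL" "${ETCD_FLAGS[@]}" defrag
  # 4. Clear the NOSPACE alarm so writes are accepted again.
  "$ETCDCTL" "${ETCD_FLAGS[@]}" alarm disarm
}
```

Compaction alone only frees old revisions inside the backend file; defragmentation is what releases that space back to the filesystem, and the NOSPACE alarm stays raised until it is explicitly disarmed.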

Backporting

  • Needs backporting to older releases

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 45 (19 by maintainers)

Most upvoted comments

Reproduced the issue in k3s with version v1.23.4+k3s1

  • Install k3s
curl -sfL https://get.k3s.io | INSTALL_K3S_SKIP_START=true INSTALL_K3S_VERSION=v1.23.4+k3s1 sh -
  • Start k3s with etcd quota 1MB
sudo k3s server --cluster-init --etcd-arg=quota-backend-bytes=$((1*1024*1024))
  • Failure observed on start
...
{"level":"info","ts":"2022-03-23T14:10:05.512Z","caller":"etcdserver/quota.go:117","msg":"enabled backend quota","quota-name":"v3-applier","quota-size-bytes":1048576,"quota-size":"1.0 MB"}
...
{"level":"warn","ts":"2022-03-23T14:10:29.317Z","caller":"etcdserver/apply.go:737","msg":"alarm raised","alarm":"NOSPACE","from":"30bcbaaff266920a"}
{"level":"warn","ts":"2022-03-23T14:10:29.317Z","logger":"etcd-client","caller":"v3@v3.5.1-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc001a428c0/127.0.0.1:2379","attempt":0,"error":"rpc error: code = ResourceExhausted desc = etcdserver: mvcc: database space exceeded"}
E0323 14:10:29.317817   19730 status.go:71] apiserver received an error that is not an metav1.Status: rpctypes.EtcdError{code:0x8, desc:"etcdserver: mvcc: database space exceeded"}: etcdserver: mvcc: database space exceeded
FATA[0024] flannel exited: failed to acquire lease: etcdserver: mvcc: database space exceeded
  • Start k3s with etcd quota 1GB
sudo k3s server --cluster-init --etcd-arg=quota-backend-bytes=$((1024*1024*1024))
  • Failure observed on start
...
{"level":"info","ts":"2022-03-23T14:11:06.781Z","caller":"etcdserver/quota.go:117","msg":"enabled backend quota","quota-name":"v3-applier","quota-size-bytes":1073741824,"quota-size":"1.1 GB"}
...
{"level":"warn","ts":"2022-03-23T14:11:22.373Z","logger":"etcd-client","caller":"v3@v3.5.1-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000ca7880/127.0.0.1:2379","attempt":0,"error":"rpc error: code = ResourceExhausted desc = etcdserver: mvcc: database space exceeded"}
panic: etcdserver: mvcc: database space exceeded
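The quota values above are passed to etcd in bytes, and its startup log prints them in decimal (SI) units, so 1 GiB (1073741824 bytes) shows up as "1.1 GB" and the 2 GiB default shows up as "2.1 GB". A small sketch of that conversion, assuming 1000-based units rounded to one decimal place (this matches the log lines quoted here; it is not etcd's own code, and si_bytes is a hypothetical name):

```shell
#!/usr/bin/env bash
# Convert a byte count to the 1000-based form seen in etcd's quota log line.
# Assumption: one decimal place, units B/kB/MB/GB/TB.
si_bytes() {
  awk -v b="$1" 'BEGIN {
    split("B kB MB GB TB", u, " ")
    i = 1
    # Divide by 1000 until the value fits the current unit.
    while (b >= 1000 && i < 5) { b /= 1000; i++ }
    printf "%.1f %s\n", b, u[i]
  }'
}

# The two quotas used in the reproduction plus etcd's 2 GiB default.
for q in $((1*1024*1024)) $((1024*1024*1024)) $((2*1024*1024*1024)); do
  printf '%s bytes = %s\n' "$q" "$(si_bytes "$q")"
done
```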

Verified the fix in k3s with v1.23.5-rc1+k3s1

  • Install k3s
curl -sfL https://get.k3s.io | INSTALL_K3S_SKIP_START=true INSTALL_K3S_VERSION=v1.23.5-rc1+k3s1 sh -
  • Start k3s with etcd quota 1MB
sudo k3s server --cluster-init --etcd-arg=quota-backend-bytes=$((1*1024*1024))
  • Failure observed on start
...
{"level":"info","ts":"2022-03-23T14:12:34.471Z","caller":"etcdserver/quota.go:117","msg":"enabled backend quota","quota-name":"v3-applier","quota-size-bytes":1048576,"quota-size":"1.0 MB"}
...
{"level":"warn","ts":"2022-03-23T14:12:58.063Z","logger":"etcd-client","caller":"v3@v3.5.1-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc00140ba40/127.0.0.1:2379","attempt":0,"error":"rpc error: code = ResourceExhausted desc = etcdserver: mvcc: database space exceeded"}
E0323 14:12:58.064279   20464 status.go:71] apiserver received an error that is not an metav1.Status: rpctypes.EtcdError{code:0x8, desc:"etcdserver: mvcc: database space exceeded"}: etcdserver: mvcc: database space exceeded
FATA[0023] flannel exited: failed to acquire lease: etcdserver: mvcc: database space exceeded
  • Start k3s with etcd quota 1GB
sudo k3s server --cluster-init --etcd-arg=quota-backend-bytes=$((1024*1024*1024))
  • Observed success on start, BUT the quota-backend-bytes argument is not applied as specified; it falls back to the default of 2.1 GB (2147483648 bytes)
{"level":"info","ts":"2022-03-23T14:13:06.434Z","caller":"etcdserver/quota.go:94","msg":"enabled backend quota with default value","quota-name":"v3-applier","quota-size-bytes":2147483648,"quota-size":"2.1 GB"}
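One way to confirm whether a requested quota took effect is to compare it with the quota-size-bytes field of etcd's "enabled backend quota" log line. A minimal sketch, assuming the k3s server output is piped in on stdin (for example a saved copy of the foreground `k3s server` output); the function name check_quota is hypothetical:

```shell
#!/usr/bin/env bash
# Compare the requested quota (bytes) with the value etcd logged at startup.
# Assumption: log text arrives on stdin; the field name comes from the
# "enabled backend quota" lines quoted above.
check_quota() {
  local requested=$1 applied
  # Matches both "enabled backend quota" and "... with default value".
  applied=$(grep '"msg":"enabled backend quota' \
    | grep -o '"quota-size-bytes":[0-9]*' | tail -n1 | cut -d: -f2)
  if [ -z "$applied" ]; then
    echo "no quota line found"
    return 2
  elif [ "$applied" -eq "$requested" ]; then
    echo "ok: quota is $applied bytes"
  else
    echo "mismatch: requested $requested, etcd applied $applied"
    return 1
  fi
}
```

Run against the log line above with a 1 GiB request, this reports a mismatch against the 2147483648-byte default.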

Please advise on this observation @brandond