etcd: It's possible to get "wal: max entry size limit exceeded" with recommended values

What happened?

I ran a clean etcd node as follows: etcd --max-request-bytes=10485760

Then, through the Go client, I put a value of 10*1024*1024-27 bytes to key k1.

Then I stopped the server and tried to start it again, but it failed with the error wal: max entry size limit exceeded (https://github.com/etcd-io/etcd/blob/main/server/storage/wal/decoder.go#L88).

What did you expect to happen?

  1. It should not be possible to write more data than etcd can later read back.
  2. It should be obvious how many bytes of data (key + value) I am able to write. I set --max-request-bytes to 10 MB, but in fact I was able to pass 25 bytes less (2 bytes for the key, 10 MB - 27 bytes for the value). Maybe internal overhead should not be part of the validation?
  3. It should not be possible to set --max-request-bytes so high that etcd allows writing more than it can later read. Or the WAL limit should be configurable.
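
To make the numbers in point 2 concrete, here is a minimal sketch of the arithmetic only. The helper name payloadGap is made up for illustration, and nothing here reproduces how etcd itself measures a request.

```go
package main

import "fmt"

// Numbers taken from the report above.
const (
	maxRequestBytes = 10 * 1024 * 1024    // --max-request-bytes=10485760
	keyLen          = len("k1")           // 2 bytes
	valueLen        = 10*1024*1024 - 27   // size of the value that was put
)

// payloadGap (hypothetical helper) returns how many bytes of the configured
// limit are not covered by the key and value themselves.
func payloadGap(limit, key, value int) int {
	return limit - (key + value)
}

func main() {
	fmt.Println("key + value bytes:", keyLen+valueLen)
	fmt.Println("bytes below the limit:", payloadGap(maxRequestBytes, keyLen, valueLen))
}
```

The key and value together come to 10485735 bytes, which is the 25-byte difference mentioned above.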

How can we reproduce it (as minimally and precisely as possible)?

Run a clean etcd instance: etcd --max-request-bytes=10485760

Run this Go code:

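package main

import (
	"context"

	"go.etcd.io/etcd/clientv3"
)

func main() {
	// Value size: 27 bytes below the 10 MiB --max-request-bytes limit.
	length := 10*1024*1024 - 27
	b := make([]byte, length)
	for i := 0; i < length; i++ {
		b[i] = 'a'
	}

	// Raise the client-side send limit so the large value is not rejected
	// before it reaches the server.
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:          []string{"http://127.0.0.1:2379"},
		MaxCallSendMsgSize: length + 1024,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	// The put succeeds; the failure only shows up when the server restarts.
	_, err = cli.Put(context.Background(), "k1", string(b))
	if err != nil {
		panic(err)
	}
}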

Anything else we need to know?

Suppose you run the code above: you will not be able to restart the server. But you can call etcd snap save while the server is still running, delete all WAL files, and then start the server and read back the saved value.

Etcd version (please run commands below)

$ etcd --version
etcd Version: 3.5.4
Git SHA: 08407ff76
Go Version: go1.18.1
Go OS/Arch: darwin/amd64

$ etcdctl version
etcdctl version: 3.5.4
API version: 3.5

Etcd configuration (command line flags or environment variables)

--max-request-bytes=10485760

Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

$ etcdctl member list -w table
+------------------+---------+---------+-----------------------+-----------------------+------------+
|        ID        | STATUS  |  NAME   |      PEER ADDRS       |     CLIENT ADDRS      | IS LEARNER |
+------------------+---------+---------+-----------------------+-----------------------+------------+
| 8e9e05c52164694d | started | default | http://localhost:2380 | http://localhost:2379 |      false |
+------------------+---------+---------+-----------------------+-----------------------+------------+

$ etcdctl --endpoints=<member list> endpoint status -w table
+-----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|       ENDPOINT        |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| http://localhost:2379 | 8e9e05c52164694d |   3.5.4 |   25 kB |      true |      false |         2 |          4 |                  4 |        |
+-----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

Relevant log output

{"level":"fatal","ts":"2022-05-09T13:19:04.630+0300","caller":"etcdmain/etcd.go:204","msg":"discovery failed","error":"wal: max entry size limit exceeded","stacktrace":"go.etcd.io/etcd/server/v3/etcdmain.startEtcdOrProxyV2\n\t/private/tmp/etcd-20220424-57243-1ka6pvw/server/etcdmain/etcd.go:204\ngo.etcd.io/etcd/server/v3/etcdmain.Main\n\t/private/tmp/etcd-20220424-57243-1ka6pvw/server/etcdmain/main.go:40\nmain.main\n\t/private/tmp/etcd-20220424-57243-1ka6pvw/server/main.go:32\nruntime.main\n\t/usr/local/Cellar/go/1.18.1/libexec/src/runtime/proc.go:250"}

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 2
  • Comments: 22 (17 by maintainers)

Most upvoted comments

Thanks for all the feedback, which basically makes sense to me. I agree we should never bring down the etcd server.

Proposed actions:

  1. Make the WAL entry limit configurable, defaulting to 10 * requestLimit;
  2. or remove the WAL entry size limit altogether?
  3. Update the documentation; at the very least we need to clearly explain the fields in help.go.

cc @ptabor @serathius @spzala @xiang90 @gyuho for opinions.

@ahrtr I think there are several concerning issues revealed by this bug report:

  1. An etcd instance that is configured with the suggested values can be made to silently rot its WAL by an adversarial client, to the point that nodes cannot restart because they cannot read the entry from the WAL. Solution: if there is a limit on the WAL entry size, make it a hard limit on the write path as well, so that the rotting is neither possible nor silent.

  2. The WAL entry limit of 10 MB is not documented anywhere, and in turn translates into a limit of 10 MB per transaction. Solution: document this limit.

  3. From my understanding, the limit in the WAL decoder was introduced in 2020 in an unrelated diff (https://github.com/etcd-io/etcd/pull/11793#discussion_r413365781), and it is not technically necessary. If we want to impose such a limit anyway, then I would make it configurable… unless I’m misunderstanding and there is an actual technical limitation. Solution: make the limit configurable.

I’m pretty sure that once you start enforcing this limit on the write path (1), a lot of production installations will start seeing transaction failures which are currently hidden. (2) and (3) are the solutions for the problems exposed by (1). Another solution to (1-3) is to remove the limit altogether.

The fix will be included in 3.5.5 and 3.6.0.

It’s in my to do list. I will get this sorted out and ask for opinions from other maintainers and users sometime later.

I like the idea of capping the limits around SegmentSizeBytes.

  1. Refuse configs where --max-request-bytes > SegmentSizeBytes/4 = 16MB.

  2. Move the decoding safety check to a comparison against SegmentSizeBytes.

Since the WAL file limit is 64MB, the simplest solution could be to use SegmentSizeBytes directly as the limit for each WAL entry?
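
The two proposals above (refusing an oversized --max-request-bytes at startup, and checking decoded entries against the segment size) could be sketched roughly as follows. The function names validateMaxRequestBytes and entryFitsSegment are hypothetical, and the constant is an assumption derived from the 64MB figure quoted in this thread, so treat this as an illustration rather than the actual fix.

```go
package main

import "fmt"

// segmentSizeBytes is assumed from the "64MB" WAL file size mentioned above;
// the exact constant in etcd may differ.
const segmentSizeBytes = 64 * 1000 * 1000

// validateMaxRequestBytes (hypothetical) rejects a configuration whose
// request limit exceeds a quarter of the WAL segment size.
func validateMaxRequestBytes(maxRequestBytes uint) error {
	if maxRequestBytes > segmentSizeBytes/4 {
		return fmt.Errorf("--max-request-bytes=%d exceeds the allowed maximum %d",
			maxRequestBytes, segmentSizeBytes/4)
	}
	return nil
}

// entryFitsSegment (hypothetical) is the decoder-side safety check: an entry
// is acceptable as long as it is no larger than one segment.
func entryFitsSegment(entrySize int64) bool {
	return entrySize <= segmentSizeBytes
}

func main() {
	// The value from the report passes validation under this scheme.
	fmt.Println(validateMaxRequestBytes(10485760))
	// A limit above one quarter of the segment size is refused.
	fmt.Println(validateMaxRequestBytes(32 * 1000 * 1000))
	// An entry slightly larger than the request limit still decodes.
	fmt.Println(entryFitsSegment(10485760 + 1024))
}
```

Keeping the request cap well below the decode limit leaves headroom for per-entry overhead, which is what made the original 10 MB versus 10 MB comparison fail.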