etcd: It's possible to get "wal: max entry size limit exceeded" with recommended values
What happened?
I started a clean etcd node as follows:
etcd --max-request-bytes=10485760
Then, through the Go client, I put a value of 10*1024*1024-27 bytes to key k1.
Then I stopped the server and tried to start it again, but it failed with the error wal: max entry size limit exceeded (https://github.com/etcd-io/etcd/blob/main/server/storage/wal/decoder.go#L88).
What did you expect to happen?
- It should not be possible to write more data than etcd can later read back.
- It should be obvious how much data (key + value), in bytes, I am able to write. I set --max-request-bytes to 10 MiB (10485760 bytes), but in fact could only pass 25 bytes less than that (2 bytes for the key, 10 MiB - 27 bytes for the value). Maybe internal overhead should not be part of the validation?
- It should not be possible to set --max-request-bytes so high that etcd allows writing more than it can later read. OR the WAL limit should be configurable.
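The arithmetic behind the second point can be checked with a short Go sketch. The numbers come from the report itself; the helper name payloadBytes is hypothetical, and the sketch makes no claim about where etcd spends the remaining bytes internally.

```go
package main

import "fmt"

// maxRequestBytes mirrors the --max-request-bytes=10485760 flag from the report.
const maxRequestBytes = 10 * 1024 * 1024

// payloadBytes is a hypothetical helper: it counts only the user-visible
// key and value bytes, not any encoding overhead added by etcd.
func payloadBytes(key string, valueLen int) int {
	return len(key) + valueLen
}

func main() {
	valueLen := 10*1024*1024 - 27 // value size used in the reproduction
	payload := payloadBytes("k1", valueLen)

	fmt.Println("payload bytes:", payload)
	fmt.Println("bytes below the configured limit:", maxRequestBytes-payload)
}
```

The gap between the payload and the configured limit is the 25 bytes mentioned above, which the client cannot see or predict from the flag value alone.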
How can we reproduce it (as minimally and precisely as possible)?
Run a clean etcd instance:
etcd --max-request-bytes=10485760
Run this Go code:
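package main

import (
	"context"

	"go.etcd.io/etcd/clientv3"
)

func main() {
	// The value is 27 bytes short of 10 MiB, so the request passes the
	// --max-request-bytes=10485760 check on the server.
	length := 10*1024*1024 - 27
	b := make([]byte, length)
	for i := 0; i < length; i++ {
		b[i] = 'a'
	}

	cli, err := clientv3.New(clientv3.Config{
		Endpoints: []string{"http://127.0.0.1:2379"},
		// Raise the client-side send limit so the large Put is not rejected locally.
		MaxCallSendMsgSize: length + 1024,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	_, err = cli.Put(context.Background(), "k1", string(b))
	if err != nil {
		panic(err)
	}
}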
Anything else we need to know?
If you run the code above, you will not be able to restart the server. However, you can call etcd snap save while the server is still running, delete all WAL files, then start the server and retrieve the saved value.
Etcd version (please run commands below)
$ etcd --version
etcd Version: 3.5.4
Git SHA: 08407ff76
Go Version: go1.18.1
Go OS/Arch: darwin/amd64
$ etcdctl version
etcdctl version: 3.5.4
API version: 3.5
Etcd configuration (command line flags or environment variables)
--max-request-bytes=10485760
Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)
$ etcdctl member list -w table
+------------------+---------+---------+-----------------------+-----------------------+------------+
| ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS | IS LEARNER |
+------------------+---------+---------+-----------------------+-----------------------+------------+
| 8e9e05c52164694d | started | default | http://localhost:2380 | http://localhost:2379 | false |
+------------------+---------+---------+-----------------------+-----------------------+------------+
$ etcdctl --endpoints=<member list> endpoint status -w table
+-----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| http://localhost:2379 | 8e9e05c52164694d | 3.5.4 | 25 kB | true | false | 2 | 4 | 4 | |
+-----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
Relevant log output
{"level":"fatal","ts":"2022-05-09T13:19:04.630+0300","caller":"etcdmain/etcd.go:204","msg":"discovery failed","error":"wal: max entry size limit exceeded","stacktrace":"go.etcd.io/etcd/server/v3/etcdmain.startEtcdOrProxyV2\n\t/private/tmp/etcd-20220424-57243-1ka6pvw/server/etcdmain/etcd.go:204\ngo.etcd.io/etcd/server/v3/etcdmain.Main\n\t/private/tmp/etcd-20220424-57243-1ka6pvw/server/etcdmain/main.go:40\nmain.main\n\t/private/tmp/etcd-20220424-57243-1ka6pvw/server/main.go:32\nruntime.main\n\t/usr/local/Cellar/go/1.18.1/libexec/src/runtime/proc.go:250"}
About this issue
- State: closed
- Created 2 years ago
- Reactions: 2
- Comments: 22 (17 by maintainers)
Thanks for all the feedback, which basically makes sense to me. I agree we should never bring down the etcd server.
Proposed actions:
cc @ptabor @serathius @spzala @xiang90 @gyuho for opinions.
@ahrtr I think there are several concerning issues revealed by this bug report:
1. An etcd instance configured with the suggested values can be driven by an adversarial client to silently rot its WAL, to the point that nodes cannot restart because they cannot read the entry back from the WAL. Solution: if there is a limit on the entry size for the WAL, make it a hard limit on the write path as well, so that the rotting is neither possible nor silent.
2. The WAL entry limit of 10MB is not documented anywhere, and in turn translates into a limit of 10MB per transaction. Solution: document this limit.
3. From my understanding, the limit on the WAL decoder was introduced in 2020 in an unrelated diff (https://github.com/etcd-io/etcd/pull/11793#discussion_r413365781), and it is not technically necessary. If we want to impose such a limit anyway, then I would make it configurable, unless I’m misunderstanding and there is an actual technical limitation. Solution: make the limit configurable.
I’m pretty sure that once you start enforcing this limit on the write path (1), a lot of production installations will start seeing transaction failures that are currently hidden. (2) and (3) are the solutions for the problems exposed by (1). Another solution to (1-3) is to remove the limit altogether.
The fix will be included in 3.5.5 and 3.6.0.
It’s in my to do list. I will get this sorted out and ask for opinions from other maintainers and users sometime later.
I like the idea around capping the limits around SegmentSizeBytes.
- Refuse configs where --max-request-bytes > SegmentSizeBytes/4 = 16MB.
- Move the decoding safety check to a comparison against SegmentSizeBytes.
Since the WAL file size limit is 64MB, the simplest solution could be to use SegmentSizeBytes directly as the limit for each WAL entry?
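The two proposals above can be sketched in Go as follows. This is only an illustration under stated assumptions: segmentSizeBytes is assumed to be 64 * 1000 * 1000 so that a quarter of it matches the 16MB figure quoted in the thread, and validateMaxRequestBytes and entryFitsSegment are hypothetical helpers, not functions from the etcd code base.

```go
package main

import "fmt"

// segmentSizeBytes is assumed to be 64MB (64 * 1000 * 1000) for this sketch.
const segmentSizeBytes int64 = 64 * 1000 * 1000

// validateMaxRequestBytes is a hypothetical startup check: it refuses
// configurations where the request limit exceeds a quarter of a WAL segment.
func validateMaxRequestBytes(maxRequestBytes int64) error {
	limit := segmentSizeBytes / 4
	if maxRequestBytes > limit {
		return fmt.Errorf("--max-request-bytes=%d exceeds SegmentSizeBytes/4=%d",
			maxRequestBytes, limit)
	}
	return nil
}

// entryFitsSegment is a hypothetical decoding safety check that compares a
// record length against the segment size instead of a separate constant.
func entryFitsSegment(recBytes int64) bool {
	return recBytes >= 0 && recBytes <= segmentSizeBytes
}

func main() {
	for _, v := range []int64{10485760, 16000000, 16000001} {
		fmt.Printf("max-request-bytes=%d -> %v\n", v, validateMaxRequestBytes(v))
	}
	fmt.Println("10 MiB entry fits in a segment:", entryFitsSegment(10*1024*1024))
}
```

Under these assumptions the value from the report (10485760) passes the startup check, and an entry produced by such a request stays well below the decoding bound.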