etcd: [3.4] panic: runtime error: invalid memory address or nil pointer dereference
I saw this error multiple times in the 3.4 pipeline:
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xab2922]
goroutine 3500 [running]:
go.etcd.io/bbolt.(*Tx).Bucket(...)
/home/runner/go/pkg/mod/go.etcd.io/bbolt@v1.3.6/tx.go:101
go.etcd.io/etcd/mvcc/backend.(*concurrentReadTx).UnsafeRange(0xc000598f00, 0x15792dc, 0x3, 0x3, 0xc0000e34e8, 0x11, 0x12, 0x0, 0x0, 0x0, ...)
/home/runner/work/etcd/etcd/mvcc/backend/read_tx.go:195 +0x642
go.etcd.io/etcd/mvcc.(*storeTxnRead).rangeKeys(0xc000598f30, 0xc0001747a8, 0x8, 0x8, 0x0, 0x0, 0x0, 0x67, 0x0, 0x66, ...)
/home/runner/work/etcd/etcd/mvcc/kvstore_txn.go:147 +0x293
go.etcd.io/etcd/mvcc.(*storeTxnRead).Range(0xc000598f30, 0xc0001747a8, 0x8, 0x8, 0x0, 0x0, 0x0, 0x0, 0x66, 0xc00005e800, ...)
/home/runner/work/etcd/etcd/mvcc/kvstore_txn.go:51 +0xaf
go.etcd.io/etcd/mvcc.(*metricsTxnWrite).Range(0xc000598f60, 0xc0001747a8, 0x8, 0x8, 0x0, 0x0, 0x0, 0x0, 0x66, 0x40fa00, ...)
/home/runner/work/etcd/etcd/mvcc/metrics_txn.go:37 +0xaf
go.etcd.io/etcd/mvcc.(*readView).Range(0xc0005d95c0, 0xc0001747a8, 0x8, 0x8, 0x0, 0x0, 0x0, 0x0, 0x66, 0x0, ...)
/home/runner/work/etcd/etcd/mvcc/kv_view.go:39 +0x125
go.etcd.io/etcd/etcdserver/api/v3rpc.(*serverWatchStream).sendLoop(0xc001bb20c0)
/home/runner/work/etcd/etcd/etcdserver/api/v3rpc/watch.go:405 +0x1ef
go.etcd.io/etcd/etcdserver/api/v3rpc.(*watchServer).Watch.func1(0xc001bb20c0)
/home/runner/work/etcd/etcd/etcdserver/api/v3rpc/watch.go:180 +0x2b
created by go.etcd.io/etcd/etcdserver/api/v3rpc.(*watchServer).Watch
/home/runner/work/etcd/etcd/etcdserver/api/v3rpc/watch.go:179 +0x285
FAIL go.etcd.io/etcd/etcdserver/api/v2store 3.484s
FAIL
Refer to https://github.com/etcd-io/etcd/runs/7460637358?check_suite_focus=true
Please see my comment: https://github.com/etcd-io/etcd/issues/14256#issuecomment-1202083883
About this issue
- Original URL
- State: open
- Created 2 years ago
- Comments: 17 (10 by maintainers)
Thanks @JohnJAS and @rtheis for the feedback. It seems to be a different issue from this one, so I just raised another issue, https://github.com/etcd-io/etcd/issues/14402, to track it. Please feel free to deliver a PR for it if you are interested.
@ramses You can go on to see your current issue to avoid us doing repetitive work. cc @ahrtr, if there is any difficulty, I will keep you informed.
This issue can happen on both main and release-3.4, and I believe release-3.5 has this issue as well.
The root cause is that etcd stops immediately before (*readView).Range is called, so the tx is nil; accordingly, the following range operation panics.
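To make the failure mode concrete, here is a minimal, self-contained Go sketch of the same pattern. It is not the etcd or bbolt code and not the reproduction patch mentioned in this thread: the types `tx`, `bucket`, and `readTx` and the function `rangeAfterStop` are hypothetical stand-ins, assumed only to mirror the shape of the stack trace (a read transaction wrapper whose inner tx pointer is nil by the time a range call reaches it).

```go
package main

import "fmt"

// bucket and tx are hypothetical stand-ins, not the real bbolt types.
type bucket struct{ name string }

type tx struct{ root *bucket }

// Bucket reads a field through its receiver, so calling it on a nil *tx
// dereferences a nil pointer and panics.
func (t *tx) Bucket(name string) *bucket {
	if t.root == nil {
		return nil
	}
	return &bucket{name: t.root.name + "/" + name}
}

// readTx is a hypothetical stand-in for a read transaction wrapper.
// Its tx field is assumed to be nil once the backend has been stopped.
type readTx struct{ tx *tx }

// unsafeRange forwards to the inner tx without checking it for nil.
func (r *readTx) unsafeRange(name string) *bucket {
	return r.tx.Bucket(name)
}

// rangeAfterStop simulates a range call that arrives after shutdown and
// converts the resulting panic into an error so it can be inspected.
func rangeAfterStop() (err error) {
	defer func() {
		if p := recover(); p != nil {
			err = fmt.Errorf("recovered: %v", p)
		}
	}()
	r := &readTx{tx: nil} // the inner tx is already gone
	r.unsafeRange("key")
	return nil
}

func main() {
	fmt.Println(rangeAfterStop())
}
```

Running this should report a recovered nil pointer dereference, the same class of runtime error as the first line of the trace above. The recover is there only so the sketch can print the error instead of crashing; it is not a suggested fix.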
I managed to reproduce this issue intentionally by simulating the situation mentioned above. The code change (on 3.4) is below,
Then the issue can always be reproduced when running the test TestV3WatchWithPrevKV.
The possibility of running into this issue is really low in a production environment. It’s even harder to reproduce this issue in the pipeline after merging https://github.com/etcd-io/etcd/pull/14290. Since the etcdserver should have already been stopped when running into this issue, the last transaction should have already been committed, so it should be safe. So I don’t think it’s a blocker for 3.4.20 any more.
I will think about a formal fix or refactoring in future.
I suggest resolving this issue first. @SimFG @ramses
Comment on https://github.com/etcd-io/etcd/issues/14143,
@ahrtr OK, I will look into it.