nats-server: panic: runtime error: makeslice: cap out of range
Defect
Make sure that these boxes are checked before submitting your issue – thank you!
- Included
nats-server -DV
output - Included a [Minimal, Complete, and Verifiable example] (https://stackoverflow.com/help/mcve)
Versions of nats-server
and affected client libraries used:
[54] 2022/05/10 03:53:10.909500 [INF] Starting nats-server
[54] 2022/05/10 03:53:10.909527 [INF] Version: 2.8.0
[54] 2022/05/10 03:53:10.909536 [INF] Git: [90721ee]
[54] 2022/05/10 03:53:10.909544 [DBG] Go build: go1.17.9
[54] 2022/05/10 03:53:10.909553 [INF] Name: NCFLZXZ6MUOIW6H4BRRNMQMOHFYAENP6OEU3XUYYKU3WHHE4ZRDPFRBS
[54] 2022/05/10 03:53:10.909572 [INF] ID: NCFLZXZ6MUOIW6H4BRRNMQMOHFYAENP6OEU3XUYYKU3WHHE4ZRDPFRBS
[54] 2022/05/10 03:53:10.909595 [DBG] Created system account: "$SYS"
Messages were sent and consumed using nats.py 2.1.0.
OS/Container environment:
GKE
Steps or code to reproduce the issue:
- Install NATS with Helm, enable jetstream filestorage – Stored on a volume with a storage class type=pd-ssd and WaitForFirstConsumer
- Send tens of thousands of messages without consuming them.
- Attempt to scale down the node that nats is running on for an upgrade and recover on a new node.
Expected result:
Not to panic, to recover gracefully, or at least to provide a more instructive error message.
Actual result:
[3426] 2022/05/10 03:10:51.017008 [INF] Starting nats-server
[3426] 2022/05/10 03:10:51.017092 [INF] Version: 2.8.0
[3426] 2022/05/10 03:10:51.017096 [INF] Git: [90721ee]
[3426] 2022/05/10 03:10:51.017100 [INF] Name: queue-nats-0
[3426] 2022/05/10 03:10:51.017110 [INF] Node: 4pcPsiNg
[3426] 2022/05/10 03:10:51.017116 [INF] ID: NC6TCJ2D5KDDFEJ37JNOLTZIBEDERC6T3MMGN6ZMVNHDGFW6IM3ETFIJ
[3426] 2022/05/10 03:10:51.017146 [INF] Using configuration file: /etc/nats-config/nats.conf
[3426] 2022/05/10 03:10:51.018272 [INF] Starting http monitor on 0.0.0.0:8222
[3426] 2022/05/10 03:10:51.018355 [INF] Starting JetStream
[3426] 2022/05/10 03:10:51.018534 [INF] _ ___ _____ ___ _____ ___ ___ _ __ __
[3426] 2022/05/10 03:10:51.018538 [INF] _ | | __|_ _/ __|_ _| _ \ __| /_\ | \/ |
[3426] 2022/05/10 03:10:51.018540 [INF] | || | _| | | \__ \ | | | / _| / _ \| |\/| |
[3426] 2022/05/10 03:10:51.018541 [INF] \__/|___| |_| |___/ |_| |_|_\___/_/ \_\_| |_|
[3426] 2022/05/10 03:10:51.018543 [INF]
[3426] 2022/05/10 03:10:51.018545 [INF] https://docs.nats.io/jetstream
[3426] 2022/05/10 03:10:51.018546 [INF]
[3426] 2022/05/10 03:10:51.018548 [INF] ---------------- JETSTREAM ----------------
[3426] 2022/05/10 03:10:51.018556 [INF] Max Memory: 1.00 GB
[3426] 2022/05/10 03:10:51.018559 [INF] Max Storage: 100.00 GB
[3426] 2022/05/10 03:10:51.018561 [INF] Store Directory: "/data/jetstream"
[3426] 2022/05/10 03:10:51.018562 [INF] -------------------------------------------
[3426] 2022/05/10 03:10:51.321612 [INF] Restored 13,833 messages for stream '$G > INGEST'
[3426] 2022/05/10 03:10:51.421122 [INF] Server is ready
panic: runtime error: makeslice: cap out of range
goroutine 1 [running]:
github.com/nats-io/nats-server/v2/server.(*msgBlock).indexCacheBuf(0xc000292a80, {0xc00078c000, 0x186a22, 0x200000})
/home/travis/gopath/src/github.com/nats-io/nats-server/server/filestore.go:3165 +0x195
github.com/nats-io/nats-server/v2/server.(*msgBlock).loadMsgsWithLock(0xc000292a80)
/home/travis/gopath/src/github.com/nats-io/nats-server/server/filestore.go:3466 +0x414
github.com/nats-io/nats-server/v2/server.(*msgBlock).generatePerSubjectInfo(0xc000292a80)
/home/travis/gopath/src/github.com/nats-io/nats-server/server/filestore.go:4663 +0xf0
github.com/nats-io/nats-server/v2/server.(*msgBlock).readPerSubjectInfo(0xc000292a80)
/home/travis/gopath/src/github.com/nats-io/nats-server/server/filestore.go:4716 +0x3b4
github.com/nats-io/nats-server/v2/server.(*fileStore).recoverMsgBlock(0xc000290280, {0xb1de38, 0xc000709790}, 0x3d)
/home/travis/gopath/src/github.com/nats-io/nats-server/server/filestore.go:673 +0xff0
github.com/nats-io/nats-server/v2/server.(*fileStore).recoverMsgs(0xc000290280)
/home/travis/gopath/src/github.com/nats-io/nats-server/server/filestore.go:958 +0x8bd
github.com/nats-io/nats-server/v2/server.newFileStoreWithCreated({{0xc000220990, 0x25}, _, _, _, _}, {{0xc000759020, 0xa}, {0x0, 0x0}, ...}, ...)
/home/travis/gopath/src/github.com/nats-io/nats-server/server/filestore.go:315 +0x76c
github.com/nats-io/nats-server/v2/server.(*stream).setupStore(0xc00011c2c0, 0xc000160988)
/home/travis/gopath/src/github.com/nats-io/nats-server/server/stream.go:2760 +0x2d5
github.com/nats-io/nats-server/v2/server.(*Account).addStreamWithAssignment(0xc000118900, 0xc0000f6108, 0x0, 0x0)
/home/travis/gopath/src/github.com/nats-io/nats-server/server/stream.go:417 +0xee5
github.com/nats-io/nats-server/v2/server.(*Account).addStream(0xc000759030, 0xc000758ff0)
/home/travis/gopath/src/github.com/nats-io/nats-server/server/stream.go:276 +0x1d
github.com/nats-io/nats-server/v2/server.(*Account).EnableJetStream(0xc000118900, 0xc00001f8c0)
/home/travis/gopath/src/github.com/nats-io/nats-server/server/jetstream.go:1193 +0x2ecd
github.com/nats-io/nats-server/v2/server.(*Server).configJetStream(0xc0001ac000, 0xc000118900)
/home/travis/gopath/src/github.com/nats-io/nats-server/server/jetstream.go:645 +0x9c
github.com/nats-io/nats-server/v2/server.(*Server).enableJetStreamAccounts(0xc0000d0160)
/home/travis/gopath/src/github.com/nats-io/nats-server/server/jetstream.go:575 +0xb4
github.com/nats-io/nats-server/v2/server.(*Server).enableJetStream(0xc0001ac000, {0x40000000, 0x1900000000, {0xc0000c23e0, 0xf}, {0x0, 0x0}})
/home/travis/gopath/src/github.com/nats-io/nats-server/server/jetstream.go:392 +0x919
github.com/nats-io/nats-server/v2/server.(*Server).EnableJetStream(0xc0001ac000, 0xc000161e38)
/home/travis/gopath/src/github.com/nats-io/nats-server/server/jetstream.go:204 +0x40c
github.com/nats-io/nats-server/v2/server.(*Server).Start(0xc0001ac000)
/home/travis/gopath/src/github.com/nats-io/nats-server/server/server.go:1746 +0xf10
github.com/nats-io/nats-server/v2/server.Run(...)
/home/travis/gopath/src/github.com/nats-io/nats-server/server/service.go:22
main.main()
/home/travis/gopath/src/github.com/nats-io/nats-server/main.go:118 +0x2fa
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 20 (14 by maintainers)
Commits related to this issue
- General improvements to accounting for the filestore. This in response to tracking issue #3114. Signed-off-by: Derek Collison <derek@nats.io> — committed to nats-io/nats-server by derekcollison 2 years ago
- General improvements to accounting for the filestore. This in response to tracking issue #3114. Signed-off-by: Derek Collison <derek@nats.io> — committed to nats-io/nats-server by derekcollison 2 years ago
After four hours of running the nightly image, the server has not restarted on its own at all. Manual restarts recover fine.
I’ve just had a chance to look through the logs to see what happened.
The server was under-resourced. I wasn’t setting resource.requests and resource.limits with helm, and so it was receiving only token amounts of cpu and memory despite having plenty of available resources on the node. Because the server is so light and efficient, this wasn’t immediately obvious.
The process was suddenly terminating and restarting periodically, with no particular errors being logged. It sucessfully restored itself many times before we began to see the panic. Once that happened, the server consistently panicked on startup.
In case it’s relevant, there are also a lot of these messages being logged:
These are not particularly surprising to me, I know I have some messages that take a long time to process.