nats-server: NATS Server corrupts the meta.inf file when it is killed, closed, or exits while loading streams at startup.
Defect
Make sure that these boxes are checked before submitting your issue – thank you!
- Included nats-server -DV output
- Included a [Minimal, Complete, and Verifiable example](https://stackoverflow.com/help/mcve)

Logs are included.

nats-server.exe -DV
[2684] 2023/05/26 01:08:32.952160 [INF] Starting nats-server
[2684] 2023/05/26 01:08:32.953198 [INF] Version: 2.9.16
[2684] 2023/05/26 01:08:32.953720 [INF] Git: [f84ca24]
[2684] 2023/05/26 01:08:32.954231 [DBG] Go build: go1.19.8
[2684] 2023/05/26 01:08:32.954257 [INF] Name: NAHUTXJA2HWFEV45HQ534LEZ54ZKESIX5NZ6VJKMYKY5XZ5H767XTS2G
[2684] 2023/05/26 01:08:32.954257 [INF] ID: NAHUTXJA2HWFEV45HQ534LEZ54ZKESIX5NZ6VJKMYKY5XZ5H767XTS2G
[2684] 2023/05/26 01:08:32.954257 [DBG] Created system account: "$SYS"
[2684] 2023/05/26 01:08:32.955842 [INF] Listening for client connections on 0.0.0.0:4222
[2684] 2023/05/26 01:08:32.955842 [DBG] Get non local IPs for "0.0.0.0"
[2684] 2023/05/26 01:08:32.960573 [DBG] ip=1.7.19.7
[2684] 2023/05/26 01:08:32.962145 [INF] Server is ready
[2684] 2023/05/26 01:08:32.963720 [DBG] maxprocs: Leaving GOMAXPROCS=4: CPU quota u
Versions of nats-server and affected client libraries used:
Version: 2.9.16
OS/Container environment:
Windows Server 2019 Version 1809 (OS Build 17763.4010)
Steps or code to reproduce the issue:
1. Start nats-server, enable JetStream, create a stream, and add many messages (possibly large ones). This lengthens the stream restore at startup, which widens the window for step 5.
2. Stop the nats-server process.
3. Have a script ready to kill/stop the nats-server service on demand.
4. Start the nats-server service with logging enabled.
5. When the server logs "Starting restore for stream '$G > streamname'", kill nats-server.exe.
6. Start the nats-server process again.
7. We see the error: Error unmarshalling stream metafile "C:\\DataStore\EventData\jetstream\$G\streams\STREAM_EI\meta.inf": invalid character 'h' looking for beginning of value. The invalid character can change every time.
8. The stream cannot be recovered at all.
9. The kill can also happen at other times before the server reports that it is ready.
10. The same issue also occurs when the system time changes while the server is in step 5.
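The kill in step 5 is hard to time by hand, so a small watcher can help make the reproduction repeatable. The following is only a sketch, assuming the server binary is started directly (not as a Windows service) and that its log output reaches the pipe of the launching process. The helper names `is_restore_line` and `kill_on_restore` are hypothetical; the marker string is the log text quoted in step 5.

```python
import subprocess
import sys
from typing import List

# Log text quoted in step 5 of the reproduction steps.
RESTORE_MARKER = "Starting restore for stream"


def is_restore_line(line: str) -> bool:
    """Return True if a log line announces the start of a stream restore."""
    return RESTORE_MARKER in line


def kill_on_restore(cmd: List[str]) -> bool:
    """Start the given command and kill it on the first restore log line.

    Returns True if the process was killed after the marker appeared,
    False if its output ended before the marker was seen.
    """
    # Merge stderr into stdout so log lines are seen on either stream.
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    try:
        for line in proc.stdout:
            sys.stdout.write(line)  # echo the server log for later inspection
            if is_restore_line(line):
                proc.kill()  # hard kill, no graceful shutdown
                return True
        return False
    finally:
        proc.wait()  # reap the child in both cases
```

A call such as `kill_on_restore(["nats-server", "-js", "-DV"])` would cover steps 4 and 5; the binary path and any store directory option need to be adjusted to the local setup.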
Expected result:
When we start the service, the stream should be recoverable.
Actual result:
When we start the service and follow the steps above, the stream metafile is modified/corrupted, leaving the stream unusable.
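To tell whether a given meta.inf is already in the broken state before restarting the server, the file can be checked offline. This sketch assumes the server expects the metafile to contain a JSON document, which is what the "Error unmarshalling stream metafile" message suggests; the function name `check_meta_file` is hypothetical and not part of nats-server.

```python
import json
from pathlib import Path


def check_meta_file(path):
    """Return (ok, detail) for a stream metafile.

    Assumption: a healthy metafile parses as JSON. An encrypted or
    corrupted file will fail this check.
    """
    raw = Path(path).read_bytes()
    if not raw:
        return False, "file is empty"
    try:
        json.loads(raw)  # accepts bytes and detects the encoding
    except ValueError as exc:  # covers JSON and Unicode decode errors
        return False, f"not valid JSON: {exc}"
    return True, "parses as JSON"
```

Running this against a copy of the file after each step shows at which point the content stops being readable, without starting the server again.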
Actual Logs
LogToSahre.txt (some paths and stream names are redacted; please disregard those).
About this issue
- State: closed
- Created a year ago
- Comments: 15 (6 by maintainers)
Commits related to this issue
- reproduce issue #4195 — committed to nats-io/nats-server by tbeets a year ago
- reproduce issue #4195 on Linux — committed to nats-io/nats-server by tbeets a year ago
- [FIXED] Killed server on restart could render encrypted stream unrecoverable (#4210) When a server was killed on restart before an encrypted stream was recovered the keyfile was removed and could ca... — committed to nats-io/nats-server by derekcollison a year ago
Without that file we cannot decrypt the meta file. But again, as far as I know there should be no reason for the server to remove that file during startup. I plan to take a closer look this week.
If you turn it off as an experiment, does the problem go away?
I'm not suggesting you run in production that way; just curious.
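Since the fix commit mentions a key file being removed for encrypted streams, it may help to check which stream directories still have one after a killed restart. The sketch below walks a streams directory laid out like the path in the error message (one subdirectory per stream, each holding meta.inf). The key file name `meta.key` is an assumption and is not stated anywhere in this issue, so it is a parameter; the function name is hypothetical.

```python
from pathlib import Path


def find_streams_missing_key(streams_dir, meta_name="meta.inf", key_name="meta.key"):
    """List stream directories that have a metafile but no key file.

    key_name is an assumed file name; pass the real one for your server
    version. Only meaningful when stream encryption is enabled.
    """
    missing = []
    for stream_dir in sorted(Path(streams_dir).iterdir()):
        if not stream_dir.is_dir():
            continue
        has_meta = (stream_dir / meta_name).is_file()
        has_key = (stream_dir / key_name).is_file()
        if has_meta and not has_key:
            missing.append(stream_dir.name)
    return missing
```

Comparing the result before step 4 and after step 5 would show whether the key file disappears during the interrupted startup.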