nats-server: RAFT: Resource not found on account deletion

Defect

On our production servers, we observed errors like the following (generated approximately every 20 seconds):

[82010] 2023/03/23 14:59:46.793797 [ERR] RAFT [Yjedx3Ck - S-R3F-12Jrdv1K] Resource not found: open /home/nats/store/jetstream/AAWOKMTECQYMRURLZLDPLKEMEOSXPSD5QEJST3QS5UZSRH5W7JGOKKPD/_js_/S-R3F-12Jrdv1K/tav.idx: no such file or directory
[82010] 2023/03/23 14:59:46.793825 [WRN] RAFT [Yjedx3Ck - S-R3F-12Jrdv1K] Error writing term and vote file for "S-R3F-12Jrdv1K": open /home/nats/store/jetstream/AAWOKMTECQYMRURLZLDPLKEMEOSXPSD5QEJST3QS5UZSRH5W7JGOKKPD/_js_/S-R3F-12Jrdv1K/tav.idx: no such file or directory
[82010] 2023/03/23 14:59:46.793846 [ERR] RAFT [Yjedx3Ck - S-R3F-12Jrdv1K] Resource not found: open /home/nats/store/jetstream/AAWOKMTECQYMRURLZLDPLKEMEOSXPSD5QEJST3QS5UZSRH5W7JGOKKPD/_js_/S-R3F-12Jrdv1K/tav.idx: no such file or directory
[82010] 2023/03/23 14:59:52.521153 [ERR] RAFT [Yjedx3Ck - S-R3F-12Jrdv1K] Resource not found: open /home/nats/store/jetstream/AAWOKMTECQYMRURLZLDPLKEMEOSXPSD5QEJST3QS5UZSRH5W7JGOKKPD/_js_/S-R3F-12Jrdv1K/tav.idx: no such file or directory
[82010] 2023/03/23 14:59:52.521201 [WRN] RAFT [Yjedx3Ck - S-R3F-12Jrdv1K] Error writing term and vote file for "S-R3F-12Jrdv1K": open /home/nats/store/jetstream/AAWOKMTECQYMRURLZLDPLKEMEOSXPSD5QEJST3QS5UZSRH5W7JGOKKPD/_js_/S-R3F-12Jrdv1K/tav.idx: no such file or directory
[82010] 2023/03/23 14:59:58.195319 [WRN] JetStream cluster stream 'ABF5FAZ53HINEIAX2BPKECJXSJ7R7OCJKXQNWIDAFNM5XAV5E5LZDGII > Q_Ginkgo-N1-sqs_test-95-gqurl-sqs-with-a-queue-candac4d4cb22ef-queue' has NO quorum, stalled
[82010] 2023/03/23 14:59:58.196204 [ERR] RAFT [Yjedx3Ck - S-R3F-12Jrdv1K] Resource not found: open /home/nats/store/jetstream/AAWOKMTECQYMRURLZLDPLKEMEOSXPSD5QEJST3QS5UZSRH5W7JGOKKPD/_js_/S-R3F-12Jrdv1K/tav.idx: no such file or directory
[82010] 2023/03/23 14:59:58.196239 [WRN] RAFT [Yjedx3Ck - S-R3F-12Jrdv1K] Error writing term and vote file for "S-R3F-12Jrdv1K": open /home/nats/store/jetstream/AAWOKMTECQYMRURLZLDPLKEMEOSXPSD5QEJST3QS5UZSRH5W7JGOKKPD/_js_/S-R3F-12Jrdv1K/tav.idx: no such file or directory
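
For triage, it can help to pull the RAFT group name and the missing path out of these lines, for example to count how many groups are affected. The following is a minimal sketch, not part of nats-server: the regular expression is inferred only from the log lines shown above, and the names raftNotFound, RaftError and parseRaftNotFound are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// raftNotFound matches the "[ERR] RAFT ... Resource not found" lines shown above
// and captures the node ID, the RAFT group name, and the missing path.
// The pattern is an assumption derived from those sample lines only.
var raftNotFound = regexp.MustCompile(
	`\[ERR\] RAFT \[(\S+) - (\S+)\] Resource not found: open (\S+): no such file or directory`)

// RaftError holds the fields extracted from one matching log line.
type RaftError struct {
	Node  string
	Group string
	Path  string
}

// parseRaftNotFound returns the extracted fields and true when the line matches.
func parseRaftNotFound(line string) (RaftError, bool) {
	m := raftNotFound.FindStringSubmatch(line)
	if m == nil {
		return RaftError{}, false
	}
	return RaftError{Node: m[1], Group: m[2], Path: m[3]}, true
}

func main() {
	lines := []string{
		`[82010] 2023/03/23 14:59:46.793797 [ERR] RAFT [Yjedx3Ck - S-R3F-12Jrdv1K] Resource not found: open /home/nats/store/jetstream/AAWOKMTECQYMRURLZLDPLKEMEOSXPSD5QEJST3QS5UZSRH5W7JGOKKPD/_js_/S-R3F-12Jrdv1K/tav.idx: no such file or directory`,
		`[82010] 2023/03/23 14:59:58.195319 [WRN] JetStream cluster stream has NO quorum, stalled`,
	}
	// Count matching errors per RAFT group.
	perGroup := map[string]int{}
	for _, l := range lines {
		if e, ok := parseRaftNotFound(l); ok {
			perGroup[e.Group]++
			fmt.Println("group:", e.Group, "missing:", e.Path)
		}
	}
	fmt.Println("errors per group:", perGroup)
}
```

The [WRN] lines are deliberately ignored here, since each one repeats the path of the [ERR] line that precedes it.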

A server restart generally solves the issue.

Versions of nats-server and affected client libraries used:

  • nats-server: v2.9.15 (but we have also seen the issue with v2.9.14)
  • Go libraries:
    • github.com/nats-io/jwt/v2 v2.3.0
    • github.com/nats-io/nats.go v1.23.0

OS/Container environment:

  • NATS cluster (with 3 virtual machines)
  • Ubuntu 20.04.5 LTS

Steps or code to reproduce the issue:

While not easy to reproduce, we managed to get the same problem on a temporary test cluster. Here is the process:

  • several Go clients are run in parallel
  • these Go clients:
    • create a NATS account (through JWT)
    • connect as user to create a stream and a consumer
    • once operations are finished on the stream / consumer, the stream is deleted, the NATS account is DELETED and PURGED

Expected result:

Deleting and purging an account should always succeed, without any RAFT errors afterwards.

Actual result:

It sometimes fails with RAFT errors. Note that the mentioned directory (/home/nats/store/jetstream/AAWOKMTECQYMRURLZLDPLKEMEOSXPSD5QEJST3QS5UZSRH5W7JGOKKPD/_js_/S-R3F-12Jrdv1K in my case) does not exist anymore.

Please find the full logs below and look for errors on server number 2:

If you need additional information, please ask 😃 Thanks!

About this issue

  • State: open
  • Created a year ago
  • Comments: 15 (12 by maintainers)

Most upvoted comments

Servers are built nightly from both the main and dev branches. We encourage folks to try them out as needed.

https://hub.docker.com/r/synadia/nats-server/tags

Look for nightly-main; nightly is the dev branch.

Could you tell if this directory was in fact present and the file was just missing?

/home/nats/store/jetstream/AAWOKMTECQYMRURLZLDPLKEMEOSXPSD5QEJST3QS5UZSRH5W7JGOKKPD/_js_/S-R3F-12Jrdv1K

The directory is missing.

Thanks, this is helpful. This is an active bug that we are tracking and hope to fix before a 2.9.16 release.