nats-streaming-server: Segmentation violation in cluster mode and latest docker container

Hi guys! I’ve just tested latest version of NATS Streaming in cluster mode with 3 nodes and got panic: runtime error: invalid memory address or nil pointer dereference.

My steps:

  1. Create 3 data directories (srv1, srv2, srv3)
  2. Put simple nats.conf in each data directory (ports end with 1, 2 and 3 respectively):
port: 4231
http_port: 8231
store: "file"
dir: "/data"
cluster {
  host: '10.10.0.100'
  port: 5231
  bootstrap: true
  raft_logging: true
  routes = [
    nats-route://10.10.0.100:5232
    nats-route://10.10.0.100:5233
  ]
}
  1. Start 3 docker containers: docker container run -it --name nats1 -v ${PWD}/srv1:/data --network=host nats-streaming -c /data/nats.conf
  2. Do some simple tests to be sure everything is working.
  3. Start 3 consumers (just commented out printing every messages):
stan-sub -s nats://10.10.0.100:4231 -id 1 --qgroup qgtest -all test
stan-sub -s nats://10.10.0.100:4232 -id 2 --qgroup qgtest -all test
stan-sub -s nats://10.10.0.100:4233 -id 3 --qgroup qgtest -all test
  1. Start data producer:
stan-bench -s nats://10.10.0.100:4231 -np 1 -ns 0 -n 10000000 -ms 1000 -a test

After several minutes second and third containers started printing errors:

...
[1] 2018/02/14 20:32:31.100452 [ERR] STREAM: [Client:benchmark-pub-1] Error replicating message for subject "_STAN.pub.test-cluster.test": timed out enqueuing operation
[1] 2018/02/14 20:32:31.100464 [ERR] STREAM: [Client:benchmark-pub-97] Error replicating message for subject "_STAN.pub.test-cluster.test": timed out enqueuing operation
[1] 2018/02/14 20:32:31.100475 [ERR] STREAM: [Client:benchmark-pub-94] Error replicating message for subject "_STAN.pub.test-cluster.test": timed out enqueuing operation
[1] 2018/02/14 20:32:31.100485 [ERR] STREAM: [Client:benchmark-pub-97] Error replicating message for subject "_STAN.pub.test-cluster.test": timed out enqueuing operation
...

and then:

[1] 2018/02/14 20:32:27.598057 [ERR] STREAM: [Client:benchmark-pub-85] Error replicating message for subject "_STAN.pub.test-cluster.test": timed out enqueuing operation
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x87e359]

goroutine 70 [running]:
github.com/nats-io/nats-streaming-server/stores.(*FileMsgStore).removeFirstMsg(0xc42009a340, 0x0, 0xc42ec9af00)
	/go/src/github.com/nats-io/nats-streaming-server/stores/filestore.go:2647 +0x69
github.com/nats-io/nats-streaming-server/stores.(*FileMsgStore).enforceLimits(0xc42009a340, 0x1e0001, 0xc4531b12c0, 0xc447f7b801)
	/go/src/github.com/nats-io/nats-streaming-server/stores/filestore.go:2589 +0xa2
github.com/nats-io/nats-streaming-server/stores.(*FileMsgStore).Store(0xc42009a340, 0xc4531b12c0, 0x0, 0x0, 0x0)
	/go/src/github.com/nats-io/nats-streaming-server/stores/filestore.go:2482 +0xb99
github.com/nats-io/nats-streaming-server/server.(*StanServer).Apply(0xc4201ea000, 0xc450f849a0, 0xdfe320, 0x3)
	/go/src/github.com/nats-io/nats-streaming-server/server/server.go:4498 +0x176
github.com/nats-io/nats-streaming-server/vendor/github.com/hashicorp/raft.(*Raft).runFSM.func1(0xc44b13b1e0)
	/go/src/github.com/nats-io/nats-streaming-server/vendor/github.com/hashicorp/raft/fsm.go:57 +0x17b
github.com/nats-io/nats-streaming-server/vendor/github.com/hashicorp/raft.(*Raft).runFSM(0xc4200e1340)
	/go/src/github.com/nats-io/nats-streaming-server/vendor/github.com/hashicorp/raft/fsm.go:120 +0x31e
github.com/nats-io/nats-streaming-server/vendor/github.com/hashicorp/raft.(*Raft).(github.com/nats-io/nats-streaming-server/vendor/github.com/hashicorp/raft.runFSM)-fm()
	/go/src/github.com/nats-io/nats-streaming-server/vendor/github.com/hashicorp/raft/api.go:506 +0x2a
github.com/nats-io/nats-streaming-server/vendor/github.com/hashicorp/raft.(*raftState).goFunc.func1(0xc4200e1340, 0xc420180500)
	/go/src/github.com/nats-io/nats-streaming-server/vendor/github.com/hashicorp/raft/state.go:146 +0x53
created by github.com/nats-io/nats-streaming-server/vendor/github.com/hashicorp/raft.(*raftState).goFunc
	/go/src/github.com/nats-io/nats-streaming-server/vendor/github.com/hashicorp/raft/state.go:144 +0x66

After restarting failed containers:

$ docker container start -ai nats2
[1] 2018/02/14 20:34:30.879042 [INF] STREAM: Starting nats-streaming-server[test-cluster] version 0.8.0-beta
[1] 2018/02/14 20:34:30.879082 [INF] STREAM: ServerID: PIWf5lutLWcVp9fYVr2WFY
[1] 2018/02/14 20:34:30.879087 [INF] STREAM: Go version: go1.9.3
[1] 2018/02/14 20:34:30.879323 [INF] Starting nats-server version 1.0.4
[1] 2018/02/14 20:34:30.879554 [INF] Starting http monitor on 0.0.0.0:8232
[1] 2018/02/14 20:34:30.879671 [INF] Listening for client connections on 0.0.0.0:4232
[1] 2018/02/14 20:34:30.879685 [INF] Server is ready
[1] 2018/02/14 20:34:30.880552 [INF] Listening for route connections on 10.10.0.100:5232
[1] 2018/02/14 20:34:30.906597 [INF] STREAM: Recovering the state...
[1] 2018/02/14 20:34:31.373113 [INF] STREAM: Recovered 1 channel(s)
[1] 2018/02/14 20:34:33.298825 [INF] STREAM: raft: Initial configuration (index=1): [{Suffrage:Voter ID:ib2LAFTTccvyusMmcYgM9D Address:test-cluster.ib2LAFTTccvyusMmcYgM9D.stan}]
[1] 2018/02/14 20:34:33.298939 [INF] STREAM: raft: Node at test-cluster.ib2LAFTTccvyusMmcYgM9D.stan [Follower] entering Follower state (Leader: "")
[1] 2018/02/14 20:34:33.300337 [INF] STREAM: Message store is RAFT_FILE
[1] 2018/02/14 20:34:33.300415 [INF] STREAM: ---------- Store Limits ----------
[1] 2018/02/14 20:34:33.300425 [INF] STREAM: Channels:                  100 *
[1] 2018/02/14 20:34:33.300431 [INF] STREAM: --------- Channels Limits --------
[1] 2018/02/14 20:34:33.300436 [INF] STREAM:   Subscriptions:          1000 *
[1] 2018/02/14 20:34:33.300443 [INF] STREAM:   Messages     :       1000000 *
[1] 2018/02/14 20:34:33.300449 [INF] STREAM:   Bytes        :     976.56 MB *
[1] 2018/02/14 20:34:33.300456 [INF] STREAM:   Age          :     unlimited *
[1] 2018/02/14 20:34:33.300464 [INF] STREAM: ----------------------------------
[1] 2018/02/14 20:34:35.169644 [INF] STREAM: raft: Heartbeat timeout from "" reached, starting election
[1] 2018/02/14 20:34:35.169694 [INF] STREAM: raft: Node at test-cluster.ib2LAFTTccvyusMmcYgM9D.stan [Candidate] entering Candidate state in term 3
[1] 2018/02/14 20:34:35.169961 [INF] STREAM: raft: Election won. Tally: 1
[1] 2018/02/14 20:34:35.169977 [INF] STREAM: raft: Node at test-cluster.ib2LAFTTccvyusMmcYgM9D.stan [Leader] entering Leader state
[1] 2018/02/14 20:34:35.170079 [INF] STREAM: server became leader, performing leader promotion actions
[1] 2018/02/14 20:34:38.276052 [INF] STREAM: WARNING: Reached limits for store "test" (msgs=963311/1000000 bytes=976.56 MB/976.56 MB), dropping old messages to make room for new ones
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x87e359]

goroutine 92 [running]:
github.com/nats-io/nats-streaming-server/stores.(*FileMsgStore).removeFirstMsg(0xc42023a000, 0x0, 0xc4203e1600)
	/go/src/github.com/nats-io/nats-streaming-server/stores/filestore.go:2647 +0x69
github.com/nats-io/nats-streaming-server/stores.(*FileMsgStore).enforceLimits(0xc42023a000, 0x1e0001, 0xc42172aa50, 0xc422016a01)
	/go/src/github.com/nats-io/nats-streaming-server/stores/filestore.go:2589 +0xa2
github.com/nats-io/nats-streaming-server/stores.(*FileMsgStore).Store(0xc42023a000, 0xc42172aa50, 0x0, 0x0, 0x0)
	/go/src/github.com/nats-io/nats-streaming-server/stores/filestore.go:2482 +0xb99
github.com/nats-io/nats-streaming-server/server.(*StanServer).Apply(0xc4201ee000, 0xc4203354a0, 0xdfe320, 0xbe99453790618e89)
	/go/src/github.com/nats-io/nats-streaming-server/server/server.go:4498 +0x176
github.com/nats-io/nats-streaming-server/vendor/github.com/hashicorp/raft.(*Raft).runFSM.func1(0xc4202ec2d0)
	/go/src/github.com/nats-io/nats-streaming-server/vendor/github.com/hashicorp/raft/fsm.go:57 +0x17b
github.com/nats-io/nats-streaming-server/vendor/github.com/hashicorp/raft.(*Raft).runFSM(0xc4200d0000)
	/go/src/github.com/nats-io/nats-streaming-server/vendor/github.com/hashicorp/raft/fsm.go:120 +0x31e
github.com/nats-io/nats-streaming-server/vendor/github.com/hashicorp/raft.(*Raft).(github.com/nats-io/nats-streaming-server/vendor/github.com/hashicorp/raft.runFSM)-fm()
	/go/src/github.com/nats-io/nats-streaming-server/vendor/github.com/hashicorp/raft/api.go:506 +0x2a
github.com/nats-io/nats-streaming-server/vendor/github.com/hashicorp/raft.(*raftState).goFunc.func1(0xc4200d0000, 0xc4202ec150)
	/go/src/github.com/nats-io/nats-streaming-server/vendor/github.com/hashicorp/raft/state.go:146 +0x53
created by github.com/nats-io/nats-streaming-server/vendor/github.com/hashicorp/raft.(*raftState).goFunc
	/go/src/github.com/nats-io/nats-streaming-server/vendor/github.com/hashicorp/raft/state.go:144 +0x66

About this issue

  • Original URL
  • State: closed
  • Created 6 years ago
  • Comments: 21 (13 by maintainers)

Commits related to this issue

Most upvoted comments

@sega-yarkin Just an update that I am still working on this issue. I was not able to reproduce the issue, but was able to find another possible panic, which I have a PR for that.

I could add defensive code to prevent the panic that you have reported, I think, but I would prefer to be able to understand how this can happen in the first place. I believed that the fix in #477 should have solved the problem. I will continue the investigation.

Thanks for your patience!

@sega-yarkin Just an update that I have made some progress and may have a PR sometime tomorrow.