moby: Unable to start dockerd on swarm manager. Error says "tocommit(150010) is out of range [lastIndex(78775)]. Was the raft log corrupted"

Description: I am unable to start dockerd on one of the manager nodes of my swarm (IMPORTANT: this is Docker swarm mode, not standalone Docker Swarm).

Output of docker version:

Client:
 Version:      1.12.3
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   6b644ec
 Built:        Wed Oct 26 22:01:48 2016
 OS/Arch:      linux/amd64
Cannot connect to the Docker daemon. Is the docker daemon running on this host?

Output of docker info: I’m unable to run this command because docker won’t start

Additional environment details (AWS, VirtualBox, physical, etc.): A 7-node swarm with 3 manager nodes, running in a dedicated VPS on AWS. All nodes are running Ubuntu 16.04.1 LTS.

Output of dockerd:

INFO[0000] libcontainerd: new containerd process, pid: 1693 
WARN[0000] containerd: low RLIMIT_NOFILE changing to max  current=1024 max=65536
INFO[0001] [graphdriver] using prior storage driver "aufs" 
INFO[0001] Graph migration to content-addressability took 0.00 seconds 
WARN[0001] Your kernel does not support swap memory limit. 
INFO[0001] Loading containers: start.                   
.................................................................................INFO[0001] Firewalld running: false                     
INFO[0001] Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address 

INFO[0001] Loading containers: done.                    
INFO[0001] Listening for local connections               addr=/var/lib/docker/swarm/control.sock proto=unix
INFO[0001] Listening for connections                     addr=[::]:2377 proto=tcp
WARN[0001] ignoring request to join cluster, because raft state already exists 
INFO[0001] 52b75a6e6dd83823 became follower at term 2   
INFO[0001] newRaft 52b75a6e6dd83823 [peers: [8475b40f5c3b344,197af42d1ac22e90,52b75a6e6dd83823], term: 2, commit: 78774, applied: 70000, lastindex: 78775, lastterm: 2] 
INFO[0002] 52b75a6e6dd83823 [term: 2] received a MsgHeartbeat message with higher term from 197af42d1ac22e90 [term: 14] 
INFO[0002] 52b75a6e6dd83823 became follower at term 14  
PANI[0002] tocommit(150010) is out of range [lastIndex(78775)]. Was the raft log corrupted, truncated, or lost? 
panic: (*logrus.Entry) (0x1d275e0,0xc824fda400)

goroutine 520 [running]:
panic(0x1d275e0, 0xc824fda400)
	/usr/local/go/src/runtime/panic.go:481 +0x3e6
github.com/Sirupsen/logrus.Entry.log(0xc82004c1c0, 0xc8203c68d0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc8246d9a40, ...)
	/usr/src/docker/vendor/src/github.com/Sirupsen/logrus/entry.go:113 +0x62c
github.com/Sirupsen/logrus.(*Entry).Panic(0xc82004da00, 0xc82527c3b8, 0x1, 0x1)
	/usr/src/docker/vendor/src/github.com/Sirupsen/logrus/entry.go:158 +0x99
github.com/Sirupsen/logrus.(*Entry).Panicf(0xc82004da00, 0x20d2dc0, 0x5d, 0xc825267560, 0x2, 0x2)
	/usr/src/docker/vendor/src/github.com/Sirupsen/logrus/entry.go:206 +0x139
github.com/coreos/etcd/raft.(*raftLog).commitTo(0xc82110d260, 0x249fa)
	/usr/src/docker/vendor/src/github.com/coreos/etcd/raft/log.go:194 +0x1a6
github.com/coreos/etcd/raft.(*raft).handleHeartbeat(0xc820d3d2c0, 0x8, 0x52b75a6e6dd83823, 0x197af42d1ac22e90, 0xe, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/usr/src/docker/vendor/src/github.com/coreos/etcd/raft/raft.go:771 +0x44
github.com/coreos/etcd/raft.stepFollower(0xc820d3d2c0, 0x8, 0x52b75a6e6dd83823, 0x197af42d1ac22e90, 0xe, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/usr/src/docker/vendor/src/github.com/coreos/etcd/raft/raft.go:736 +0x119c
github.com/coreos/etcd/raft.(*raft).Step(0xc820d3d2c0, 0x8, 0x52b75a6e6dd83823, 0x197af42d1ac22e90, 0xe, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/usr/src/docker/vendor/src/github.com/coreos/etcd/raft/raft.go:564 +0x3e0
github.com/coreos/etcd/raft.(*node).run(0xc822c5bbd0, 0xc820d3d2c0)
	/usr/src/docker/vendor/src/github.com/coreos/etcd/raft/node.go:310 +0x90e
created by github.com/coreos/etcd/raft.RestartNode
	/usr/src/docker/vendor/src/github.com/coreos/etcd/raft/node.go:215 +0x2e4
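
As a cross-check of the panic message, the second argument of raftLog.commitTo in the trace (0x249fa) is the same number as tocommit written in hexadecimal, and lastindex comes from the newRaft line. The sketch below only does that arithmetic. The variable names are illustrative, and reading the argument as the commit index carried in the leader's heartbeat is an interpretation of the trace, not something the log states outright.

```shell
#!/bin/sh
# Values copied from the dockerd output above.
tocommit_hex=0x249fa   # second argument of raftLog.commitTo in the stack trace
last_index=78775       # "lastindex" from the newRaft log line

# printf converts the hexadecimal argument to decimal.
tocommit=$(printf '%d' "$tocommit_hex")

echo "commit index requested: $tocommit"
echo "local log ends at:      $last_index"
echo "entries missing:        $((tocommit - last_index))"
```

Run with sh, this reports 150010, 78775, and a gap of 71235 entries, which is consistent with the message's suggestion that part of the local raft log was truncated or lost.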

About this issue

State: open
Created 8 years ago
Comments: 22 (10 by maintainers)

Most upvoted comments

Do you have other managers? If so, I’d follow this process to re-add the node:

Demote the affected node to a worker
Move /var/lib/docker/swarm out of the way on the affected node
Join again using docker swarm join
Delete the old node entry (docker node rm) afterwards, to clean up the node list.
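
The steps above could be sketched as the following checklist. It is a sketch under assumptions, not a tested recovery script: it only prints the commands, because they run on two different hosts. NODE_NAME, MANAGER_ADDR, JOIN_TOKEN and OLD_NODE_ID are hypothetical placeholders, the backup directory name is arbitrary, and the systemctl lines assume dockerd is managed by systemd (as on a stock Ubuntu 16.04 install).

```shell
#!/bin/sh
# Prints the re-add procedure as a checklist; nothing is executed.
# All four values below are placeholders, not values taken from this issue.
NODE_NAME=${NODE_NAME:-affected-node}
MANAGER_ADDR=${MANAGER_ADDR:-healthy-manager:2377}
JOIN_TOKEN=${JOIN_TOKEN:-JOIN_TOKEN_HERE}
OLD_NODE_ID=${OLD_NODE_ID:-OLD_NODE_ID_HERE}

print_plan() {
  cat <<EOF
# 1. On a healthy manager: demote the affected node to a worker.
docker node demote $NODE_NAME
# 2. On the affected node: move the old swarm state out of the way, then start dockerd.
sudo systemctl stop docker
sudo mv /var/lib/docker/swarm /var/lib/docker/swarm.bak
sudo systemctl start docker
# 3. On the affected node: join again (get the token with 'docker swarm join-token' on a manager).
docker swarm join --token $JOIN_TOKEN $MANAGER_ADDR
# 4. On a healthy manager: remove the stale entry (find its ID with 'docker node ls').
docker node rm $OLD_NODE_ID
EOF
}

print_plan
```

Printing rather than executing keeps the sketch safe to run anywhere; each command still has to be issued by hand on the host named in the comment above it.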

aaronlehmann on Dec 6, 2016

@AashishAsh Go to https://github.com/moby/moby/issues/new to file a new issue (the “New Issue” button up in the top right-hand corner of the page). There’s a template to follow when you do so.

Probably set the title of the issue to be something like “dockerd nil pointer dereference panic in agent.WalkTask”?

The template will ask how you produced this exception (e.g. do you know how to trigger it?), what happened (a good place to include the above log), the docker version information, and the output of docker info, as well as any other system information you may have. It looks like the daemon was down when you ran docker version, so it’d be useful to find out which version of the daemon this is against as well.

cyli on Jul 6, 2017