moby: Kernel panic running swarm mode on Raspberry Pi

I have a cluster of 3 Raspberry Pi 3 boards configured as Docker swarm-mode nodes, with a single master (which also counts as a node) running on an x86 machine.

The Pis are running Hypriot 7 and the master is running CentOS 7. All are running Docker 1.12.3.

I’m seeing what looks like a kernel panic message that shows up intermittently in the messages log on each of the 3 Pi nodes, along with a corresponding error in the journalctl log on the Docker master. It only ever seems to affect the nodes that swarm services are running on, which leads me to wonder whether Docker itself is contributing to it.

Journalctl (on master):

Oct 27 19:42:02 brix dockerd[855]: time="2016-10-27T19:42:02.466494032+01:00" level=warning msg="2016/10/27 19:42:02 [ERR] memberlist: Failed TCP fallback ping: read tcp 192.168.10.40:39384->192.168.10.31:7946: i/o timeout\n"
Oct 27 19:42:04 brix dockerd[855]: time="2016-10-27T19:42:04.288344291+01:00" level=warning msg="2016/10/27 19:42:04 [WARN] memberlist: Refuting a dead message (from: Pi1-072e437130e0)\n"
Oct 27 19:44:46 brix dockerd[855]: time="2016-10-27T19:44:46.465943311+01:00" level=warning msg="2016/10/27 19:44:46 [ERR] memberlist: Failed TCP fallback ping: read tcp 192.168.10.40:39910->192.168.10.31:7946: i/o timeout\n"
Oct 27 19:44:46 brix dockerd[855]: time="2016-10-27T19:44:46.466152915+01:00" level=info msg="2016/10/27 19:44:46 [INFO] memberlist: Suspect Pi1-072e437130e0 has failed, no acks received\n"
Oct 27 19:44:48 brix dockerd[855]: time="2016-10-27T19:44:48.468592742+01:00" level=warning msg="2016/10/27 19:44:48 [WARN] memberlist: Refuting a suspect message (from: Pi1-072e437130e0)\n"
Oct 27 19:44:50 brix dockerd[855]: time="2016-10-27T19:44:50.465842724+01:00" level=info msg="2016/10/27 19:44:50 [INFO] memberlist: Suspect Pi1-072e437130e0 has failed, no acks received\n"
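
To see which node the master keeps losing contact with, the memberlist lines above can be tallied per node. This is only a rough sketch: the function name `summarize_memberlist` is made up for illustration, and the two patterns are taken directly from the message text in the log excerpt above, so they may not match other Docker versions.

```python
import re
from collections import Counter

# Patterns copied from the wording of the log lines in this report (an assumption:
# other Docker/memberlist versions may phrase these messages differently).
SUSPECT_RE = re.compile(r"memberlist: Suspect (\S+) has failed, no acks received")
PING_RE = re.compile(
    r"memberlist: Failed TCP fallback ping: read tcp \S+->(\S+): i/o timeout"
)


def summarize_memberlist(lines):
    """Count suspect reports per member name and ping timeouts per remote address."""
    suspects = Counter()
    ping_timeouts = Counter()
    for line in lines:
        match = SUSPECT_RE.search(line)
        if match:
            suspects[match.group(1)] += 1
        match = PING_RE.search(line)
        if match:
            ping_timeouts[match.group(1)] += 1
    return suspects, ping_timeouts
```

The input can be any iterable of log lines, for example the lines of a file that the journalctl output was saved to.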

Messages (on node):

Oct 27 19:44:59 pi1 kernel: [677269.925368] caller is debug_smp_processor_id+0x18/0x24
Oct 27 19:44:59 pi1 kernel: [677269.957148] CPU: 3 PID: 21801 Comm: [main]>worker1 Not tainted 4.4.24-hypriotos-v7+ #1
Oct 27 19:45:01 pi1 kernel: [677270.017648] Hardware name: BCM2709
Oct 27 19:45:01 pi1 kernel: [677270.046451] [<800193e4>] (unwind_backtrace) from [<800149e0>] (show_stack+0x20/0x24)
Oct 27 19:45:01 pi1 kernel: [677270.104243] [<800149e0>] (show_stack) from [<8033575c>] (dump_stack+0xbc/0x108)
Oct 27 19:45:01 pi1 kernel: [677270.161493] [<8033575c>] (dump_stack) from [<80350024>] (check_preemption_disabled+0x104/0x134)
Oct 27 19:45:01 pi1 kernel: [677270.220277] [<80350024>] (check_preemption_disabled) from [<8035006c>] (debug_smp_processor_id+0x18/0x24)
Oct 27 19:45:01 pi1 kernel: [677270.280108] [<8035006c>] (debug_smp_processor_id) from [<7f45bad0>] (ip_vs_in.part.2.constprop.9+0x23c/0x75c [ip_vs])
Oct 27 19:45:01 pi1 kernel: [677270.342067] [<7f45bad0>] (ip_vs_in.part.2.constprop.9 [ip_vs]) from [<7f45c074>] (ip_vs_local_request4+0x40/0x44 [ip_vs])
Oct 27 19:45:01 pi1 kernel: [677270.407250] [<7f45c074>] (ip_vs_local_request4 [ip_vs]) from [<8050c474>] (nf_iterate+0x80/0x90)
Oct 27 19:45:01 pi1 kernel: [677270.475595] [<8050c474>] (nf_iterate) from [<8050c504>] (nf_hook_slow+0x80/0xec)
Oct 27 19:45:01 pi1 kernel: [677270.548106] [<8050c504>] (nf_hook_slow) from [<80518d2c>] (__ip_local_out+0xac/0xb8)
Oct 27 19:45:01 pi1 kernel: [677270.626631] [<80518d2c>] (__ip_local_out) from [<80518d5c>] (ip_local_out+0x24/0x4c)
Oct 27 19:45:01 pi1 kernel: [677270.711065] [<80518d5c>] (ip_local_out) from [<80519084>] (ip_queue_xmit+0x144/0x3c0)
Oct 27 19:45:01 pi1 kernel: [677270.800822] [<80519084>] (ip_queue_xmit) from [<80531424>] (tcp_transmit_skb+0x4d0/0x918)
Oct 27 19:45:01 pi1 kernel: [677270.895312] [<80531424>] (tcp_transmit_skb) from [<805319dc>] (tcp_write_xmit+0x170/0xe50)
Oct 27 19:45:01 pi1 kernel: [677270.990552] [<805319dc>] (tcp_write_xmit) from [<805329d4>] (__tcp_push_pending_frames+0x44/0xb0)
Oct 27 19:45:01 pi1 kernel: [677271.087264] [<805329d4>] (__tcp_push_pending_frames) from [<80521288>] (tcp_push+0x130/0x158)
Oct 27 19:45:01 pi1 kernel: [677271.185353] [<80521288>] (tcp_push) from [<805248b0>] (tcp_sendmsg+0xd0/0xa60)
Oct 27 19:45:01 pi1 kernel: [677271.282465] [<805248b0>] (tcp_sendmsg) from [<8054e330>] (inet_sendmsg+0xa8/0xd0)
Oct 27 19:45:01 pi1 kernel: [677271.379796] [<8054e330>] (inet_sendmsg) from [<804bb8b8>] (sock_sendmsg+0x24/0x34)
Oct 27 19:45:01 pi1 kernel: [677271.477279] [<804bb8b8>] (sock_sendmsg) from [<804bcb04>] (SyS_sendto+0xc4/0xec)
Oct 27 19:45:01 pi1 kernel: [677271.574380] [<804bcb04>] (SyS_sendto) from [<804bcb54>] (SyS_send+0x28/0x30)
Oct 27 19:45:01 pi1 kernel: [677271.626723] [<804bcb54>] (SyS_send) from [<8000fc20>] (ret_fast_syscall+0x0/0x1c)

Docker info (on master):

Plugins:
 Volume: local nfs
 Network: overlay host bridge null
Swarm: active
 NodeID: 992jl6kbh0kqw7magsl9qmjmp
 Is Manager: true
 ClusterID: 5e6mlu2mgefgwoumqj5zktd3f
 Managers: 1
 Nodes: 4
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
 Node Address: 192.168.10.40
Runtimes: runc
Default Runtime: runc
Security Options: seccomp
Kernel Version: 3.10.0-327.36.1.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 7.694 GiB
Name: brix

Docker info (on node):

Containers: 3
 Running: 2
 Paused: 0
 Stopped: 1
Images: 34
Server Version: 1.12.3
Storage Driver: overlay
 Backing Filesystem: extfs
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local nfs
 Network: overlay host null bridge
Swarm: active
 NodeID: euocmfbdr4135qbu6mpfcgodk
 Is Manager: false
 Node Address: 192.168.10.31
Runtimes: runc
Default Runtime: runc
Security Options:
Kernel Version: 4.4.24-hypriotos-v7+
Operating System: Raspbian GNU/Linux 8 (jessie)
OSType: linux
Architecture: armv7l
CPUs: 4
Total Memory: 862 MiB
Name: Pi1
ID: 5HOX:P24E:4WZL:QLWR:WJ6U:CZWT:A65O:D3SP:TLIJ:PCIE:TFKM:SDKY
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Username: celtware
Registry: https://index.docker.io/v1/
Insecure Registries:
 127.0.0.0/8

Edit: I can now reproduce this consistently with a docker service, whenever the container is coming up on any node.

The stack trace varies slightly, but it always seems to involve ip_vs and ends with:

Oct 27 22:43:46 pi1 kernel: [ 1457.095593] [<800193e4>] (unwind_backtrace) from [<800149e0>] (show_stack+0x20/0x24)
Oct 27 22:43:46 pi1 kernel: [ 1457.192166] [<800149e0>] (show_stack) from [<8033575c>] (dump_stack+0xbc/0x108)
Oct 27 22:43:46 pi1 kernel: [ 1457.285830] [<8033575c>] (dump_stack) from [<80350024>] (check_preemption_disabled+0x104/0x134)
Oct 27 22:43:46 pi1 kernel: [ 1457.380839] [<80350024>] (check_preemption_disabled) from [<8035006c>] (debug_smp_processor_id+0x18/0x24)
Oct 27 22:43:46 pi1 kernel: [ 1457.476628] [<8035006c>] (debug_smp_processor_id) from [<7f45bb90>] (ip_vs_in.part.2.constprop.9+0x2fc/0x75c [ip_vs])
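
To check the claim that every trace involves ip_vs, the module names can be pulled out of the backtrace lines automatically. The following is a minimal sketch: the helper name `modules_in_trace` is hypothetical, and the regular expression assumes the ARM backtrace layout shown in the excerpts above, i.e. `(callee [module]) from [<addr>] (caller+off/len [module])`.

```python
import re

# One backtrace frame pair, in the layout seen in the traces in this report.
# The "[module]" suffix is optional and only present for symbols from loadable modules.
FRAME_RE = re.compile(
    r"\((?P<callee>[^()\s]+)(?: \[(?P<callee_mod>\w+)\])?\) from "
    r"\[<[0-9a-f]+>\] "
    r"\((?P<caller>[^()\s+]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(?P<caller_mod>\w+)\])?\)"
)


def modules_in_trace(lines):
    """Return the set of kernel module names mentioned in the backtrace frames."""
    modules = set()
    for line in lines:
        match = FRAME_RE.search(line)
        if not match:
            continue  # not a backtrace frame line (e.g. the "CPU: ..." header)
        for key in ("callee_mod", "caller_mod"):
            if match.group(key):
                modules.add(match.group(key))
    return modules
```

Running this over each trace in the messages log and testing `"ip_vs" in modules_in_trace(trace_lines)` would show whether any trace does not go through ip_vs.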

About this issue

  • State: closed
  • Created 8 years ago
  • Comments: 41 (41 by maintainers)

Most upvoted comments

The current Hypriot 1.4.0 works for me, too, hooray!