moby: docker suddenly stops working in swarm mode due to high memory consumption
Description
I have 3 managers in my swarm: dmgr-01, dmgr-02 and dmgr-03. I always execute all commands on dmgr-01, and it was the leader of the swarm. But today I got the following output from the docker service ls command:
runtime/cgo: pthread_create failed: Resource temporarily unavailable
SIGABRT: abort
PC=0x7f7bc6596067 m=0
goroutine 0 [idle]:
goroutine 1 [running]:
runtime.systemstack_switch()
/usr/local/go/src/runtime/asm_amd64.s:252 fp=0xc420020768 sp=0xc420020760
runtime.main()
/usr/local/go/src/runtime/proc.go:127 +0x6c fp=0xc4200207c0 sp=0xc420020768
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:2086 +0x1 fp=0xc4200207c8 sp=0xc4200207c0
goroutine 17 [syscall, locked to thread]:
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:2086 +0x1
rax 0x0
rbx 0x7f7bc6907708
rcx 0xffffffffffffffff
rdx 0x6
rdi 0x3daf
rsi 0x3daf
rbp 0xc5a91e
rsp 0x7ffe234fb208
r8 0xa
r9 0x7f7bc6f47740
r10 0x8
r11 0x206
r12 0x1bcd050
r13 0xf3
r14 0x30
r15 0x3
rip 0x7f7bc6596067
rflags 0x206
cs 0x33
fs 0x0
gs 0x0
And now in docker node ls I see:
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS
50qxe43jjlwltuwzgem8cwwsa * dmgr-02 Ready Active Leader
hzykzxzh9jmn5mimj02awr42i dmgr-01 Ready Active Reachable
wiczce0ey8wzr9zaivlyfgm0f dmgr-03 Ready Active Reachable
So dmgr-01 stopped working and dropped its leadership in the swarm. My monitoring also tells me that the open ports flapped on that node.
Additional information:
I looked through the logs on the problem node and found out of memory errors. Note, however, that all my manager nodes have 2 GB of RAM and the containers running there consume only about 200-300 MB; the only processes are nginx and consul. In the normal state free -m shows:
total used free shared buffers cached
Mem: 2010 350 1659 0 19 173
-/+ buffers/cache: 157 1852
Swap: 0 0 0
My cluster has 10 nodes and about 70 docker services. So I think any out of memory errors can only be caused by a memory leak in the Docker engine.
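To gather evidence for (or against) a leak in the engine itself, I would watch how dockerd's resident memory and thread count change over time before the crash. The sketch below is only a rough diagnostic idea, not part of Docker: it samples the VmRSS and Threads fields from /proc/<pid>/status for a PID you pass in (the dockerd PID on the affected manager).

```go
// leakwatch.go: periodically sample VmRSS and Threads for a given PID
// (intended here to be dockerd's PID) so that memory/thread growth can be
// correlated with the crash. Linux only; a sketch, not part of Docker.
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
	"time"
)

// sample reads /proc/<pid>/status and extracts the VmRSS and Threads fields.
func sample(pid string) (rss, threads string) {
	f, err := os.Open("/proc/" + pid + "/status")
	if err != nil {
		return "?", "?" // process gone or not readable
	}
	defer f.Close()
	s := bufio.NewScanner(f)
	for s.Scan() {
		line := s.Text()
		switch {
		case strings.HasPrefix(line, "VmRSS:"):
			rss = strings.TrimSpace(strings.TrimPrefix(line, "VmRSS:"))
		case strings.HasPrefix(line, "Threads:"):
			threads = strings.TrimSpace(strings.TrimPrefix(line, "Threads:"))
		}
	}
	return rss, threads
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: leakwatch <pid of dockerd>")
		os.Exit(1)
	}
	for {
		rss, threads := sample(os.Args[1])
		fmt.Printf("%s rss=%s threads=%s\n", time.Now().Format(time.RFC3339), rss, threads)
		time.Sleep(30 * time.Second)
	}
}
```

If the RSS climbs steadily while the containers' own usage stays flat, that would point at the daemon (or one of its log drivers) rather than the workload.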
Output of docker version:
Client:
Version: 17.03.0-ce
API version: 1.26
Go version: go1.7.5
Git commit: 60ccb22
Built: Thu Feb 23 10:53:29 2017
OS/Arch: linux/amd64
Server:
Version: 17.03.0-ce
API version: 1.26 (minimum version 1.12)
Go version: go1.7.5
Git commit: 60ccb22
Built: Thu Feb 23 10:53:29 2017
OS/Arch: linux/amd64
Experimental: false
Output of docker info:
Containers: 8
Running: 1
Paused: 0
Stopped: 7
Images: 10
Server Version: 17.03.0-ce
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 54
Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Swarm: active
NodeID: hzykzxzh9jmn5mimj02awr42i
Is Manager: true
ClusterID: hqvohft3etj4ajnkgubbnjwzp
Managers: 3
Nodes: 15
Orchestration:
Task History Retention Limit: 5
Raft:
Snapshot Interval: 10000
Number of Old Snapshots to Retain: 0
Heartbeat Tick: 1
Election Tick: 3
Dispatcher:
Heartbeat Period: 5 seconds
CA Configuration:
Expiry Duration: 3 months
Node Address: <IP of dmgr-01 here>
Manager Addresses:
<IP of dmgr-01 here>:2377
<IP of dmgr-02 here>:2377
<IP of dmgr-03 here>:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 977c511eda0925a723debdc94d09459af49d082a
runc version: a01dafd48bc1c7cc12bdb01206f9fea7dd6feb70
init version: 949e6fa
Kernel Version: 3.16.0-4-amd64
Operating System: Debian GNU/Linux 8 (jessie)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 1.963 GiB
Name: dmgr-01
ID: VUNH:H6FP:CO4N:O6VB:CFCG:4T32:JQQV:SIS3:UGT7:V2FK:46VS:PRKZ
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Username: filiatixbot
Registry: https://index.docker.io/v1/
WARNING: No memory limit support
WARNING: No swap limit support
WARNING: No kernel memory limit support
WARNING: No oom kill disable support
WARNING: No cpu cfs quota support
WARNING: No cpu cfs period support
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Additional environment details (AWS, VirtualBox, physical, etc.):
I use a virtual server on DigitalOcean.
About this issue
- Original URL
- State: closed
- Created 7 years ago
- Comments: 23 (12 by maintainers)
OK, so it looks like GELF was unable to send logs, possibly leading to it buffering messages until it's able to send them.
ping @mariussturm @cpuguy83 any ideas / suggestions?
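For context on that hypothesis: a log driver that keeps messages in an in-memory queue while its endpoint is unreachable will grow that queue without bound if the endpoint never recovers, which would be consistent with dockerd's memory slowly climbing until pthread_create starts failing. The sketch below is only an illustration of that failure mode under my own assumptions (the type, the endpoint and the queueing policy are made up); it is not the actual GELF driver code.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// bufferingLogger illustrates the suspected failure mode: messages that
// cannot be delivered are kept in an unbounded in-memory backlog.
type bufferingLogger struct {
	endpoint string   // hypothetical log endpoint, e.g. "graylog.example:12201"
	queue    [][]byte // unbounded backlog: this is what grows while the endpoint is down
}

// Log appends the message to the backlog and tries to flush everything.
// If the endpoint is unreachable nothing is dropped, so memory keeps growing.
func (l *bufferingLogger) Log(msg []byte) {
	l.queue = append(l.queue, msg)
	conn, err := net.DialTimeout("tcp", l.endpoint, 100*time.Millisecond)
	if err != nil {
		return // endpoint unreachable: keep buffering
	}
	defer conn.Close()
	for _, m := range l.queue {
		conn.Write(m) // errors ignored for brevity in this sketch
	}
	l.queue = l.queue[:0]
}

func main() {
	l := &bufferingLogger{endpoint: "127.0.0.1:12201"} // assumed unreachable here
	for i := 0; i < 1000; i++ {
		l.Log([]byte(fmt.Sprintf("container log line %d\n", i)))
	}
	fmt.Printf("messages still held in memory: %d\n", len(l.queue))
}
```

If that is what is happening, bounding the backlog (dropping or spilling the oldest messages past some limit) would keep the daemon's memory flat at the cost of losing log lines while the endpoint is down.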