longhorn: [BUG] Instance Manager is running out of pthread resources

Describe the bug

The instance manager fails to allocate a new replica, reporting runtime/cgo: pthread_create failed: Resource temporarily unavailable.

To Reproduce

After a while in a normal operating state, create a PVC in a Longhorn-backed storage class. The volume is successfully created and attached but remains in a Degraded state, with a replica that won't initialize.
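For example, creating a minimal PVC against a Longhorn-backed storage class reproduces the creation step (the storage class name below is only an example; use one from your own setup):

cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: repro-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 1Gi
EOF

# Watch the volume and replica state (CRD name as in a standard Longhorn install)
kubectl -n longhorn-system get volumes.longhorn.io -w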

Expected behavior

The volume ends up in a Healthy state.

Log

This is the error from one of the instance-manager-r pods (the one on the node that did not initialize the replica); it is also reported in longhorn-manager:

[longhorn-instance-manager] time="2021-10-22T06:14:42Z" level=info msg="wait for gRPC service of process pvc-f04220b3-0f93-458e-a17a-f4137f3c5d62-r-f797a9f2 to start at localhost:10570"
[pvc-f04220b3-0f93-458e-a17a-f4137f3c5d62-r-f797a9f2] runtime/cgo: pthread_create failed: Resource temporarily unavailable
[pvc-f04220b3-0f93-458e-a17a-f4137f3c5d62-r-f797a9f2] SIGABRT: abort
PC=0xbfdd0b m=0 sigcode=18446744073709551610

goroutine 0 [idle]:
runtime: unknown pc 0xbfdd0b
stack: frame={sp:0x7ffe37bcd990, fp:0x0} stack=[0x7ffe373cef18,0x7ffe37bcdf50)
00007ffe37bcd890:  0000000000210808  0000002200000003
00007ffe37bcd8a0:  00000000ffffffff  00007fe7d7872000
00007ffe37bcd8b0:  00007ffe37bcd8f8  00007ffe37bcd8e0
00007ffe37bcd8c0:  00007ffe37bcd8f0  0000000000bb775c
00007ffe37bcd8d0:  0000000000000000  00000000004680ce <runtime.callCgoMmap+62>
00007ffe37bcd8e0:  00007ffe37bcd8e0  0000000000000000
00007ffe37bcd8f0:  00007ffe37bcd930  000000000045fe38 <runtime.mmap.func1+88>
00007ffe37bcd900:  00007fe7e9c03000  0000000000001000
00007ffe37bcd910:  0000003200000003  00000000ffffffff
00007ffe37bcd920:  00007fe7e9c03000  00007ffe37bcd970
00007ffe37bcd930:  00007ffe37bcd9a8  0000000000404d3e <runtime.mmap+158>
00007ffe37bcd940:  00007ffe37bcd978  00007ffe37bcd978
00007ffe37bcd950:  00007ffe37bcd988  0000000000bb775c
00007ffe37bcd960:  00007fe7d7872000  00007ffe37bcd998
00007ffe37bcd970:  00007ffe37bcd9a8  0000000000bb775c
00007ffe37bcd980:  00007fe7e9c03000  00000000004680ce <runtime.callCgoMmap+62>
00007ffe37bcd990: <0000000000000000  0000000000000000
00007ffe37bcd9a0:  0000000000100000  00007ffe37bcd9d0
00007ffe37bcd9b0:  00007ffe37bcd9e0  00007ffe37bcda80
00007ffe37bcd9c0:  000000000042a48c <runtime.(*pageAlloc).update+604>  00007ffe37bcda90
00007ffe37bcd9d0:  000000000042a48c <runtime.(*pageAlloc).update+604>  00007fe7fe303c00
00007ffe37bcd9e0:  0000000000000008  000000000000fe80
00007ffe37bcd9f0:  0000000000000012  000000003b600000
00007ffe37bcda00:  000000003c000000  000780003c000000
00007ffe37bcda10:  fffffffe7fffffff  ffffffffffffffff
00007ffe37bcda20:  ffffffffffffffff  ffffffffffffffff
00007ffe37bcda30:  ffffffffffffffff  ffffffffffffffff
00007ffe37bcda40:  ffffffffffffffff  ffffffffffffffff
00007ffe37bcda50:  ffffffffffffffff  ffffffffffffffff
00007ffe37bcda60:  ffffffffffffffff  ffffffffffffffff
00007ffe37bcda70:  ffffffffffffffff  ffffffffffffffff
00007ffe37bcda80:  ffffffffffffffff  ffffffffffffffff
runtime: unknown pc 0xbfdd0b
stack: frame={sp:0x7ffe37bcd990, fp:0x0} stack=[0x7ffe373cef18,0x7ffe37bcdf50)
00007ffe37bcd890:  0000000000210808  0000002200000003
00007ffe37bcd8a0:  00000000ffffffff  00007fe7d7872000
00007ffe37bcd8b0:  00007ffe37bcd8f8  00007ffe37bcd8e0
00007ffe37bcd8c0:  00007ffe37bcd8f0  0000000000bb775c
00007ffe37bcd8d0:  0000000000000000  00000000004680ce <runtime.callCgoMmap+62>
00007ffe37bcd8e0:  00007ffe37bcd8e0  0000000000000000
00007ffe37bcd8f0:  00007ffe37bcd930  000000000045fe38 <runtime.mmap.func1+88>
00007ffe37bcd900:  00007fe7e9c03000  0000000000001000
00007ffe37bcd910:  0000003200000003  00000000ffffffff
00007ffe37bcd920:  00007fe7e9c03000  00007ffe37bcd970
00007ffe37bcd930:  00007ffe37bcd9a8  0000000000404d3e <runtime.mmap+158>
00007ffe37bcd940:  00007ffe37bcd978  00007ffe37bcd978
00007ffe37bcd950:  00007ffe37bcd988  0000000000bb775c
00007ffe37bcd960:  00007fe7d7872000  00007ffe37bcd998
00007ffe37bcd970:  00007ffe37bcd9a8  0000000000bb775c
00007ffe37bcd980:  00007fe7e9c03000  00000000004680ce <runtime.callCgoMmap+62>
00007ffe37bcd990: <0000000000000000  0000000000000000
00007ffe37bcd9a0:  0000000000100000  00007ffe37bcd9d0
00007ffe37bcd9b0:  00007ffe37bcd9e0  00007ffe37bcda80
00007ffe37bcd9c0:  000000000042a48c <runtime.(*pageAlloc).update+604>  00007ffe37bcda90
00007ffe37bcd9d0:  000000000042a48c <runtime.(*pageAlloc).update+604>  00007fe7fe303c00
00007ffe37bcd9e0:  0000000000000008  000000000000fe80
00007ffe37bcd9f0:  0000000000000012  000000003b600000
00007ffe37bcda00:  000000003c000000  000780003c000000
00007ffe37bcda10:  fffffffe7fffffff  ffffffffffffffff
00007ffe37bcda20:  ffffffffffffffff  ffffffffffffffff
00007ffe37bcda30:  ffffffffffffffff  ffffffffffffffff
00007ffe37bcda40:  ffffffffffffffff  ffffffffffffffff
00007ffe37bcda50:  ffffffffffffffff  ffffffffffffffff
00007ffe37bcda60:  ffffffffffffffff  ffffffffffffffff
00007ffe37bcda70:  ffffffffffffffff  ffffffffffffffff
00007ffe37bcda80:  ffffffffffffffff  ffffffffffffffff

goroutine 1 [running]:
runtime.systemstack_switch()
	/usr/local/go/src/runtime/asm_amd64.s:330 fp=0xc00004e788 sp=0xc00004e780 pc=0x463f30
runtime.main()
	/usr/local/go/src/runtime/proc.go:133 +0x70 fp=0xc00004e7e0 sp=0xc00004e788 pc=0x437910
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1373 +0x1 fp=0xc00004e7e8 sp=0xc00004e7e0 pc=0x466041

rax    0x0
rbx    0x242a880
rcx    0xbfdd0b
rdx    0x0
rdi    0x2
rsi    0x7ffe37bcd990
rbp    0x10c3fdb
rsp    0x7ffe37bcd990
r8     0x0
r9     0x7ffe37bcd990
r10    0x8
r11    0x246
r12    0x242bbf0
r13    0x0
r14    0x105f81c
r15    0x0
rip    0xbfdd0b
rflags 0x246
cs     0x33
fs     0x0
gs     0x0
[longhorn-instance-manager] time="2021-10-22T06:14:42Z" level=info msg="Process Manager: process pvc-f04220b3-0f93-458e-a17a-f4137f3c5d62-r-f797a9f2 error out, error msg: exit status 2"
[longhorn-instance-manager] time="2021-10-22T06:14:42Z" level=debug msg="Process update: pvc-f04220b3-0f93-458e-a17a-f4137f3c5d62-r-f797a9f2: state error: Error: exit status 2"

Environment:

  • Longhorn version: v1.2.2
  • Installation method (e.g. Rancher Catalog App/Helm/Kubectl): Helm
  • Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: Vanilla
    • Number of management nodes in the cluster: 1
    • Number of worker nodes in the cluster: 5 (but only 3 for Longhorn)
  • Node config
    • OS type and version: Arch Linux with Kernel 5.14.11 / GLibC 2.33
    • Container runtime: CRI-O 1.22 / runc 1.0.2
    • CPU per node: 4
    • Memory per node: 32 GB
    • Disk type (e.g. SSD/NVMe): Mixed NVMe and HDD, with two storage classes using disk labels
    • Network bandwidth between the nodes: 1000BASE-T (Gigabit Ethernet)
  • Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal): Baremetal
  • Number of Longhorn volumes in the cluster: 34

Additional context

This is a small cluster (3 bare-metal hosts with a 4-core CPU, 32 GB of RAM, and 1 TB of storage each) hosting 34 volumes. Everything was fine for 7 days; then, on creating a new volume, the volume entered a Degraded state and one of the replicas could not be scheduled. The instance-manager-r pod logs show the error above. After restarting the affected pod, all volumes went into a Degraded state and a full resync was triggered, but at the end of the process another instance-manager-r (not the same one) reported the same error. After letting the sync go as far as it could and restarting all instance-manager-r pods, the cluster went back to a stable state.

Could this be a goroutine leak or a race condition?
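A quick way to check whether the container PID/thread limit is really being hit (rather than a host-wide limit) is to compare the thread count of the Longhorn processes with the pids limit the runtime applies. This is only a rough sketch, assuming crictl is available on the node and cgroup v1 paths inside the container:

# total threads (LWPs) spawned by Longhorn processes on the node
$ ps -eLf | grep -c '[l]onghorn'

# current vs. maximum task count of one instance-manager container
$ CID=$(crictl ps -q --name instance-manager-r | head -n1)
$ crictl exec "$CID" cat /sys/fs/cgroup/pids/pids.current /sys/fs/cgroup/pids/pids.max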

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Reactions: 1
  • Comments: 16 (6 by maintainers)

Most upvoted comments

I just hit the problem when scaling my cluster up to 5 nodes, 23 volumes, 3 replicas, using CRI-O too.

$ cat /etc/crio/crio.conf
...
[crio.runtime]
pids_limit = 2048
...

$ systemctl restart crio

and restarting controllers fixed the problem on my side
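For what it's worth, the new pids_limit only applies to containers created after the restart, which is probably why restarting the controllers (and so recreating the instance-manager pods) was needed as well. A quick sanity check on a recreated container, assuming crictl and cgroup v1 paths:

$ CID=$(crictl ps -q --name instance-manager | head -n1)
$ crictl exec "$CID" cat /sys/fs/cgroup/pids/pids.max
# should now report 2048 instead of the 1024 default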

So, more than 20 days without any new incident. It seems to be stable now with Longhorn v1.2.3 and a correct pids_limit.

I don't know what the correct value is, but it seems to be working for @n0rad with 2048.

Closing now. Thanks all for your help.

Same here (still with 2048) after restarting all nodes to upgrade Kubernetes, now with 14 to 16 days of uptime. Maybe restarting only the controllers was just not enough.

No incident for 17 days now, using:

[crio.runtime]
pids_limit = 4096

I will now upgrade to 1.2.3 and so reset this counter 😛

I think I shouted victory too fast. After 24 hours, the problem came back on multiple nodes 😞

Here are the results on the affected node epimethee today:

root@instance-manager-r-8ce321d6:/# ss -na | wc -l
159

root@instance-manager-r-8ce321d6:/# ulimit -a
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 127837
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1048576
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 4194304
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

[root@epimethee lh-trace]# ulimit -a
real-time non-blocking time  (microseconds, -R) unlimited
core file size              (blocks, -c) unlimited
data seg size               (kbytes, -d) unlimited
scheduling priority                 (-e) 0
file size                   (blocks, -f) unlimited
pending signals                     (-i) 127837
max locked memory           (kbytes, -l) 64
max memory size             (kbytes, -m) unlimited
open files                          (-n) 1048576
pipe size                (512 bytes, -p) 8
POSIX message queues         (bytes, -q) 819200
real-time priority                  (-r) 0
stack size                  (kbytes, -s) 8192
cpu time                   (seconds, -t) unlimited
max user processes                  (-u) 127837
virtual memory              (kbytes, -v) unlimited
file locks                          (-x) unlimited
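Note that ulimit only reports per-process rlimits (e.g. RLIMIT_NPROC), not the cgroup pids limit enforced by the container runtime, so both outputs can look fine while pthread_create still fails. A rough check of the cgroup limit from inside the affected pod, assuming a cgroup v1 mount, would be:

root@instance-manager-r-8ce321d6:/# cat /sys/fs/cgroup/pids/pids.current /sys/fs/cgroup/pids/pids.max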

https://github.com/fkocik/lh-support/blob/main/lh-trace.tar.gz?raw=true

It's a bare-metal setup, so I do not have enough resources to reproduce on another Linux distro. I can try to downgrade the kernel in order to switch to the LTS stream; I don't know if that would help.

Another interesting thing I found is this parameter of the runtime engine (CRI-O):

--pids-limit value    Maximum number of processes allowed in a container (default: 1024) [$CONTAINER_PIDS_LIMIT]

Do you think it could be the root cause?
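If the 1024 default is indeed the culprit, one way to raise it persistently is a drop-in file (recent CRI-O versions read /etc/crio/crio.conf.d/; the file name below is arbitrary):

$ cat /etc/crio/crio.conf.d/10-pids-limit.conf
[crio.runtime]
pids_limit = 4096

$ systemctl restart crio
# containers that already exist keep their old limit, so the Longhorn
# instance-manager pods have to be recreated for the new value to apply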