longhorn: [BUG] zombie processes

Describe the bug

The Longhorn manager creates a lot of zombie processes (200 or more per day).
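The report does not say how the zombies were counted. The following is a minimal, hypothetical Python sketch (not part of the original report) that assumes a Linux node with `/proc` mounted. It counts processes in state `Z` and groups them by parent PID and parent name, which would show whether the longhorn-manager process is the parent that fails to reap them. The function names `parse_stat` and `zombies_by_parent` are illustrative only.

```python
import os
from collections import Counter


def parse_stat(stat_line):
    """Return (pid, comm, state, ppid) from one /proc/<pid>/stat line.

    The command name (comm) is wrapped in parentheses and may itself
    contain spaces or ')', so split on the LAST ')' instead of whitespace.
    """
    lpar = stat_line.index("(")
    rpar = stat_line.rindex(")")
    pid = int(stat_line[:lpar].strip())
    comm = stat_line[lpar + 1:rpar]
    fields = stat_line[rpar + 2:].split()  # fields[0] = state, fields[1] = ppid
    return pid, comm, fields[0], int(fields[1])


def zombies_by_parent(proc_root="/proc"):
    """Count zombie (state 'Z') processes, keyed by (parent pid, parent name)."""
    try:
        entries = os.listdir(proc_root)
    except FileNotFoundError:
        return Counter()  # no procfs here (e.g. not a Linux host)

    procs = {}
    for entry in entries:
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state, ppid = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # process exited during the scan, or entry unreadable
        procs[pid] = (comm, state, ppid)

    counts = Counter()
    for comm, state, ppid in procs.values():
        if state == "Z":
            parent_name = procs.get(ppid, ("?", None, None))[0]
            counts[(ppid, parent_name)] += 1
    return counts


if __name__ == "__main__":
    for (ppid, parent), n in zombies_by_parent().most_common():
        print(f"{n:6d} zombies  parent pid={ppid} ({parent})")
```

Running it on an affected node at intervals (for example, once a day) would show whether the count under a single parent keeps growing, which matches the "200 or more per day" observation above.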

To Reproduce

Steps to reproduce the behavior:

  1. Install version 1.0.0 (the issue also occurs on 0.8.1).

Expected behavior

No zombie processes, of course.

Log:

    2020-06-04T21:07:57.716716931+02:00 time="2020-06-04T19:07:57Z" level=debug msg="Failed to check for the latest upgrade: Post \"https://longhorn-upgrade-responder.rancher.io/v1/checkupgrade\": dial tcp: lookup longhorn-upgrade-responder.rancher.io on 10.43.0.10:53: server misbehaving"
    2020-06-04T21:08:23.69727142+02:00 time="2020-06-04T19:08:23Z" level=debug msg="Skip rebuilding for volume pvc-fc1a6e3e-dfe8-45e7-8ef1-805a400d3f25 because there is rebuilding in process"
    2020-06-04T21:08:23.757795868+02:00 time="2020-06-04T19:08:23Z" level=debug msg="Skip rebuilding for volume pvc-fc1a6e3e-dfe8-45e7-8ef1-805a400d3f25 because there is rebuilding in process"
    2020-06-04T21:08:53.697197025+02:00 time="2020-06-04T19:08:53Z" level=debug msg="Skip rebuilding for volume pvc-fc1a6e3e-dfe8-45e7-8ef1-805a400d3f25 because there is rebuilding in process"
    2020-06-04T21:08:53.758270386+02:00 time="2020-06-04T19:08:53Z" level=debug msg="Skip rebuilding for volume pvc-fc1a6e3e-dfe8-45e7-8ef1-805a400d3f25 because there is rebuilding in process"
    2020-06-04T21:08:57.716166046+02:00 time="2020-06-04T19:08:57Z" level=warning msg="backup store monitor: failed to list backup volumes in nfs://5.35.255.242:/aryalonghorn: error listing backup volumes: Timeout executing: /var/lib/longhorn/engine-binaries/longhornio-longhorn-engine-v1.0.0/longhorn [backup ls --volume-only nfs://5.35.255.242:/aryalonghorn], output , stderr, , error <nil>"
    2020-06-04T21:09:23.698234644+02:00 time="2020-06-04T19:09:23Z" level=debug msg="Skip rebuilding for volume pvc-fc1a6e3e-dfe8-45e7-8ef1-805a400d3f25 because there is rebuilding in process"

Environment:

  • Longhorn version: 1.0.0 (also seen on 0.8.1)
  • Kubernetes version: v1.18.2+k3s1
  • Node OS type and version: Ubuntu 18.04.4 LTS

Additional context

I also have a problem with the WebSocket protocol, discussed in a separate issue report (#1442). I'm not sure whether the two are related.

A remark on the log above: yes, some rebuilding is going on, but that is not my main topic here.

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 15 (8 by maintainers)

Most upvoted comments

I upgraded over a week ago and have had no issues so far! It seems to be fixed. Thanks for the fast fix. 😃

I upgraded from 1.0.2 to 1.1.0 yesterday on the same setup as before, and there are no zombie processes. Thanks!