magento2: MessageQueue cron runner repeatedly launches duplicate consumers when ps command is provided by Busybox

Preconditions (*)

  1. Magento 2.3.2
  2. Alpine Linux (or any distro using busybox to provide ps command)
  3. ps command provided by busybox (no procps installed)
  4. No php-posix module installed
  5. Cron jobs set up

Steps to reproduce (*)

  1. ps -a
  2. Wait 5 minutes
  3. ps -a

Expected result (*)

  1. Only one set of consumers should be running.

Actual result (*)

  1. Every minute a new set of consumers is launched.

\Magento\MessageQueue\Model\Cron\ConsumersRunner

uses one of two ways to determine whether the consumers are already running or not.

php posix_getpgid($pid)

or

exec ps -p $pid

The check to make sure no consumers are running fails, and new consumers are launched, even though there are already consumers running.

The machine crashes shortly afterwards, once all memory is exhausted.
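
As a rough illustration of the two checks described above (this is not the actual ConsumersRunner code; the function name isProcessRunning is made up for this sketch), the logic looks something like this:

```php
<?php
// Hypothetical helper, not Magento's actual code: it mirrors the two
// checks described in this issue (posix_getpgid, then "ps -p").
function isProcessRunning(int $pid): bool
{
    if (function_exists('posix_getpgid')) {
        // posix_getpgid() returns false when no process has this PID.
        return posix_getpgid($pid) !== false;
    }

    // Fallback: trust the exit code of "ps -p". Exit code 0 means ps found the PID.
    // If ps does not support -p (as reported here for busybox), the exit code is
    // non-zero for every PID, so every process looks dead.
    exec(escapeshellcmd('ps -p ' . $pid), $output, $code);
    return (int) $code === 0;
}

// Check the current PHP process, which is certainly alive.
var_dump(isProcessRunning(getmypid()));
```

On a system without the posix extension and with a ps that rejects -p, this sketch reports false even for its own PID, which is the failure mode described in this issue.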

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 17 (11 by maintainers)

Most upvoted comments

@gwharton or @maderlock: out of curiosity: did you already test this with Magento 2.3.3? There were some changes done to how Magento checks if a consumer process is already running. I’m not sure if it would solve your problem, but it might …

The ps -p code has been removed in that same commit

Installing the php7-posix module resolves the issue on Alpine by making the posix functions available.
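
A quick way to confirm that the posix functions are actually visible to PHP is a one-off script like the one below. It is only a sketch; run it with the same PHP CLI binary that cron uses, since the CLI and web configurations may load different extensions.

```php
<?php
// Report whether the posix extension and posix_getpgid() are available
// to the PHP binary executing this script.
$hasPosix = extension_loaded('posix') && function_exists('posix_getpgid');
echo $hasPosix ? "posix available\n" : "posix missing\n";
```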

The documentation should be updated to state that either a ps command that supports “ps -p $pid” or PHP’s posix extension is now required.

The module should be a little more intelligent at determining what it should do if both methods fail, instead of just relaunching and killing the machine.
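
One way to express that idea is to treat "cannot tell" as its own outcome rather than as "not running". The sketch below is hypothetical: the function name decideConsumerAction and the return values are invented for illustration and are not part of Magento.

```php
<?php
// Hypothetical sketch; names and return values are illustrative only.
// Each argument is true (process found), false (process not found),
// or null (that detection method is unavailable on this system).
function decideConsumerAction(?bool $posixResult, ?bool $psResult): string
{
    // Prefer the posix result; fall back to ps only when posix is unavailable.
    $running = $posixResult ?? $psResult;

    if ($running === null) {
        // Neither method can be trusted: refuse to launch rather than
        // risk spawning duplicate consumers every minute.
        return 'abort';
    }

    return $running ? 'skip' : 'launch';
}

echo decideConsumerAction(null, null), "\n";  // abort
echo decideConsumerAction(null, false), "\n"; // launch
echo decideConsumerAction(true, null), "\n";  // skip
```

With this shape, a system that has neither a working ps -p nor the posix extension would log an error and launch nothing, instead of exhausting memory.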

<?php
// Run each ps invocation and report its exit code as 0, 1, or "other".
$commands = [
    'ps -ohmygoshthiscantwork', // deliberately invalid option
    'ps -p 1',                  // PID 1 always exists
    'ps -p 9999',               // PID assumed not to exist
];

foreach ($commands as $command) {
    $output = [];
    exec(escapeshellcmd($command), $output, $code);
    echo "Testing command $command - Return is : ";
    switch ((int) $code) {
        case 0:
            echo "0\n";
            break;
        case 1:
            echo "1\n";
            break;
        default:
            echo "other\n";
            break;
    }
}

Ubuntu 19

www-data@dev:~/dev1$
Testing command ps -ohmygoshthiscantwork - Return is : 1
Testing command ps -p 1 - Return is : 0
Testing command ps -p 9999 - Return is : 1
www-data@dev:~/dev1$

Alpine with busybox only

www-data@dev:~/dev1$
Testing command ps -ohmygoshthiscantwork - Return is : 1
Testing command ps -p 1 - Return is : 1
Testing command ps -p 9999 - Return is : 1
www-data@dev:~/dev1$

Alpine with procps

www-data@dev:~/dev1$
Testing command ps -ohmygoshthiscantwork - Return is : 1
Testing command ps -p 1 - Return is : 0
Testing command ps -p 9999 - Return is : 1
www-data@dev:~/dev1$

The mechanism cannot distinguish between ps being called with an unknown argument and ps -p being called on a process ID that doesn't exist: both return 1. On installations using only busybox, it fails altogether.
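
Because the exit code alone is ambiguous, one possible workaround is to probe ps with a PID that is known to exist before trusting it. The helper below is a sketch under that assumption; psSupportsPidOption is a made-up name, and the "2>/dev/null" redirect assumes a POSIX shell.

```php
<?php
// Hypothetical helper: checks whether this system's ps understands "-p <pid>".
// The current PHP process certainly exists, so a non-zero exit code here
// suggests ps rejected the option (or is missing), not that the PID is gone.
function psSupportsPidOption(): bool
{
    $output = [];
    exec('ps -p ' . getmypid() . ' 2>/dev/null', $output, $code);
    return (int) $code === 0;
}

echo psSupportsPidOption()
    ? "ps -p is usable\n"
    : "ps -p is not usable; do not trust its exit code\n";
```

Going by the results posted above, this probe would be expected to fail on Alpine with busybox only and to succeed on Ubuntu or on Alpine with procps.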

I’m getting the same problem of constantly spinning up consumers under Magento 2.3.2 on Amazon Linux.

Oddly, this has procps installed, so there must be some other reason it is not able to find the processes. It is writing to the .pid file, so I do not think this is a permissions problem.

@gwharton I was able to resolve a similar issue by adding procps to my Alpine dependencies.

RUN apk add --no-cache \
  gzip \
  freetype-dev \
  icu-dev \
  libjpeg-turbo-dev \
  libpng-dev \
  libxslt-dev \
  lsof \
  curl-dev \
  libsodium-dev \
  mysql-client \
  procps \
  zip