horizon: Timed out job hangs for 15 or 30 minutes and then runs

  • Horizon Version: 3.7.2 / 3.4.7
  • Laravel Version: 6.17.0
  • PHP Version: 7.4.4
  • Redis Driver & Version: predis 1.1.1 / phpredis 5.2.1
  • Database Driver & Version: -

Description:

We are having strange errors with our Horizon. Basically this is what happens:

  • A job is queued. And starts processing.
  • After 90 seconds (our timeout config value) it times out.
  • After 120 seconds (our retry_after value) job is retried.
  • Retried job succeeds.
  • After 15 minutes or 30 minutes, the original job(the one timed out) finishes. With running the actual job.

Seems like this can happen to any kind of job. For example if it’s mailable that is queued, the user gets an email first, then 15 or 30 minutes later user gets another email. Same one.

Here our config files

config/database.php:

'redis' => [
        'client' => env('REDIS_CLIENT', 'predis'),
        'default' => [
            'host' => env('REDIS_HOST', '127.0.0.1'),
            'password' => env('REDIS_PASSWORD', null),
            'port' => env('REDIS_PORT', 6379),
            'database' => 0,
        ],
    ],

config/queue.php:

'redis' => [
    'driver' => 'redis',
    'connection' => 'default',
    'queue' => env('DEFAULT_QUEUE_NAME', 'default'),
    'retry_after' => 120, // 2 minutes
    'block_for' => null,
],

config/horizon.php:

'environments' => [
    'production' => [
        'supervisor-1' => [
            'connection' => env('HORIZON_CONNECTION', 'database'),
            'queue' => [env('DEFAULT_QUEUE_NAME', 'default')],
            'balance' => 'simple',
            'processes' => 10,
            'tries' => 3,
            'timeout' => 90,
        ],
    ],
]
Here how it looks in Horizon Dashboard

This when the initial job times out. It stays like this in Recent Jobs while the retries are working. horizon1

After almost half an hour it changes to this:

horizon2

It’s the same tags, I just blacked out names.

Here are the logs we are seeing (times here are in UTC)

[2020-04-22 11:24:59][88] Processing: App\Mail\ReservationInformation [2020-04-22 11:29:00][88] Failed: App\Mail\ReservationInformation [2020-04-22 11:29:00][88] Processing: App\Mail\ReservationInformation [2020-04-22 11:56:21][88] Processed: App\Mail\ReservationInformation

Note: With Predis we also see some logs like Error while reading line from the server. [tcp://REDIS_HOST:6379] but with PHPRedis there was none.

Here is an example JSON returned from Horizon endpoint /horizon/api/jobs/failed/332
{
  "JSON":{
    "id":"332",
    "connection":"redis",
    "queue":"newquebe1p",
    "name":"App\\Jobs\\SomeJob",
    "status":"failed",
    "payload":{
      "type":"job",
      "timeout":null,
      "tags":[
        "Some tags"
      ],
      "id":"332",
      "data":{
        "command":"....s:5:\"tries\";i:1;s:6:\"\u0000*\u0000job\";N;s:10:\"connection\";N;s:5:\"queue\";N;s:15:\"chainConnection\";N;s:10:\"chainQueue\";N;s:5:\"delay\";N;s:10:\"middleware\";a:0:{}s:7:\"chained\";a:0:{}}",
        "commandName":"App\\Jobs\\SomeJob"
      },
      "displayName":"App\\Jobs\\SomeJob",
      "timeoutAt":null,
      "pushedAt":"1587561267.9688",
      "job":"Illuminate\\Queue\\CallQueuedHandler@call",
      "maxTries":1,
      "delay":null,
      "attempts":2
    },
    "exception":"Illuminate\\Queue\\MaxAttemptsExceededException: App\\Jobs\\SomeJob has been attempted too many times or run too long. The job may have previously timed out. in /var/www/html/vendor/laravel/framework/src/Illuminate/Queue/Worker.php:632\nStack trace:\n#0 /var/www/html/vendor/laravel/framework/src/Illuminate/Queue/Worker.php(446): Illuminate\\Queue\\Worker->maxAttemptsExceededException(Object(Illuminate\\Queue\\Jobs\\RedisJob))\n#1 /var/www/html/vendor/laravel/framework/src/Illuminate/Queue/Worker.php(358): Illuminate\\Queue\\Worker->markJobAsFailedIfAlreadyExceedsMaxAttempts('redis', Object(Illuminate\\Queue\\Jobs\\RedisJob), 1)\n#2 /var/www/html/vendor/laravel/framework/src/Illuminate/Queue/Worker.php(314): Illuminate\\Queue\\Worker->process('redis', Object(Illuminate\\Queue\\Jobs\\RedisJob), Object(Illuminate\\Queue\\WorkerOptions))\n#3 /var/www/html/vendor/laravel/framework/src/Illuminate/Queue/Worker.php(134): Illuminate\\Queue\\Worker->runJob(Object(Illuminate\\Queue\\Jobs\\RedisJob), 'redis', Object(Illuminate\\Queue\\WorkerOptions))\n#4 /var/www/html/vendor/laravel/framework/src/Illuminate/Queue/Console/WorkCommand.php(112): Illuminate\\Queue\\Worker->daemon('redis', 'floquebe1p', Object(Illuminate\\Queue\\WorkerOptions))\n#5 /var/www/html/vendor/laravel/framework/src/Illuminate/Queue/Console/WorkCommand.php(96): Illuminate\\Queue\\Console\\WorkCommand->runWorker('redis', 'floquebe1p')\n#6 /var/www/html/vendor/laravel/horizon/src/Console/WorkCommand.php(46): Illuminate\\Queue\\Console\\WorkCommand->handle()\n#7 [internal function]: Laravel\\Horizon\\Console\\WorkCommand->handle()\n#8 /var/www/html/vendor/laravel/framework/src/Illuminate/Container/BoundMethod.php(32): call_user_func_array(Array, Array)\n#9 /var/www/html/vendor/laravel/framework/src/Illuminate/Container/Util.php(36): Illuminate\\Container\\BoundMethod::Illuminate\\Container\\{closure}()\n#10 /var/www/html/vendor/laravel/framework/src/Illuminate/Container/BoundMethod.php(90): Illuminate\\Container\\Util::unwrapIfClosure(Object(Closure))\n#11 /var/www/html/vendor/laravel/framework/src/Illuminate/Container/BoundMethod.php(34): Illuminate\\Container\\BoundMethod::callBoundMethod(Object(Illuminate\\Foundation\\Application), Array, Object(Closure))\n#12 /var/www/html/vendor/laravel/framework/src/Illuminate/Container/Container.php(590): Illuminate\\Container\\BoundMethod::call(Object(Illuminate\\Foundation\\Application), Array, Array, NULL)\n#13 /var/www/html/vendor/laravel/framework/src/Illuminate/Console/Command.php(173): Illuminate\\Container\\Container->call(Array)\n#14 /var/www/html/vendor/symfony/console/Command/Command.php(255): Illuminate\\Console\\Command->execute(Object(Symfony\\Component\\Console\\Input\\ArgvInput), Object(Illuminate\\Console\\OutputStyle))\n#15 /var/www/html/vendor/laravel/framework/src/Illuminate/Console/Command.php(160): Symfony\\Component\\Console\\Command\\Command->run(Object(Symfony\\Component\\Console\\Input\\ArgvInput), Object(Illuminate\\Console\\OutputStyle))\n#16 /var/www/html/vendor/symfony/console/Application.php(1011): Illuminate\\Console\\Command->run(Object(Symfony\\Component\\Console\\Input\\ArgvInput), Object(Symfony\\Component\\Console\\Output\\ConsoleOutput))\n#17 /var/www/html/vendor/symfony/console/Application.php(272): Symfony\\Component\\Console\\Application->doRunCommand(Object(Laravel\\Horizon\\Console\\WorkCommand), Object(Symfony\\Component\\Console\\Input\\ArgvInput), Object(Symfony\\Component\\Console\\Output\\ConsoleOutput))\n#18 /var/www/html/vendor/symfony/console/Application.php(148): Symfony\\Component\\Console\\Application->doRun(Object(Symfony\\Component\\Console\\Input\\ArgvInput), Object(Symfony\\Component\\Console\\Output\\ConsoleOutput))\n#19 /var/www/html/vendor/laravel/framework/src/Illuminate/Console/Application.php(93): Symfony\\Component\\Console\\Application->run(Object(Symfony\\Component\\Console\\Input\\ArgvInput), Object(Symfony\\Component\\Console\\Output\\ConsoleOutput))\n#20 /var/www/html/vendor/laravel/framework/src/Illuminate/Foundation/Console/Kernel.php(131): Illuminate\\Console\\Application->run(Object(Symfony\\Component\\Console\\Input\\ArgvInput), Object(Symfony\\Component\\Console\\Output\\ConsoleOutput))\n#21 /var/www/html/artisan(37): Illuminate\\Foundation\\Console\\Kernel->handle(Object(Symfony\\Component\\Console\\Input\\ArgvInput), Object(Symfony\\Component\\Console\\Output\\ConsoleOutput))\n#22 {main}",
    "failed_at":"1587562328.2697",
    "completed_at":"1587562207.4457",
    "retried_by":[

    ],
    "reserved_at":"1587561389.885",
    "index":0
  },
}

This is not the same job as in the screenshots. But I thought it might be still useful. Specially timestamps.

We tried a lot of different combinations, to eliminate the problem. And it happened in every combination. So we think it must be something with Horizon.

We tried:

  • Predis with Redis 5 and Redis 3
  • Predis with different read_write_timeout values
  • PHPRedis with Redis 5 and Redis 3
  • THP was enabled on one server. So we also tried all combinations with a server that has THP disabled.
  • We were at Laravel 6.11 and Horizon 3.4.7 then upgraded to Laravel 6.14 and Horizon 3.7.2

There is only one instance of Horizon running. And no other queue is handled in this Horizon instance.

Steps To Reproduce:

This is the hard part. This happens randomly. And locally we can’t reproduce it.

Any information or tips to try are welcome!

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Reactions: 1
  • Comments: 25 (8 by maintainers)

Most upvoted comments

@rgembalik I don’t remember the old values. But currently we have these:

image

@rgembalik For us this turned out to be a configuration error in our systems. We were using OpenShift and Docker. We adjusted these values in our containers/systems:

net.ipv4.tcp_keepalive_intvl
net.ipv4.tcp_keepalive_probes
net.ipv4.tcp_keepalive_time

and for now everything works normally.

image

Most of the jobs look like this now. First job “starts”, but doesn’t actually run (for example log statements inside the job is not ran), then 2 minutes later (our rety_after value) the second attempt is executed. This time it runs successfully. But the first job still “hangs”. And after 30 minutes it is updated like in the pic above.

Hey there,

Unfortunately we don’t support this version anymore. Please check out our support policy on which versions we are currently supporting. Can you please try to upgrade to the latest version and see if your problem persists? We’ll help you out and re-open this issue if so.

Thanks!