roadrunner: [🐛 BUG]: Jobs health check returns an invalid status

No duplicates 🥲.

  • I have searched for a similar issue in our bug tracker and didn’t find any solutions.

What happened?

A bug happened!

Health checks at a URL like /health?plugin=http,jobs return HTTP status 200 with an error in the response body. Expected status: 500.

Version (rr --version)

rr version 2023.1.3 (build time: 2023-05-11T12:32:01+0000, go1.20.4), OS: linux, arch: amd64

How to reproduce the issue?

Simple config:

version: "3"

server:
    command: "php public/index.php"
    relay: pipes
    env:
        -   APP_RUNTIME: Baldinof\RoadRunnerBundle\Runtime\Runtime
    user: www-data
    group: www-data

rpc:
    listen: tcp://127.0.0.1:6001

status:
    address: 0.0.0.0:2114

jobs:
    pool:
        allocate_timeout: 120s
        command: "php public/worker.php"
        num_workers: 3
    pipelines:
        test:
            driver: memory
            config:
                priority: 10
                prefetch: 10

GET 127.0.0.1:2114/jobs returns HTTP 200:

plugin: jobs: pipeline: test | priority: 10 | ready: false | queue: test | active: 0 | delayed: 0 | reserved: 0 | driver: memory | error:  

GET 127.0.0.1:2114/health?plugin=http,jobs returns HTTP 200:

plugin: http,jobs not found

Documentation says:

The health check endpoint will return HTTP 200 if there is at least one worker ready to serve requests. If there are no workers ready to service requests, the endpoint will return HTTP 500. If there are any other errors, the endpoint will also return HTTP 500.

But actually it returns 200.

After some time I got an error; see the relevant log output below.

Relevant log output

{"time":"2023-05-25T12:01:08.896604907Z","level":"ERROR","msg":"plugin returned an error from the Serve","!BADKEY":"static_pool_allocate_workers: WorkerAllocate: failed to spawn a worker, possible reasons: https://roadrunner.dev/docs/known-issues-allocate-timeout/2023.x/en","id":"*jobs.Plugin"}
error: static_pool_allocate_workers: WorkerAllocate: failed to spawn a worker, possible reasons: https://roadrunner.dev/docs/known-issues-allocate-timeout/2023.x/en
plugin: *jobs.Plugin

About this issue

  • State: closed
  • Created a year ago
  • Comments: 17 (9 by maintainers)

Most upvoted comments

Goal:

  • investigating incidents (quickly change values and see the result)
  • experimenting under load (quickly change values and see the result)
  • PHP development in k8s without local docker-compose setups (one project launch environment with slight configuration differences)

RR in debug mode is not designed to be ready or healthy to accept a request in k8s.

Agreed. That’s why “the logic will not change and I just have to live with it in debug mode”. This is not the biggest problem; I can find a compromise through the reset command. Thanks for the tip.

Healthchecks of the worker for the jobs plugin (for the pool of workers) will be released soon.

EDIT: as you can see, the string “plugin: jobs: pipeline: test | priority: 10 | ready: false” is not structured at the moment. When it’s ready, it’ll be in a structured JSON format.

I have the Jobs plugin, and maybe it’s not working. Look at the /jobs output: plugin: jobs: pipeline: test | priority: 10 | ready: false. ready is false. Why is ready false? The plugin is there and it has metrics/statistics. Is the plugin ready or not?

It’s not ready because it’s paused, so the memory driver is not ready to handle requests. You should resume it (via the PHP API) to start consuming.

/health?plugin=http,jobs reports that jobs is not found, but it is actually used in the config and returns metrics.

There is no health check for the jobs plugin in the /health endpoint, so the message you see, jobs not found, is reasonable.

The /jobs endpoint is dedicated to checking the status of the drivers, e.g.: plugin: jobs: pipeline: test | priority: 10 | ready: false
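Until the /jobs output is structured, a client that wants the ready flag has to parse the text line. A minimal Go sketch, assuming the exact “key: value | key: value” layout of the sample output in this issue; parseJobsLine is a hypothetical helper and will break once the format changes to JSON.

```go
package main

import (
	"fmt"
	"strings"
)

// parseJobsLine turns one line of the current, unstructured /jobs output into
// a key/value map. The layout is ASSUMED from the sample output in this issue.
// It returns false if the line does not start with the expected prefix.
func parseJobsLine(line string) (map[string]string, bool) {
	const prefix = "plugin: jobs: "
	line = strings.TrimSpace(line)
	if !strings.HasPrefix(line, prefix) {
		return nil, false
	}
	fields := make(map[string]string)
	for _, part := range strings.Split(strings.TrimPrefix(line, prefix), "|") {
		// Split on the first colon only, so values could contain colons.
		kv := strings.SplitN(part, ":", 2)
		if len(kv) != 2 {
			continue // skip anything that is not "key: value"
		}
		fields[strings.TrimSpace(kv[0])] = strings.TrimSpace(kv[1])
	}
	return fields, true
}

func main() {
	out := "plugin: jobs: pipeline: test | priority: 10 | ready: false | queue: test | active: 0 | delayed: 0 | reserved: 0 | driver: memory | error:  "
	f, ok := parseJobsLine(out)
	if !ok {
		fmt.Println("unexpected /jobs format")
		return
	}
	fmt.Printf("pipeline=%s driver=%s ready=%v\n", f["pipeline"], f["driver"], f["ready"] == "true")
	// prints: pipeline=test driver=memory ready=false
}
```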

So how do I know whether the “Jobs” plugin is really healthy?

There are two conditions: first, a worker is ready; second, the driver is ready. This differs from checking, say, the http plugin, where only the workers are checked; for jobs, RR has to check an additional condition.
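The two conditions combine with a logical AND. A hypothetical Go sketch: jobsHealthy and its inputs (a ready-worker count for the jobs pool and a per-pipeline driver-ready map) are illustrative, since RoadRunner does not expose them in this shape.

```go
package main

import "fmt"

// jobsHealthy models the two conditions for the jobs plugin:
//  1. at least one worker in the jobs pool is ready, and
//  2. every pipeline's driver reports ready (a paused pipeline does not).
// Inputs are HYPOTHETICAL; an empty pipeline map satisfies condition 2.
func jobsHealthy(readyWorkers int, driverReady map[string]bool) (bool, string) {
	if readyWorkers < 1 {
		return false, "no ready workers in the jobs pool"
	}
	for pipeline, ready := range driverReady {
		if !ready {
			return false, "pipeline " + pipeline + " is not ready (paused?)"
		}
	}
	return true, ""
}

func main() {
	// Mirrors the /jobs sample: workers exist, but pipeline "test" is not ready.
	ok, reason := jobsHealthy(3, map[string]bool{"test": false})
	fmt.Println(ok, reason) // false pipeline test is not ready (paused?)
}
```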

Can the way a healthy RoadRunner is determined be changed, from “at least 1 worker is ready” to “at least 1 worker in each of the pools is ready”?
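The two rules diverge as soon as one pool has no ready workers. A hypothetical Go sketch; anyPoolReady, eachPoolReady and the per-pool counts are illustrative, not RoadRunner APIs.

```go
package main

import "fmt"

// anyPoolReady models "at least 1 worker is ready": healthy if any pool has
// a ready worker, regardless of which pool it is.
func anyPoolReady(readyByPool map[string]int) bool {
	for _, n := range readyByPool {
		if n > 0 {
			return true
		}
	}
	return false
}

// eachPoolReady models the proposed stricter rule: every pool must have at
// least one ready worker. No pools at all counts as unhealthy.
func eachPoolReady(readyByPool map[string]int) bool {
	for _, n := range readyByPool {
		if n < 1 {
			return false
		}
	}
	return len(readyByPool) > 0
}

func main() {
	// Hypothetical snapshot: HTTP workers are up, jobs workers failed to spawn
	// (as in the allocate-timeout error from the log output above).
	pools := map[string]int{"http": 4, "jobs": 0}
	fmt.Println(anyPoolReady(pools), eachPoolReady(pools)) // true false
}
```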

Just send one request per plugin you need, collect the responses, and make the decision in your application.
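That suggestion can be sketched as a small Go client. This is a minimal client-side sketch, not RoadRunner code: checkPlugins, pluginHealthy and allHealthy are hypothetical names, and treating a 200 whose body contains “not found” as unhealthy is a workaround based only on the output observed in this issue.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"strings"
	"time"
)

// result holds what the status endpoint returned for one plugin.
type result struct {
	Code int
	Body string
}

// pluginHealthy requires HTTP 200 and, as a workaround for this issue, also
// rejects a 200 whose body says "not found".
func pluginHealthy(r result) bool {
	return r.Code == http.StatusOK && !strings.Contains(r.Body, "not found")
}

// checkPlugins sends one GET /health?plugin=<name> per plugin.
func checkPlugins(client *http.Client, base string, plugins []string) (map[string]result, error) {
	out := make(map[string]result, len(plugins))
	for _, p := range plugins {
		resp, err := client.Get(base + "/health?plugin=" + url.QueryEscape(p))
		if err != nil {
			return nil, fmt.Errorf("plugin %s: %w", p, err)
		}
		body, err := io.ReadAll(io.LimitReader(resp.Body, 64<<10)) // cap body size
		resp.Body.Close()
		if err != nil {
			return nil, fmt.Errorf("plugin %s: reading body: %w", p, err)
		}
		out[p] = result{Code: resp.StatusCode, Body: string(body)}
	}
	return out, nil
}

// allHealthy is the application-side decision: every requested plugin must pass.
func allHealthy(results map[string]result) bool {
	if len(results) == 0 {
		return false
	}
	for _, r := range results {
		if !pluginHealthy(r) {
			return false
		}
	}
	return true
}

func main() {
	client := &http.Client{Timeout: 2 * time.Second}
	// 127.0.0.1:2114 is the status address used in the requests in this issue.
	results, err := checkPlugins(client, "http://127.0.0.1:2114", []string{"http", "jobs"})
	if err != nil {
		fmt.Println("status endpoint unreachable:", err)
		return
	}
	for name, r := range results {
		fmt.Printf("%s: %d healthy=%v\n", name, r.Code, pluginHealthy(r))
	}
	fmt.Println("all healthy:", allHealthy(results))
}
```

Since /health has no check for jobs, a complete decision would likely also need the /jobs driver status.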