ray: [tune] The actor or task cannot be scheduled right now

I have enough resources, but Ray still reports this warning:

The actor or task with ID 124a2b0fc855a8f8ffffffff01000000 cannot be scheduled right now. It requires {CPU: 1.000000} for placement, but this node only has remaining {accelerator_type:P5000: 1.000000}, {node:172.31.226.37: 1.000000}, {memory: 71.142578 GiB}, {object_store_memory: 23.779297 GiB}, {GPU: 0.250000}. In total there are 7 pending tasks and 0 pending actors on this node. This is likely due to all cluster resources being claimed by actors. To resolve the issue, consider creating fewer actors or increase the resources available to this Ray cluster. You can ignore this message if this Ray cluster is expected to auto-scale.

How should I deal with this problem? Thanks.

About this issue

State: closed
Created 3 years ago
Comments: 15 (7 by maintainers)

Most upvoted comments

I reduced the number of GPUs per trial and did not specify the number of CPUs per trial. It works.

resources_per_trial={"gpu": 0.1}
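A rough way to see why this can help is to estimate how many trials fit on a node at once and what is left over for tasks that a trial launches itself. The sketch below is plain Python and does not call Ray; the node size (8 CPUs, 1 GPU) and the helper name `fit_trials` are hypothetical, and it assumes Tune falls back to 1 CPU per trial when `cpu` is not given.

```python
import math

def fit_trials(node, per_trial):
    """Return (max concurrent trials, resources left over on the node)."""
    limits = [
        # the small epsilon guards against float rounding, e.g. 0.3 / 0.1
        math.floor(node.get(name, 0) / needed + 1e-9)
        for name, needed in per_trial.items()
        if needed > 0
    ]
    n = min(limits) if limits else 0
    left = {name: round(total - n * per_trial.get(name, 0), 6)
            for name, total in node.items()}
    return n, left

# Hypothetical node, not taken from the issue.
node = {"cpu": 8, "gpu": 1}

# 2 CPUs per trial: four trials claim every CPU, so a task that
# needs {CPU: 1} inside a trial has nowhere to run and stays pending.
print(fit_trials(node, {"cpu": 2, "gpu": 0.25}))

# 1 CPU per trial (the assumed default): four CPUs stay free.
print(fit_trials(node, {"cpu": 1, "gpu": 0.25}))
```

If the first case matches your setup, the warning is expected: the trials themselves hold all the CPUs that their own child tasks are waiting for.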

BBDrive on Feb 4, 2021

Not exactly - the 10 CPUs are reserved just for the main function of the trainable. If this main function requests more resources, you need to use the extra_* variables.

E.g.:

resources_per_trial={
    "cpu": 1,
    "extra_cpu": 9,
    "extra_gpu": 0.25
}

This would reserve 10 CPUs and 0.25 GPUs in total. The main function is allocated 1 CPU, and the remaining 9 CPUs and 0.25 GPUs are left for the main function to schedule its own tasks or actors on.
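The arithmetic behind that total can be checked with a small helper. This is only an illustration in plain Python: `total_reservation` is a hypothetical name, not part of Ray, and it simply adds each `extra_*` amount to the matching base resource.

```python
def total_reservation(resources_per_trial):
    """Sum the base and extra_* requests into the per-trial total."""
    total = {}
    for key, amount in resources_per_trial.items():
        # "extra_cpu" counts toward "cpu", "extra_gpu" toward "gpu"
        name = key[len("extra_"):] if key.startswith("extra_") else key
        total[name] = total.get(name, 0) + amount
    return total

print(total_reservation({"cpu": 1, "extra_cpu": 9, "extra_gpu": 0.25}))
# {'cpu': 10, 'gpu': 0.25}
```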

Please note that in the future we will deprecate support for the extra_ arguments in favor of placement groups. This will take another couple of weeks though, so you should be safe to use it as is.
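For readers planning ahead, one way to think about the move to placement groups is as a list of resource bundles: one for the main function and one for whatever it schedules. The sketch below only reshapes the dictionary in plain Python. The helper `to_bundles` is hypothetical, and mapping the `extra_*` keys onto a second bundle is an assumption about how the two styles correspond, not something stated in this thread.

```python
def to_bundles(resources_per_trial):
    """Split a legacy resources_per_trial dict into [head, extra] bundles."""
    head, extra = {}, {}
    for key, amount in resources_per_trial.items():
        if key.startswith("extra_"):
            # resources the main function hands out to its own tasks
            extra[key[len("extra_"):].upper()] = amount
        else:
            # resources held by the main function itself
            head[key.upper()] = amount
    # drop a bundle that ended up empty
    return [bundle for bundle in (head, extra) if bundle]

bundles = to_bundles({"cpu": 1, "extra_cpu": 9, "extra_gpu": 0.25})
print(bundles)
# [{'CPU': 1}, {'CPU': 9, 'GPU': 0.25}]
```

In Ray versions that ship `tune.PlacementGroupFactory`, a list of this shape is the kind of argument it takes, but check the documentation for your installed version before relying on that.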

krfricke on Feb 4, 2021