ray: [Bug] [Autoscaler] Autoscaler can start unnecessary nodes due to poor bin packing heuristic ordering

Search before asking

  • I searched the issues and found no similar issues.

Ray Component

Ray Clusters

What happened + What you expected to happen

When running the repro script below, which creates

1 placement group with 31 CPUs + 1 GPU

on a cluster with

head node: 16 CPUs
cpu worker nodes: 16 CPUs
gpu worker nodes: 16 CPUs + 1 GPU

we are supposed to end up with 1 head node + 1 gpu node.

This is the case when running Ray 1.6, but in Ray 1.7 it starts one more CPU node unnecessarily.
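
As a quick sanity check on that expectation (a minimal arithmetic sketch, not autoscaler code), the bundle totals fit within the head node plus a single gpu node:

bundles = [{"CPU": 1, "GPU": 1}] + [{"CPU": 1}] * 30  # same bundles as the repro script

total_cpu = sum(b.get("CPU", 0) for b in bundles)  # 31
total_gpu = sum(b.get("GPU", 0) for b in bundles)  # 1

head_node = {"CPU": 16}            # m5.4xlarge
gpu_node = {"CPU": 16, "GPU": 1}   # g3.4xlarge

assert total_cpu <= head_node["CPU"] + gpu_node["CPU"]  # 31 <= 32
assert total_gpu <= gpu_node["GPU"]                     # 1 <= 1

Since the PACK strategy is best-effort and allows bundles to spread across nodes, head node + 1 gpu node is a feasible (and minimal) allocation.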

1.6 ray status

Node status
---------------------------------------------------------------
Healthy:
 1 head_node
 1 gpu_node
Pending:
 (no pending nodes)
Recent failures:
 (no failures)

1.7 ray status

Node status
---------------------------------------------------------------
Healthy:
 1 head_node
 1 gpu_node
 1 cpu_node
Pending:
 (no pending nodes)
Recent failures:
 (no failures)

Reproduction script

Script

import time

import ray
from ray.util.placement_group import (
    placement_group,
    placement_group_table,
    remove_placement_group
)


@ray.remote(num_cpus=1)
class Worker(object):
    def __init__(self, i):
        self.i = i

    def work(self):
        time.sleep(0.1)
        print("work ", self.i)


@ray.remote(num_cpus=1, num_gpus=1)
class Trainer(object):
    def __init__(self, i):
        self.i = i

    def train(self):
        time.sleep(0.2)
        print("train ", self.i)


def main():
    ray.init(address="auto")
             # cluster_env="jun-julia-working-2:1")

    bundles = [{"CPU": 1, "GPU": 1}]
    bundles += [{"CPU": 1} for _ in range(30)]

    pg = placement_group(bundles, strategy="PACK")

    ray.get(pg.ready())

    workers = [Worker.options(placement_group=pg).remote(i) for i in range(30)]

    trainer = Trainer.options(placement_group=pg).remote(0)

    while True:
        ray.get([workers[i].work.remote() for i in range(30)])
        ray.get(trainer.train.remote())



if __name__ == "__main__":
    main()

Env

base_image: "anyscale/ray-ml:pinned-nightly-py37"
debian_packages: []

python:
  pip_packages: []
  conda_packages: []

post_build_cmds:
  - pip uninstall -y ray
  # - pip install -U https://ray-wheels.s3.us-west-2.amazonaws.com/releases/1.7.0/2367a2cb9033913b68b1230316496ae273c25b54/ray-1.7.0-cp37-cp37m-manylinux2014_x86_64.whl
  # 1.6
  # - pip install -U ray==1.6
  # master
  - pip install -U https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-2.0.0.dev0-cp37-cp37m-manylinux2014_x86_64.whl
  - pip3 install -U ray[default]

Cluster

cloud_id: {{env["ANYSCALE_CLOUD_ID"]}}
region: us-west-2

aws:
    BlockDeviceMappings:
        - DeviceName: /dev/sda1
          Ebs:
            VolumeSize: 500

head_node_type:
    name: head_node
    instance_type: m5.4xlarge

worker_node_types:
    - name: cpu_node
      instance_type: m5.4xlarge
      min_workers: 0
      max_workers: 10
      use_spot: false
    - name: gpu_node
      instance_type: g3.4xlarge
      min_workers: 0 
      max_workers: 10
      use_spot: false

Anything else

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Comments: 20 (14 by maintainers)

Most upvoted comments

I tried running the following in test_resource_demand_scheduler and can reproduce the issue in a unit test:

def test_bug():
    TYPES = {
        "cpu": {
            "resources": {
                "CPU": 16,
            },
            "max_workers": 10,
        },
        "gpu": {
            "resources": {
                "CPU": 16,
                "GPU": 1,
            },
            "max_workers": 10,
        },
    }
    print("NODES TO ADD:")
    print(
        get_nodes_for(TYPES, {}, "cpu", 9999, ([{"GPU": 1, "CPU": 1}] + [{"CPU": 1}] * 30)))

pytest -v -s test_resource_demand_scheduler.py::test_bug produces

defaultdict(<class 'int'>, {'cpu': 2, 'gpu': 1})

whereas I think it should return just {'gpu': 1}. It looks like the CPU demand resources are being double counted or something.

Hmm, why not change the order so that more complex (GPU) requests are packed first? That seems to solve the issue.
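
As a rough sketch of that reordering idea (sort_demands_for_packing is a hypothetical helper written for illustration, not an actual autoscaler function), the demands could be sorted so that requests with more distinct resource types are bin-packed first:

from typing import Dict, List

ResourceDict = Dict[str, float]

def sort_demands_for_packing(demands: List[ResourceDict]) -> List[ResourceDict]:
    # Pack "complex" requests (more distinct resource types, larger totals)
    # before plain CPU requests, so GPU bundles land on GPU nodes first.
    return sorted(demands, key=lambda d: (len(d), sum(d.values())), reverse=True)

demands = [{"CPU": 1}] * 30 + [{"GPU": 1, "CPU": 1}]
print(sort_demands_for_packing(demands)[0])  # {'GPU': 1, 'CPU': 1} comes first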

On Thu, Oct 7, 2021, 8:39 AM Sasha Sobol @.***> wrote:

There are a few ways we could attack it:

1. Add per-node-type costs and try to minimize the total cost (either through heuristics or with something more advanced).

2. Pack with max_nodes == inf always, and then, in a post-processing step, try removing nodes without violating the constraints, and then pick a few nodes to satisfy the max_nodes requirements.

Option 1 is cleaner, but is more work. Option 2 is simpler and hackier.

What do you think @ericl https://github.com/ericl
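
A rough sketch of option 2 from the comment above (pack without a node cap, then prune redundant nodes in a post-processing step); fits and prune_nodes are hypothetical helpers for illustration, not actual autoscaler functions:

from typing import Dict, List

ResourceDict = Dict[str, float]

def fits(demands: List[ResourceDict], nodes: List[ResourceDict]) -> bool:
    # Greedy first-fit feasibility check: can every demand be placed somewhere?
    remaining = [dict(n) for n in nodes]
    for demand in demands:
        for node in remaining:
            if all(node.get(res, 0) >= amt for res, amt in demand.items()):
                for res, amt in demand.items():
                    node[res] -= amt
                break
        else:
            return False
    return True

def prune_nodes(demands: List[ResourceDict],
                nodes: List[ResourceDict]) -> List[ResourceDict]:
    # Post-processing step: drop nodes one at a time, keeping the removal
    # only if every demand still fits on the remaining nodes.
    kept = list(nodes)
    for i in reversed(range(len(kept))):
        candidate = kept[:i] + kept[i + 1:]
        if fits(demands, candidate):
            kept = candidate
    return kept

# Applied to the repro's node shapes: head node + gpu_node + an extra cpu_node
# gets pruned back to head node + gpu_node.
demands = [{"CPU": 1, "GPU": 1}] + [{"CPU": 1}] * 30
nodes = [{"CPU": 16}, {"CPU": 16, "GPU": 1}, {"CPU": 16}]
print(prune_nodes(demands, nodes))  # [{'CPU': 16}, {'CPU': 16, 'GPU': 1}]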
