ray: [RLLib] RuntimeError: Expected scalars to be on CPU, got cuda:0 instead

What happened + What you expected to happen

How severe does this issue affect your experience of using Ray?

High: It blocks me from completing my task.

Hi all,

I am trying to load in a previously trained model to continue training it, except I get the following error:

Failure # 1 (occurred at 2023-03-31_14-54-08)
ray::PPO.train() (pid=5616, ip=127.0.0.1, repr=PPO)
  File "python\ray\_raylet.pyx", line 875, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 879, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 819, in ray._raylet.execute_task.function_executor
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\_private\function_manager.py", line 674, in actor_method_executor
    return method(__ray_actor, *args, **kwargs)
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\util\tracing\tracing_helper.py", line 460, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\tune\trainable\trainable.py", line 384, in train
    raise skipped from exception_cause(skipped)
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\tune\trainable\trainable.py", line 381, in train
    result = self.step()
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\util\tracing\tracing_helper.py", line 460, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\rllib\algorithms\algorithm.py", line 794, in step
    results, train_iter_ctx = self._run_one_training_iteration()
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\util\tracing\tracing_helper.py", line 460, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\rllib\algorithms\algorithm.py", line 2810, in _run_one_training_iteration
    results = self.training_step()
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\util\tracing\tracing_helper.py", line 460, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\rllib\algorithms\ppo\ppo.py", line 420, in training_step
    train_results = train_one_step(self, train_batch)
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\rllib\execution\train_ops.py", line 52, in train_one_step
    info = do_minibatch_sgd(
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\rllib\utils\sgd.py", line 129, in do_minibatch_sgd
    local_worker.learn_on_batch(
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 1029, in learn_on_batch
    info_out[pid] = policy.learn_on_batch(batch)
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\rllib\utils\threading.py", line 24, in wrapper
    return func(self, *a, **k)
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\rllib\policy\torch_policy_v2.py", line 663, in learn_on_batch
    self.apply_gradients(_directStepOptimizerSingleton)
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\rllib\policy\torch_policy_v2.py", line 880, in apply_gradients
    opt.step()
  File "C:\personal\ai\ray_venv\lib\site-packages\torch\optim\optimizer.py", line 280, in wrapper
    out = func(*args, **kwargs)
  File "C:\personal\ai\ray_venv\lib\site-packages\torch\optim\optimizer.py", line 33, in _use_grad
    ret = func(self, *args, **kwargs)
  File "C:\personal\ai\ray_venv\lib\site-packages\torch\optim\adam.py", line 141, in step
    adam(
  File "C:\personal\ai\ray_venv\lib\site-packages\torch\optim\adam.py", line 281, in adam
    func(params,
  File "C:\personal\ai\ray_venv\lib\site-packages\torch\optim\adam.py", line 449, in _multi_tensor_adam
    torch._foreach_addcmul_(device_exp_avg_sqs, device_grads, device_grads, 1 - beta2)
RuntimeError: Expected scalars to be on CPU, got cuda:0 instead.

Relevant code:

tune.run("PPO",
         resume='AUTO',
         # param_space=config,
         config=ppo_config.to_dict(),
         name=name, keep_checkpoints_num=None, checkpoint_score_attr="episode_reward_mean",
         max_failures=1,
         # restore="C:\\Users\\denys\\ray_results\\mediumbrawl-attention-256Att-128MLP-L2\\PPOTrainer_RandomEnv_1e882_00000_0_2022-06-02_15-13-44\\checkpoint_000028\\checkpoint-28",
         checkpoint_freq=5, checkpoint_at_end=True)

Versions / Dependencies

OS: Windows 11
Python: 3.10
Ray: latest nightly Windows wheel

Reproduction script

n/a

Issue Severity

High: It blocks me from completing my task.

About this issue

  • State: closed
  • Created a year ago
  • Comments: 32 (9 by maintainers)

Most upvoted comments

@DenysAshikhin I have pinged the owner of the other PR about the possibility of merging it. Thanks for your valuable input on our library.

Hey @DenysAshikhin, so here is the core change that needs to happen in RLlib’s torch policy. If it’s urgent, you can make these changes in your local installation of Ray. If it can wait a few days, you can either install the nightly or use master once this PR is merged. If you need reliability, you will have to wait for the released version.

Apparently the torch optimizer param_group values should not be moved to CUDA devices when restoring optimizer states. There will be a PR addressing this issue in an upcoming release.
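For context, here is a minimal sketch (not the actual RLlib patch; the helper name and the scalar-vs-tensor split are assumptions for illustration) of what "don't move scalar values to the GPU when restoring optimizer state" could look like:

import torch

def load_optimizer_state(optimizer, saved, device):
    # Hypothetical helper: restore an optimizer state dict, moving only
    # non-scalar tensors (e.g. Adam's exp_avg / exp_avg_sq) to `device`,
    # while 0-dim tensors such as the step counter and all param_group
    # values (lr, betas, eps, ...) stay on the CPU.
    restored = {
        "param_groups": saved["param_groups"],  # scalar hyperparameters, left untouched
        "state": {
            pid: {
                k: v.to(device) if torch.is_tensor(v) and v.dim() > 0 else v
                for k, v in s.items()
            }
            for pid, s in saved["state"].items()
        },
    }
    optimizer.load_state_dict(restored)

In the traceback above, the 1 - beta2 argument to torch._foreach_addcmul_ presumably stopped being a plain Python float once the restored param_group values had been converted to CUDA tensors, which is what the multi-tensor Adam kernel complains about.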

Got the same problem when trying to resume an experiment on Colab with exactly the same configuration as before.

tuner = Tuner.restore("saved_models", trainer, resume_unfinished=True, resume_errored=True)
tuner.fit()

The error is raised at the end of an iteration. Single-GPU setup. Python 3.9.16, torch 2.0.0+cu118, ray 2.3.1.

trainer = RLTrainer(
    run_config=run_config,
    scaling_config=ScalingConfig(
        num_workers=2, use_gpu=True,
        trainer_resources={"CPU": 0.0}, 
        resources_per_worker={"CPU": 1.0}),
    algorithm="PPO",
    config=config_cf,
    resume_from_checkpoint=rl_checkpoint,
)

Tune status: Resources requested: 2.0/2 CPUs, 1.0/1 GPUs, 0.0/10.85 GiB heap, 0.0/0.19 GiB objects

Changing use_gpu=True to use_gpu=False lets training continue, but only on the CPU.

Tune status: Resources requested: 2.0/2 CPUs, 0/1 GPUs, 0.0/10.85 GiB heap, 0.0/0.19 GiB objects
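For reference, a CPU-only variant of the RLTrainer config above with just that flag flipped (import paths assume Ray 2.3; run_config, config_cf and rl_checkpoint are the same objects as in the snippet above):

from ray.air import ScalingConfig
from ray.train.rl import RLTrainer

trainer = RLTrainer(
    run_config=run_config,
    scaling_config=ScalingConfig(
        num_workers=2,
        use_gpu=False,  # was use_gpu=True; sidesteps the CUDA scalar error, but trains on CPU only
        trainer_resources={"CPU": 0.0},
        resources_per_worker={"CPU": 1.0},
    ),
    algorithm="PPO",
    config=config_cf,
    resume_from_checkpoint=rl_checkpoint,
)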

I’m happy to hear there’s something of a workaround for some people. However, it still didn’t work for me, and it doesn’t change the fact that I need to train on my GPU (as I’m sure others do as well).