ray: [RLlib] RuntimeError: Expected scalars to be on CPU, got cuda:0 instead
What happened + What you expected to happen
Hi all,
I am trying to load a previously trained model to continue training it, but I get the following error:
Failure # 1 (occurred at 2023-03-31_14-54-08)
ray::PPO.train() (pid=5616, ip=127.0.0.1, repr=PPO)
File "python\ray\_raylet.pyx", line 875, in ray._raylet.execute_task
File "python\ray\_raylet.pyx", line 879, in ray._raylet.execute_task
File "python\ray\_raylet.pyx", line 819, in ray._raylet.execute_task.function_executor
File "C:\personal\ai\ray_venv\lib\site-packages\ray\_private\function_manager.py", line 674, in actor_method_executor
return method(__ray_actor, *args, **kwargs)
File "C:\personal\ai\ray_venv\lib\site-packages\ray\util\tracing\tracing_helper.py", line 460, in _resume_span
return method(self, *_args, **_kwargs)
File "C:\personal\ai\ray_venv\lib\site-packages\ray\tune\trainable\trainable.py", line 384, in train
raise skipped from exception_cause(skipped)
File "C:\personal\ai\ray_venv\lib\site-packages\ray\tune\trainable\trainable.py", line 381, in train
result = self.step()
File "C:\personal\ai\ray_venv\lib\site-packages\ray\util\tracing\tracing_helper.py", line 460, in _resume_span
return method(self, *_args, **_kwargs)
File "C:\personal\ai\ray_venv\lib\site-packages\ray\rllib\algorithms\algorithm.py", line 794, in step
results, train_iter_ctx = self._run_one_training_iteration()
File "C:\personal\ai\ray_venv\lib\site-packages\ray\util\tracing\tracing_helper.py", line 460, in _resume_span
return method(self, *_args, **_kwargs)
File "C:\personal\ai\ray_venv\lib\site-packages\ray\rllib\algorithms\algorithm.py", line 2810, in _run_one_training_iteration
results = self.training_step()
File "C:\personal\ai\ray_venv\lib\site-packages\ray\util\tracing\tracing_helper.py", line 460, in _resume_span
return method(self, *_args, **_kwargs)
File "C:\personal\ai\ray_venv\lib\site-packages\ray\rllib\algorithms\ppo\ppo.py", line 420, in training_step
train_results = train_one_step(self, train_batch)
File "C:\personal\ai\ray_venv\lib\site-packages\ray\rllib\execution\train_ops.py", line 52, in train_one_step
info = do_minibatch_sgd(
File "C:\personal\ai\ray_venv\lib\site-packages\ray\rllib\utils\sgd.py", line 129, in do_minibatch_sgd
local_worker.learn_on_batch(
File "C:\personal\ai\ray_venv\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 1029, in learn_on_batch
info_out[pid] = policy.learn_on_batch(batch)
File "C:\personal\ai\ray_venv\lib\site-packages\ray\rllib\utils\threading.py", line 24, in wrapper
return func(self, *a, **k)
File "C:\personal\ai\ray_venv\lib\site-packages\ray\rllib\policy\torch_policy_v2.py", line 663, in learn_on_batch
self.apply_gradients(_directStepOptimizerSingleton)
File "C:\personal\ai\ray_venv\lib\site-packages\ray\rllib\policy\torch_policy_v2.py", line 880, in apply_gradients
opt.step()
File "C:\personal\ai\ray_venv\lib\site-packages\torch\optim\optimizer.py", line 280, in wrapper
out = func(*args, **kwargs)
File "C:\personal\ai\ray_venv\lib\site-packages\torch\optim\optimizer.py", line 33, in _use_grad
ret = func(self, *args, **kwargs)
File "C:\personal\ai\ray_venv\lib\site-packages\torch\optim\adam.py", line 141, in step
adam(
File "C:\personal\ai\ray_venv\lib\site-packages\torch\optim\adam.py", line 281, in adam
func(params,
File "C:\personal\ai\ray_venv\lib\site-packages\torch\optim\adam.py", line 449, in _multi_tensor_adam
torch._foreach_addcmul_(device_exp_avg_sqs, device_grads, device_grads, 1 - beta2)
RuntimeError: Expected scalars to be on CPU, got cuda:0 instead.
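The last frame is the telling one: 1 - beta2 has become a 0-dim CUDA tensor, and PyTorch's foreach Adam kernels require such scalar tensors to live on the CPU. A minimal sketch (written for this write-up, not taken from the issue; it assumes a CUDA-capable PyTorch 2.x install) that trips the same check:

import torch

# Illustration only: the foreach kernels accept a tensor of scalars, but it
# must be a CPU tensor. A beta value that was converted to a CUDA tensor while
# restoring optimizer state turns `1 - beta2` into a 0-dim CUDA tensor.
exp_avg_sqs = [torch.zeros(3, device="cuda")]
grads = [torch.ones(3, device="cuda")]
beta2 = torch.tensor(0.999, device="cuda")  # should have stayed a Python float
torch._foreach_addcmul_(exp_avg_sqs, grads, grads, 1 - beta2)
# RuntimeError: Expected scalars to be on CPU, got cuda:0 instead.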
Relevant code:
tune.run("PPO",
resume='AUTO',
# param_space=config,
config=ppo_config.to_dict(),
name=name, keep_checkpoints_num=None, checkpoint_score_attr="episode_reward_mean",
max_failures=1,
# restore="C:\\Users\\denys\\ray_results\\mediumbrawl-attention-256Att-128MLP-L2\\PPOTrainer_RandomEnv_1e882_00000_0_2022-06-02_15-13-44\\checkpoint_000028\\checkpoint-28",
checkpoint_freq=5, checkpoint_at_end=True)
Versions / Dependencies
OS: Windows 11
Python: 3.10
Ray: latest nightly Windows wheel
Reproduction script
n/a
Issue Severity
High: It blocks me from completing my task.
About this issue
- State: closed
- Created a year ago
- Comments: 32 (9 by maintainers)
@DenysAshikhin I have pinged the owner of the other PR about the possibility of merging it. Thanks for your valuable input on our library.
Hey @DenysAshikhin, so here is the core change that needs to happen in RLlib’s torch policy. If it’s urgent, you can make these changes in your local installation of Ray. If it can wait a few days, you can either install the nightly or use master once this PR is merged. If you need reliability, you will have to wait for the released version.
Apparently, the torch optimizer param_group values should not be moved to CUDA devices when restoring optimizer states. There will be a PR addressing this issue in an upcoming release.
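As a rough illustration of that fix (a sketch written for this write-up, not the actual RLlib patch; the helper name and structure are assumptions), restoring optimizer state without touching param_groups could look like this:

import torch

def load_optimizer_state(optim, state_dict, device):
    # Hypothetical helper: load a checkpointed optimizer state, move only the
    # per-parameter moment buffers (exp_avg, exp_avg_sq, ...) to the target
    # device, and leave param_groups (lr, betas, eps, ...) as Python scalars.
    optim.load_state_dict(state_dict)
    for param_state in optim.state.values():
        for key, value in param_state.items():
            # 0-dim bookkeeping tensors such as "step" stay on the CPU.
            if torch.is_tensor(value) and value.dim() > 0:
                param_state[key] = value.to(device)
    # Deliberately leave optim.param_groups untouched: converting its scalar
    # entries into CUDA tensors is what triggers the error above.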
I’m happy to hear there’s something of a workaround for some people. However, it still didn’t work for me, and it doesn’t change the fact that I need to train on my GPU (as I’m sure others do as well).