ray: [rllib] PPO crashes when using GPU with all environments using python 3.8
What is the problem?
- Ray/RLlib: 1.2.0, 1.1.0, and 2.0.0dev
- PyTorch: 1.7.1 and 1.8.0
- OS: Ubuntu 20.04
- Python: 3.8
I cannot get any examples to work using PPO. I either:
a) consistently get the following crash if I specify num_gpus=1 (a minimal standalone reproduction of the underlying TypeError is sketched after (b)):
File "/home/bam4d/anaconda3/envs/griddly/lib/python3.8/site-packages/ray/rllib/evaluation/postprocessing.py", line 138, in compute_gae_for_sample_batch
batch = compute_advantages(
File "/home/bam4d/anaconda3/envs/griddly/lib/python3.8/site-packages/ray/rllib/evaluation/postprocessing.py", line 51, in compute_advantages
np.array([last_r])])
File "/home/bam4d/anaconda3/envs/griddly/lib/python3.8/site-packages/torch/tensor.py", line 630, in __array__
return self.numpy()
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
b) RLlib does not use the GPU and trains on the CPU instead (which is too slow to be usable).
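The TypeError in (a) can be reproduced in isolation. A minimal sketch, assuming a CUDA-capable machine: np.array() ends up calling Tensor.__array__(), which calls .numpy(), and that is not allowed for CUDA tensors.

import numpy as np
import torch

# The value used as last_r is still a CUDA tensor here, which is the
# same situation compute_advantages() runs into in the traceback above.
last_r = torch.tensor(0.5, device="cuda")
try:
    np.concatenate([np.zeros(3, dtype=np.float32), np.array([last_r])])
except TypeError as e:
    print(e)  # can't convert cuda:0 device type tensor to numpy ...

# Copying the scalar to host memory first avoids the error:
np.concatenate([np.zeros(3, dtype=np.float32), np.array([last_r.item()])])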
Notes:
- I'm using ImpalaTrainer for many experiments, which works great on GPU for me with no errors.
- torch.cuda.is_available() returns False on all the worker policies, but True on the policy template (a standalone check of this behaviour is sketched after these notes).
- local_mode=True causes all torch.cuda.is_available() calls to return False, and training switches to the CPU.
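For context on the second note, here is a minimal sketch (not RLlib-specific, assuming a single-GPU machine) of why torch.cuda.is_available() can differ between the driver and Ray workers: Ray restricts each task's CUDA_VISIBLE_DEVICES to the GPUs it was assigned, and RLlib's rollout workers default to num_gpus_per_worker=0.

import ray
import torch

ray.init(num_gpus=1)

# Driver process: sees whatever CUDA devices the machine exposes.
print("driver sees CUDA:", torch.cuda.is_available())

@ray.remote
def check_without_gpu():
    # This task requested no GPUs, so Ray leaves it with no visible CUDA devices.
    return torch.cuda.is_available()  # expected: False

@ray.remote(num_gpus=1)
def check_with_gpu():
    # This task requested one GPU, so one CUDA device is visible.
    return torch.cuda.is_available()  # expected: True

print("worker, num_gpus=0:", ray.get(check_without_gpu.remote()))
print("worker, num_gpus=1:", ray.get(check_with_gpu.remote()))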
Reproduction (REQUIRED)
import ray
from ray import tune
from ray.rllib.agents.ppo import PPOTrainer
from ray.rllib.examples.env.random_env import RandomEnv

if __name__ == '__main__':
    ray.init(num_gpus=1)

    env_name = 'ray-ma-env'  # unused; the config passes RandomEnv directly

    config = {
        'framework': 'torch',
        'num_workers': 1,
        'num_envs_per_worker': 1,
        'num_gpus': 1,
        'env': RandomEnv,  # dummy env shipped with RLlib
    }

    stop = {
        'timesteps_total': 10000,
    }

    result = tune.run(PPOTrainer, config=config, stop=stop)
If the code snippet cannot be run by itself, the issue will be closed with “needs-repro-script”.
- I have verified my script runs in a clean environment and reproduces the issue.
- I have verified the issue also occurs with the latest wheels.
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Reactions: 1
- Comments: 18 (16 by maintainers)
I can reproduce the error now (with py38). Taking another look.
When I add an .item() in ppo_torch_policy.py at line ~222, it works fine. Will PR … https://github.com/ray-project/ray/blob/master/rllib/agents/ppo/ppo_torch_policy.py#L222
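For readers hitting the same thing, a minimal sketch of the kind of change being described (the helper name below is hypothetical, and the exact function and line in ppo_torch_policy.py may differ):

import torch

def last_value(vf_out: torch.Tensor) -> float:
    # The value prediction used as last_r must be a plain Python float
    # (host memory), because compute_advantages() passes it to np.array().
    return vf_out[0].item()  # .item() copies the scalar off the GPU

# Example: a value-head output living on the GPU (falls back to CPU if none).
device = "cuda" if torch.cuda.is_available() else "cpu"
vf_out = torch.tensor([0.42], device=device)
last_r = last_value(vf_out)  # a plain float, safe for the numpy-based GAE code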
@sven1977 you are trying to reproduce with the wrong version of Python. This error only happens to me with 3.8; I cannot reproduce it with 3.7. Apologies if this was unclear.
Can you try with 3.8 and see if the issue still happens to you?
@sven1977 I’ve gotten two reports of the same bug on Neural MMO, yet I'm unable to reproduce it myself, even though we seem to have the same PyTorch (v1.7.1) and Conda versions, also on Ubuntu 20.04. I have one report of the bug on 18.04 as well. The only difference as far as I can tell is the card: I'm running on a 3080, while the reports came in on a 970 and a Tesla P100.
Thanks for the quick response @sven1977. Here is the full stack trace: