ray: [rllib] PPO crashes when using a GPU with all environments under Python 3.8

What is the problem?

  • Ray/RLlib: 1.2.0, 1.1.0, and 2.0.0dev
  • PyTorch: 1.7.1 and 1.8.0
  • OS: Ubuntu 20.04
  • Python: 3.8

I cannot get any examples to work using PPO. Either a) I consistently get the crash below if I specify num_gpus=1:

  File "/home/bam4d/anaconda3/envs/griddly/lib/python3.8/site-packages/ray/rllib/evaluation/postprocessing.py", line 138, in compute_gae_for_sample_batch
    batch = compute_advantages(
  File "/home/bam4d/anaconda3/envs/griddly/lib/python3.8/site-packages/ray/rllib/evaluation/postprocessing.py", line 51, in compute_advantages
    np.array([last_r])])
  File "/home/bam4d/anaconda3/envs/griddly/lib/python3.8/site-packages/torch/tensor.py", line 630, in __array__
    return self.numpy()
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

or b) RLlib does not use the GPU and trains on the CPU instead (which is too slow to be usable).
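For reference, the TypeError in (a) is not RLlib-specific; it is just what NumPy does when handed a CUDA tensor. A minimal illustration, separate from the repro below (requires a CUDA-capable GPU):

import numpy as np
import torch

# Illustration only: np.array() calls Tensor.__array__(), which calls
# .numpy(), and .numpy() only works on CPU tensors.
if torch.cuda.is_available():
    last_r = torch.tensor(0.5, device='cuda')
    try:
        np.array([last_r])            # same failure as in compute_advantages
    except TypeError as e:
        print(e)                      # "can't convert cuda:0 device type tensor ..."
    print(np.array([last_r.cpu()]))   # OK: copy to host memory first
    print(np.array([last_r.item()]))  # OK: convert to a plain Python float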

Notes: I’m using ImpalaTrainer for many experiments, which works great on GPU for me with no errors.

torch.cuda.is_available() returns False on all the worker policies, but True on the policy template.

With local_mode=True, all torch.cuda.is_available() calls return False and training switches to the CPU.
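The worker behaviour looks consistent with Ray masking GPUs per task via CUDA_VISIBLE_DEVICES (workers request no GPUs by default, so CUDA is hidden from them). A quick diagnostic along these lines can confirm it, separate from the repro below; the function names are mine:

import ray
import torch

ray.init(num_gpus=1)

# A task that requests no GPUs gets an empty CUDA_VISIBLE_DEVICES and should
# therefore report no CUDA devices, like the rollout workers do.
@ray.remote(num_gpus=0)
def cuda_in_cpu_task():
    return torch.cuda.is_available()

@ray.remote(num_gpus=1)
def cuda_in_gpu_task():
    return torch.cuda.is_available()

print(ray.get(cuda_in_cpu_task.remote()))  # expected: False (like the workers)
print(ray.get(cuda_in_gpu_task.remote()))  # expected: True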

Reproduction (REQUIRED)

import ray
from ray import tune
from ray.rllib.agents.ppo import PPOTrainer
from ray.rllib.examples.env.random_env import RandomEnv

if __name__ == '__main__':

    ray.init(num_gpus=1)

    config = {
        'framework': 'torch',
        'num_workers': 1,
        'num_envs_per_worker': 1,
        # Requesting a GPU for the learner is what triggers the crash.
        'num_gpus': 1,
        'env': RandomEnv,
    }

    stop = {
        'timesteps_total': 10000,
    }

    result = tune.run(PPOTrainer, config=config, stop=stop)

If the code snippet cannot be run by itself, the issue will be closed with “needs-repro-script”.

  • I have verified my script runs in a clean environment and reproduces the issue.
  • I have verified the issue also occurs with the latest wheels.

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 1
  • Comments: 18 (16 by maintainers)

Most upvoted comments

I can reproduce the error now (with py38). Taking another look.

When I add an .item() in ppo_torch_policy.py line ~222, it works fine. Will PR … https://github.com/ray-project/ray/blob/master/rllib/agents/ppo/ppo_torch_policy.py#L222
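Roughly, the change converts the value-function output used as the bootstrap value into a Python float before it reaches compute_advantages. A self-contained sketch of the idea, not the actual upstream diff (DummyModel stands in for the real RLlib model):

import numpy as np
import torch

# Stand-in for the model used in ppo_torch_policy.py; the real
# value_function() returns predictions on the policy's device (cuda:0 here).
class DummyModel:
    def __init__(self, device):
        self.device = device

    def value_function(self):
        return torch.tensor([0.42], device=self.device)

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = DummyModel(device)

# Before: last_r = model.value_function()[0]   -> tensor (possibly on cuda:0),
#         so np.array([last_r]) in compute_advantages raises the TypeError.
# After:  .item() yields a plain Python float, which NumPy accepts.
last_r = model.value_function()[0].item()
print(np.array([last_r]))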

@sven1977 you are trying to reproduce with the wrong version of Python.

This error only happens to me with 3.8. I cannot reproduce it with 3.7. Apologies if this was unclear.

Can you try with 3.8 and see if the issue still happens to you?

@sven1977 I’ve gotten two reports of the same bug on Neural MMO, plus one report on 18.04, but oddly enough I am unable to reproduce it myself, even though we have the same PyTorch (v1.7.1) and Conda versions, also on 20.04. The only difference as far as I can tell is the card: I’m running on a 3080, while the reports came in on a 970 and a Tesla P100.

Thanks for the quick response @sven1977. Here is the full stack trace:

ray.exceptions.RayTaskError(TypeError): ray::PPO.train_buffered() (pid=18732, ip=192.168.1.100)
  File "python/ray/_raylet.pyx", line 439, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 473, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 476, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 480, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 432, in ray._raylet.execute_task.function_executor
  File "/home/bam4d/anaconda3/envs/griddly/lib/python3.8/site-packages/ray/rllib/agents/trainer_template.py", line 107, in __init__
    Trainer.__init__(self, config, env, logger_creator)
  File "/home/bam4d/anaconda3/envs/griddly/lib/python3.8/site-packages/ray/rllib/agents/trainer.py", line 486, in __init__
    super().__init__(config, logger_creator)
  File "/home/bam4d/anaconda3/envs/griddly/lib/python3.8/site-packages/ray/tune/trainable.py", line 97, in __init__
    self.setup(copy.deepcopy(self.config))
  File "/home/bam4d/anaconda3/envs/griddly/lib/python3.8/site-packages/ray/rllib/agents/trainer.py", line 654, in setup
    self._init(self.config, self.env_creator)
  File "/home/bam4d/anaconda3/envs/griddly/lib/python3.8/site-packages/ray/rllib/agents/trainer_template.py", line 134, in _init
    self.workers = self._make_workers(
  File "/home/bam4d/anaconda3/envs/griddly/lib/python3.8/site-packages/ray/rllib/agents/trainer.py", line 725, in _make_workers
    return WorkerSet(
  File "/home/bam4d/anaconda3/envs/griddly/lib/python3.8/site-packages/ray/rllib/evaluation/worker_set.py", line 90, in __init__
    self._local_worker = self._make_worker(
  File "/home/bam4d/anaconda3/envs/griddly/lib/python3.8/site-packages/ray/rllib/evaluation/worker_set.py", line 321, in _make_worker
    worker = cls(
  File "/home/bam4d/anaconda3/envs/griddly/lib/python3.8/site-packages/ray/rllib/evaluation/rollout_worker.py", line 479, in __init__
    self.policy_map, self.preprocessors = self._build_policy_map(
  File "/home/bam4d/anaconda3/envs/griddly/lib/python3.8/site-packages/ray/rllib/evaluation/rollout_worker.py", line 1111, in _build_policy_map
    policy_map[name] = cls(obs_space, act_space, merged_conf)
  File "/home/bam4d/anaconda3/envs/griddly/lib/python3.8/site-packages/ray/rllib/policy/policy_template.py", line 266, in __init__
    self._initialize_loss_from_dummy_batch(
  File "/home/bam4d/anaconda3/envs/griddly/lib/python3.8/site-packages/ray/rllib/policy/policy.py", line 634, in _initialize_loss_from_dummy_batch
    postprocessed_batch = self.postprocess_trajectory(batch_for_postproc)
  File "/home/bam4d/anaconda3/envs/griddly/lib/python3.8/site-packages/ray/rllib/policy/policy_template.py", line 290, in postprocess_trajectory
    return postprocess_fn(self, sample_batch,
  File "/home/bam4d/anaconda3/envs/griddly/lib/python3.8/site-packages/ray/rllib/evaluation/postprocessing.py", line 138, in compute_gae_for_sample_batch
    batch = compute_advantages(
  File "/home/bam4d/anaconda3/envs/griddly/lib/python3.8/site-packages/ray/rllib/evaluation/postprocessing.py", line 51, in compute_advantages
    np.array([last_r])])
  File "/home/bam4d/anaconda3/envs/griddly/lib/python3.8/site-packages/torch/tensor.py", line 621, in __array__
    return self.numpy()
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.