ray: Ray is not finding GPUs, but TF, PyTorch and nvcc do

I have two NVIDIA Titan X GPUs, but Ray isn't seeing either of them:

import ray

ray.init(num_gpus=2)
print(ray.get_gpu_ids())
# prints []

Ray also prints the following, indicating no GPUs:

2019-10-16 18:20:17,954 INFO multi_gpu_optimizer.py:93 -- LocalMultiGPUOptimizer devices ['/cpu:0']

But TensorFlow sees all devices:

import tensorflow
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

That prints:

[name: "/device:CPU:0"
device_type: "CPU"
...
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
...
, name: "/device:GPU:0"
device_type: "GPU"
...
, name: "/device:GPU:1"
device_type: "GPU"
...
, name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
...
, name: "/device:XLA_GPU:1"
device_type: "XLA_GPU"
...
]

Similarly,

/usr/local/cuda/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243

Why doesn't Ray see my GPUs?

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 9
  • Comments: 16 (5 by maintainers)

Most upvoted comments

I am having the same issue as @Wormh0-le. This is preventing me from training a torch policy without ray.tune, which I do not wish to use. I just want to call .train() on my agent.

Thanks, that was helpful, although it's confusing. This is what happens:

Even if I explicitly init Ray with num_gpus=1, ray.get_gpu_ids() is still [].

However, if I start PPOTrainer with an explicit num_gpus=1 in its config, then Ray does get the GPU. If I don't set this in the config, it doesn't (see the sketch below).
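For reference, a minimal sketch of this workaround, assuming the RLlib API of that era (PPOTrainer under ray.rllib.agents.ppo; "CartPole-v0" is just an illustrative environment choice):

import ray
from ray.rllib.agents.ppo import PPOTrainer

ray.init(num_gpus=1)

# ray.init(num_gpus=1) only declares the resource; the trainer
# reserves a GPU only when "num_gpus" is set in its own config.
trainer = PPOTrainer(env="CartPole-v0", config={"num_gpus": 1})
print(trainer.train())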

I believe the confusing part is ray.get_gpu_ids(), which I thought returned the GPUs detected in the system. Instead, it actually returns the GPUs allocated to the current worker. I think there should be a method, maybe detected_gpus(), so one can test that Ray indeed sees the GPUs and things are good to go. It would also be great if Ray just allocated GPUs to itself automatically (which should be fine perhaps 99% of the time) so we don't have to worry about this additional config.
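To illustrate the allocated-vs-detected distinction, a minimal sketch using Ray's public API (the actual ID values depend on the machine):

import ray

ray.init(num_gpus=2)

# The driver process has no GPUs allocated to it, so this is [].
print(ray.get_gpu_ids())  # []

@ray.remote(num_gpus=1)
def use_gpu():
    # Inside a task that requested a GPU, the allocated ID appears.
    return ray.get_gpu_ids()

print(ray.get(use_gpu.remote()))  # e.g. [0]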

I also set num_gpus=1 explicitly, but Ray still can't get the GPU, even though torch.cuda.is_available() is True. Why?

How would it know how many GPUs to give to each trial?

Please see https://ray.readthedocs.io/en/latest/tune-usage.html#trial-parallelism; a minimal sketch follows at the end of this comment.

On Thu, Oct 17, 2019, 5:58 PM Christian Herz wrote:

@ericl Why wouldn't it automatically allocate all found GPUs unless otherwise defined?

Is this the resource? https://github.com/ray-project/ray/issues/5940#issuecomment-543005640

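A minimal sketch of per-trial GPU allocation in Tune, assuming the reporter-style function trainable API of that era (the trainable body and the gpu_available metric are illustrative):

import ray
from ray import tune

ray.init(num_gpus=2)

def trainable(config, reporter):
    import torch
    # Each trial only sees the GPU Tune assigned to it,
    # via CUDA_VISIBLE_DEVICES.
    reporter(gpu_available=int(torch.cuda.is_available()), done=True)

tune.run(trainable, num_samples=2,
         resources_per_trial={"cpu": 1, "gpu": 1})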