pytorch-lightning: pytorch_lightning.utilities.debugging.MisconfigurationException
Hi, I encountered the problem like #899 ,But I checked my pytorch is not CPU version. Can anyone help? Thanks!
Traceback (most recent call last):
File "/home/allen_wu/miniconda3/envs/pytorch/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/allen_wu/miniconda3/envs/pytorch/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/allen_wu/.vscode-server-insiders/extensions/ms-python.python-2020.3.69010/pythonFiles/lib/python/debugpy/wheels/debugpy/__main__.py", line 45, in <module>
cli.main()
File "/home/allen_wu/.vscode-server-insiders/extensions/ms-python.python-2020.3.69010/pythonFiles/lib/python/debugpy/wheels/debugpy/../debugpy/server/cli.py", line 427, in main
run()
File "/home/allen_wu/.vscode-server-insiders/extensions/ms-python.python-2020.3.69010/pythonFiles/lib/python/debugpy/wheels/debugpy/../debugpy/server/cli.py", line 264, in run_file
runpy.run_path(options.target, run_name="__main__")
File "/home/allen_wu/miniconda3/envs/pytorch/lib/python3.7/runpy.py", line 263, in run_path
pkg_name=pkg_name, script_name=fname)
File "/home/allen_wu/miniconda3/envs/pytorch/lib/python3.7/runpy.py", line 96, in _run_module_code
mod_name, mod_spec, pkg_name, script_name)
File "/home/allen_wu/miniconda3/envs/pytorch/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/allen_wu/sota_lm_dev/codebase/gpt2/Gpt2SeqClassifier.py", line 200, in <module>
trainer = Trainer(gpus=1)
File "/home/allen_wu/miniconda3/envs/pytorch/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 366, in __init__
self.data_parallel_device_ids = parse_gpu_ids(self.gpus)
File "/home/allen_wu/miniconda3/envs/pytorch/lib/python3.7/site-packages/pytorch_lightning/trainer/distrib_parts.py", line 622, in parse_gpu_ids
gpus = sanitize_gpu_ids(gpus)
File "/home/allen_wu/miniconda3/envs/pytorch/lib/python3.7/site-packages/pytorch_lightning/trainer/distrib_parts.py", line 592, in sanitize_gpu_ids
raise MisconfigurationException(message)
pytorch_lightning.utilities.debugging.MisconfigurationException:
You requested GPUs: [0]
But your machine only has: []
My environment packages:
# Name Version Build Channel
_libgcc_mutex 0.1 main
absl-py 0.9.0 pypi_0 pypi
attrs 19.3.0 py_0 conda-forge
backcall 0.1.0 py_0 conda-forge
blas 1.0 mkl
bleach 3.1.3 pyh8c360ce_0 conda-forge
boto3 1.12.24 pypi_0 pypi
botocore 1.15.24 pypi_0 pypi
ca-certificates 2019.11.28 hecc5488_0 conda-forge
cachetools 4.0.0 pypi_0 pypi
certifi 2019.11.28 py37hc8dfbb8_1 conda-forge
chardet 3.0.4 pypi_0 pypi
click 7.1.1 pypi_0 pypi
cudatoolkit 10.1.243 h6bb024c_0
decorator 4.4.2 py_0 conda-forge
defusedxml 0.6.0 py_0 conda-forge
docutils 0.15.2 pypi_0 pypi
entrypoints 0.3 py37hc8dfbb8_1001 conda-forge
filelock 3.0.12 pypi_0 pypi
freetype 2.9.1 h8a8886c_1
future 0.18.2 pypi_0 pypi
google-auth 1.11.3 pypi_0 pypi
google-auth-oauthlib 0.4.1 pypi_0 pypi
grpcio 1.27.2 pypi_0 pypi
icu 64.2 he1b5a44_1 conda-forge
idna 2.9 pypi_0 pypi
importlib-metadata 1.5.0 py37hc8dfbb8_1 conda-forge
importlib_metadata 1.5.0 1 conda-forge
intel-openmp 2020.0 166
ipykernel 5.1.4 py37h5ca1d4c_0 conda-forge
ipython 7.13.0 py37h43977f1_1 conda-forge
ipython_genutils 0.2.0 py_1 conda-forge
ipywidgets 7.5.1 pypi_0 pypi
jedi 0.16.0 py37hc8dfbb8_1 conda-forge
jinja2 2.11.1 py_0 conda-forge
jmespath 0.9.5 pypi_0 pypi
joblib 0.14.1 pypi_0 pypi
jpeg 9b h024ee3a_2
json5 0.9.0 py_0 conda-forge
jsonschema 3.2.0 py37hc8dfbb8_1 conda-forge
jupyter_client 6.0.0 py_0 conda-forge
jupyter_core 4.6.3 py37hc8dfbb8_1 conda-forge
jupyterlab 2.0.1 py_0 conda-forge
jupyterlab_server 1.0.7 py_0 conda-forge
ld_impl_linux-64 2.33.1 h53a641e_7
libedit 3.1.20181209 hc058e9b_0
libffi 3.2.1 hd88cf55_4
libgcc-ng 9.1.0 hdf63c60_0
libgfortran-ng 7.3.0 hdf63c60_0
libpng 1.6.37 hbc83047_0
libsodium 1.0.17 h516909a_0 conda-forge
libstdcxx-ng 9.1.0 hdf63c60_0
libtiff 4.1.0 h2733197_0
libuv 1.34.0 h516909a_0 conda-forge
markdown 3.2.1 pypi_0 pypi
markupsafe 1.1.1 py37h8f50634_1 conda-forge
mistune 0.8.4 py37h516909a_1000 conda-forge
mkl 2020.0 166
mkl-service 2.3.0 py37he904b0f_0
mkl_fft 1.0.15 py37ha843d7b_0
mkl_random 1.1.0 py37hd6b4f25_0
nbconvert 5.6.1 py37_0 conda-forge
nbformat 5.0.4 py_0 conda-forge
ncurses 6.2 he6710b0_0
ninja 1.9.0 py37hfd86e86_0
nodejs 13.10.1 hf5d1a2b_0 conda-forge
notebook 6.0.3 py37_0 conda-forge
numpy 1.18.1 py37h4f9e942_0
numpy-base 1.18.1 py37hde5b4d6_1
oauthlib 3.1.0 pypi_0 pypi
olefile 0.46 py37_0
openssl 1.1.1e h516909a_0 conda-forge
pandas 1.0.2 py37h0573a6f_0
pandoc 2.9.2 0 conda-forge
pandocfilters 1.4.2 py_1 conda-forge
parso 0.6.2 py_0 conda-forge
pexpect 4.8.0 py37hc8dfbb8_1 conda-forge
pickleshare 0.7.5 py37hc8dfbb8_1001 conda-forge
pillow 7.0.0 py37hb39fc2d_0
pip 20.0.2 py37_1
prometheus_client 0.7.1 py_0 conda-forge
prompt-toolkit 3.0.4 py_0 conda-forge
protobuf 3.11.3 pypi_0 pypi
ptyprocess 0.6.0 py_1001 conda-forge
pyasn1 0.4.8 pypi_0 pypi
pyasn1-modules 0.2.8 pypi_0 pypi
pygments 2.6.1 py_0 conda-forge
pyrsistent 0.15.7 py37h8f50634_1 conda-forge
python 3.7.6 h0371630_2
python-dateutil 2.8.1 py_0 conda-forge
python_abi 3.7 1_cp37m conda-forge
pytorch 1.4.0 py3.7_cuda10.1.243_cudnn7.6.3_0 pytorch
pytorch-lightning 0.7.1 pypi_0 pypi
pytz 2019.3 py_0
pyzmq 19.0.0 py37hac76be4_1 conda-forge
readline 7.0 h7b6447c_5
regex 2020.2.20 pypi_0 pypi
requests 2.23.0 pypi_0 pypi
requests-oauthlib 1.3.0 pypi_0 pypi
rsa 4.0 pypi_0 pypi
s3transfer 0.3.3 pypi_0 pypi
sacremoses 0.0.38 pypi_0 pypi
scikit-learn 0.22.2.post1 pypi_0 pypi
scipy 1.4.1 pypi_0 pypi
send2trash 1.5.0 py_0 conda-forge
sentencepiece 0.1.85 pypi_0 pypi
setuptools 46.0.0 py37_0
six 1.14.0 py37_0
sklearn 0.0 pypi_0 pypi
sqlite 3.31.1 h7b6447c_0
tensorboard 2.1.1 pypi_0 pypi
terminado 0.8.3 py37hc8dfbb8_1 conda-forge
testpath 0.4.4 py_0 conda-forge
tk 8.6.8 hbc83047_0
tokenizers 0.5.2 pypi_0 pypi
torchtext 0.5.0 pypi_0 pypi
torchvision 0.5.0 py37_cu101 pytorch
tornado 6.0.4 py37h8f50634_1 conda-forge
tqdm 4.43.0 pypi_0 pypi
traitlets 4.3.3 py37hc8dfbb8_1 conda-forge
transformers 2.5.1 pypi_0 pypi
urllib3 1.25.8 pypi_0 pypi
wcwidth 0.1.8 py_0 conda-forge
webencodings 0.5.1 py_1 conda-forge
werkzeug 1.0.0 pypi_0 pypi
wheel 0.34.2 py37_0
widgetsnbextension 3.5.1 pypi_0 pypi
xz 5.2.4 h14c3975_4
zeromq 4.3.2 he1b5a44_2 conda-forge
zipp 3.1.0 py_0 conda-forge
zlib 1.2.11 h7b6447c_3
zstd 1.3.7 h0b5b093_0
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 22 (10 by maintainers)
Training with a GPU works for me in pytorch, and pytorch lightning. But when I use ray.tune to search the hyperparameter space it dies with this exact error:
2020-09-01 13:39:29,157 ERROR trial_runner.py:523 – Trial DEFAULT_2d46f_00009: Error processing event. Traceback (most recent call last): File “/home/d3p692/anaconda3/envs/transformers/lib/python3.7/site-packages/ray/tune/trial_runner.py”, line 471, in _process_trial result = self.trial_executor.fetch_result(trial) File “/home/d3p692/anaconda3/envs/transformers/lib/python3.7/site-packages/ray/tune/ray_trial_executor.py”, line 430, in fetch_result result = ray.get(trial_future[0], DEFAULT_GET_TIMEOUT) File “/home/d3p692/anaconda3/envs/transformers/lib/python3.7/site-packages/ray/worker.py”, line 1538, in get raise value.as_instanceof_cause() ray.exceptions.RayTaskError(TuneError): ray::ImplicitFunc.train() (pid=57762, ip=130.20.133.226) File “python/ray/_raylet.pyx”, line 479, in ray._raylet.execute_task File “python/ray/_raylet.pyx”, line 432, in ray._raylet.execute_task.function_executor File “/home/d3p692/anaconda3/envs/transformers/lib/python3.7/site-packages/ray/tune/trainable.py”, line 332, in train result = self.step() File “/home/d3p692/anaconda3/envs/transformers/lib/python3.7/site-packages/ray/tune/function_runner.py”, line 337, in step self._report_thread_runner_error(block=True) File “/home/d3p692/anaconda3/envs/transformers/lib/python3.7/site-packages/ray/tune/function_runner.py”, line 456, in _report_thread_runner_error .format(err_tb_str))) ray.tune.error.TuneError: Trial raised an exception. Traceback: ray::ImplicitFunc.train() (pid=57762, ip=130.20.133.226) File “/home/d3p692/anaconda3/envs/transformers/lib/python3.7/site-packages/ray/tune/function_runner.py”, line 224, in run self._entrypoint() File “/home/d3p692/anaconda3/envs/transformers/lib/python3.7/site-packages/ray/tune/function_runner.py”, line 287, in entrypoint self._status_reporter.get_checkpoint()) File “/home/d3p692/anaconda3/envs/transformers/lib/python3.7/site-packages/ray/tune/function_runner.py”, line 507, in _trainable_func output = train_func(config) File “<ipython-input-13-53ce147a2f7f>”, line 28, in train_tune File “/home/d3p692/anaconda3/envs/transformers/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py”, line 524, in init self.data_parallel_device_ids = _parse_gpu_ids(self.gpus) File “/home/d3p692/anaconda3/envs/transformers/lib/python3.7/site-packages/pytorch_lightning/trainer/distrib_parts.py”, line 451, in _parse_gpu_ids gpus = sanitize_gpu_ids(gpus) File “/home/d3p692/anaconda3/envs/transformers/lib/python3.7/site-packages/pytorch_lightning/trainer/distrib_parts.py”, line 410, in sanitize_gpu_ids “”") pytorch_lightning.utilities.exceptions.MisconfigurationException: You requested GPUs: [0] But your machine only has: []
And: torch.cuda.is_available() True
torch.version ‘1.3.1’
torch.cuda.device_count() 1
pytorch_lightning.version ‘0.6.1.dev’
CUDA_VISIBLE_DEVICES=6 on an 8-gpu machine.
I would say it as ray.tune, but it fails inside pytorch_lightning.
Any thoughts?
@awaelchli Checked both this thread and this issue as well: https://github.com/PyTorchLightning/pytorch-lightning/issues/899
For me (Python: 3.7.5)
import torch print(torch.cuda.device_count())
returns 0 on a DGX-1 (pascal generation) with cuda-10.1 and cuda-10.2 versions of pytorch 1.5.0. I did not find a cuda-10.0 version so I am currently working with pytorch 1.4.0 and cuda-10.0. “print(torch.cuda.device_count())” shows 8 with cuda-10.0 and pytorch 1.4.0
I am not setting CUDA_VISIBLE_DEVICES in my code.
I think you should set “CUDA_VISIBLE_DEVICES” in the environment variables;
Here is the introduction from Ray[tune] official website.