pytorch-lightning: "MisconfigurationException: No supported gpu backend found!" with multi gpu training in jupyter notebooks
Bug description
When trying to train on two GPUs in a Jupyter notebook environment on jarvislabs.ai with ddp_notebooks, I get the following error: “MisconfigurationException: No supported gpu backend found!”
I’m trying to train on two RTX 5000 GPUs. On a Kaggle GPU the same code runs without any problem.
Any ideas?
How to reproduce the bug
import pytorch_lightning as pl

# model, train_dl and val_dl are defined elsewhere in the notebook
trainer = pl.Trainer(
    max_epochs=2,
    accelerator="gpu",
    devices=2,
    precision=16,
    accumulate_grad_batches=2,
)
trainer.fit(model, train_dl, val_dl)
Error messages and logs
“MisconfigurationException: No supported gpu backend found!”
Environment
#- Lightning Component (e.g. Trainer, LightningModule, LightningApp, LightningWork, LightningFlow):
#- PyTorch Lightning Version (e.g., 1.5.0): 1.7.7
#- Lightning App Version (e.g., 0.5.2):
#- PyTorch Version (e.g., 1.10): 1.11
#- Python version (e.g., 3.9):
#- OS (e.g., Linux):
#- CUDA/cuDNN version: V11.6.55
#- GPU models and configuration: 2x RTX 5000
#- How you installed Lightning(`conda`, `pip`, source): pip
#- Running environment of LightningApp (e.g. local, cloud): jarvislabs.ai
More info
No response
About this issue
- State: open
- Created 2 years ago
- Comments: 17 (6 by maintainers)
I am also experiencing this issue all of a sudden after migrating from PTL 1.6.5 to 1.9.0.

However, my colleagues and I have solved it by exporting CUDA_VISIBLE_DEVICES=XXX as an environment variable on each of our nodes (we use 4 nodes with 8 GPUs each, combined with mpirun), where XXX is the GPU config for that node. In my case, each node has 8 GPUs, so it’s export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7. Make sure you export this env var on each node, including the primary node.

Downgrading to PL Lightning 1.7.7 works for me. I don’t know what the cause of the problem is!

@vacmar01 Was your PyTorch installed with GPU support? I suspect that it was not. Please check what
returns for you. If not, please install it like so:
pip3 install torch --extra-index-url https://download.pytorch.org/whl/cu116
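
For anyone landing here, a minimal sketch of the two checks discussed in this thread: whether the installed PyTorch build has CUDA support, and what CUDA_VISIBLE_DEVICES is set to. The helper name gpu_backend_report is hypothetical and is not part of Lightning or PyTorch, and this is not necessarily the snippet the comment above had in mind. It only relies on torch.version.cuda, torch.cuda.is_available() and torch.cuda.device_count().

```python
import importlib.util
import os


def gpu_backend_report() -> dict:
    """Collect facts relevant to 'No supported gpu backend found!'.

    Hypothetical helper for illustration only.
    """
    report = {
        # None means the variable is unset, so all GPUs are visible by default.
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
        "torch_installed": importlib.util.find_spec("torch") is not None,
    }
    if report["torch_installed"]:
        import torch

        report["torch_version"] = torch.__version__
        # None here indicates a CPU-only PyTorch build.
        report["built_with_cuda"] = torch.version.cuda
        report["cuda_available"] = torch.cuda.is_available()
        report["device_count"] = torch.cuda.device_count()
    return report


if __name__ == "__main__":
    for key, value in gpu_backend_report().items():
        print(f"{key}: {value}")
```

If built_with_cuda is None, the install is CPU-only and reinstalling from the CUDA wheel index as suggested above is the likely fix. If device_count is lower than the number passed to devices, check CUDA_VISIBLE_DEVICES; it has to be set before the process first initializes CUDA. Running the check in a separate Python process (or a fresh kernel) avoids touching CUDA state in the kernel used for training.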