onnxruntime: Onnxruntime in WSL with CUDA is much slower than windows
Describe the bug
I’m running the Windows 11 version of WSL with CUDA enabled and the onnxruntime-gpu package. I’m running a straightforward batched image task on a small subset of the ~20k images I have.
The exact same code running the same versions of all packages (when possible) in WSL takes ~2m58s to complete, whereas on Windows it takes ~19s.
Urgency
It feels urgent to me, but I’m sure it’s pretty low on the list of priorities, and I can work around it in the meantime by swapping this part of my workflow back to Windows.
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 11, WSL Ubuntu 20.04
- ONNX Runtime installed from (source or binary): binary
- ONNX Runtime version: 1.9.0
- Python version: 3.9
- CUDA/cuDNN version:
  - WSL: NVIDIA-SMI 470.63.01, Driver Version: 471.68, CUDA Version: 11.4
  - Windows: NVIDIA-SMI 471.68, Driver Version: 471.68, CUDA Version: 11.4
- GPU model and memory: RTX 3090, 24 GB
To Reproduce
Run a basic model in WSL, run the same model on Windows, and observe the difference.
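A minimal timing sketch along these lines (the model path, input shape, and run count below are hypothetical placeholders, not from my actual workload) can be run unchanged in both environments to compare like-for-like:

```python
# Hypothetical repro sketch: time batched inference with the CUDA execution
# provider. "model.onnx" and the (32, 3, 224, 224) input shape are
# placeholders; substitute your own model and shape.
import time

import numpy as np

try:
    import onnxruntime as ort
except ImportError:
    ort = None  # allows the script to report a missing install cleanly


def benchmark(model_path="model.onnx", batch_shape=(32, 3, 224, 224), runs=50):
    """Run `runs` batched inferences and print the average latency."""
    if ort is None:
        print("onnxruntime is not installed")
        return None
    # Request the CUDA EP first; ONNX Runtime silently falls back to CPU if
    # the CUDA provider fails to load, so print which providers are active.
    sess = ort.InferenceSession(
        model_path,
        providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
    )
    print("active providers:", sess.get_providers())
    input_name = sess.get_inputs()[0].name
    x = np.random.rand(*batch_shape).astype(np.float32)
    sess.run(None, {input_name: x})  # warm-up run (lazy CUDA initialization)
    start = time.perf_counter()
    for _ in range(runs):
        sess.run(None, {input_name: x})
    avg = (time.perf_counter() - start) / runs
    print(f"avg latency: {avg * 1000:.1f} ms/batch")
    return avg


if __name__ == "__main__":
    benchmark()
```

The `get_providers()` printout matters here: if it shows only `CPUExecutionProvider`, the slowdown is a provider-loading problem rather than slow CUDA execution.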
Expected behavior
I’d expect WSL to at least be in the ballpark of Windows performance. I know there’s some overhead, but this seems excessive.
Additional context
This is only possible with the release preview of Windows, since that is what enables CUDA in WSL, so I expect this bug was easy to miss. Additionally, GPU utilization when I run my kernels this way is extremely low but not zero, so it is using the GPU (I think), just not well. Contrast with Windows, where I can max out both utilization and memory.
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 15 (6 by maintainers)
Here is what I see on WSL: