onnxruntime: Onnxruntime in WSL with CUDA is much slower than windows

Describe the bug I'm running the Windows 11 version of WSL with CUDA enabled and the onnxruntime-gpu package. I'm running a straightforward batched image task on a small subset of the ~20k images I have.

The exact same code, running the same versions of all packages (where possible), takes ~2m58s to complete in WSL, whereas on Windows it takes ~19s.

Urgency It feels urgent to me, but I'm sure it's pretty low on the list of priorities, and I can work around it in the meantime by moving this part of my workflow back to Windows.

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 11, WSL Ubuntu 20.04
  • ONNX Runtime installed from (source or binary): binary
  • ONNX Runtime version: 1.9.0
  • Python version: 3.9
  • CUDA/cuDNN version:
    • WSL: NVIDIA-SMI 470.63.01 Driver Version: 471.68 CUDA Version: 11.4
    • Windows: NVIDIA-SMI 471.68 Driver Version: 471.68 CUDA Version: 11.4
  • GPU model and memory: RTX 3090, 24 GB

To Reproduce Run a basic model in WSL, run the same model on Windows, and observe the difference.
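A minimal timing harness like the one below is roughly what I used to compare the two environments. It is a sketch: the warm-up counts, the `model.onnx` path, and the `"input"` tensor name are placeholders, not from my actual workflow. The warm-up runs matter because the first few calls to an ONNX Runtime session include provider initialization and CUDA setup, which would otherwise skew the comparison.

```python
import time

def benchmark(run_once, n_warmup=3, n_runs=10):
    """Average wall-clock time per call, excluding warm-up runs."""
    for _ in range(n_warmup):
        run_once()  # discard: includes one-time provider/CUDA setup
    start = time.perf_counter()
    for _ in range(n_runs):
        run_once()
    return (time.perf_counter() - start) / n_runs

# In practice run_once would wrap session.run, e.g. (names hypothetical):
#   import onnxruntime
#   session = onnxruntime.InferenceSession(
#       "model.onnx", providers=["CUDAExecutionProvider"])
#   avg = benchmark(lambda: session.run(None, {"input": batch}))
```

Running the same harness against the same model in WSL and on Windows is what produced the ~2m58s vs ~19s numbers above.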

Expected behavior I'd expect WSL to at least be in the ballpark of Windows performance. I know there's some overhead, but this seems excessive.

Additional context This is obviously only possible with the release preview of Windows, since that is what enables CUDA in WSL, so I expect this bug was easy to miss. Additionally, GPU utilization when I run my kernels this way is extremely low, but not zero – so it is using the GPU (I think), just not well. Contrast this with Windows, where I can max out both utilization and memory.

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Comments: 15 (6 by maintainers)

Most upvoted comments

Here is what I see on WSL:

Python 3.9.7 | packaged by conda-forge | (default, Sep 29 2021, 19:20:46)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import onnxruntime
>>> onnxruntime.__version__
'1.9.0'
>>> onnxruntime.cuda_version
''
>>> onnxruntime.get_device()
'GPU'
>>> onnxruntime.get_available_providers()
['CUDAExecutionProvider', 'CPUExecutionProvider']