onnxruntime: Python: Inconsistent error behavior when creating sessions for different providers

Describe the bug This issue concerns Python wheels built with support for multiple providers. When creating an inference session, a list of providers can be passed. The behavior when a provider fails to load is not consistent:

  • If the provider can be loaded without issue, the session is created as expected.
  • If the CUDA provider fails, a low-level error message is printed and the CPU provider is used as a fallback (even though it is not in the provider list).
  • If the TensorRT provider fails, a low-level error message is printed and Python crashes with a segfault.
  • If the OpenVINO provider fails, a low-level error message is printed and Python crashes with a segfault.

Urgency Medium

System information

  • OS Platform and Distribution: Debian
  • ONNX Runtime installed from: Source
  • ONNX Runtime version: 1.9.1
  • Python version: 3.8
  • Visual Studio version (if applicable): -
  • GCC/Compiler version (if compiling from source): 9.3
  • CUDA/cuDNN version: 11.4.0/8.2.2.26
  • GPU model and memory: V100@32GB
  • OpenVINO: 2021.4.1
  • TensorRT: 8.0.1.6

To Reproduce

  • Build onnxruntime from tag v1.9.1 for Linux x64 with the CUDA, TensorRT and OpenVINO providers
  • Install onnxruntime in an environment where CUDA, TensorRT and OpenVINO are not available
  • Create a session in Python: sess = ort.InferenceSession("model.onnx", providers=[provider]) where provider is CUDAExecutionProvider, TensorrtExecutionProvider or OpenVINOExecutionProvider (see the sketch below)
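
For reference, a minimal reproduction sketch ("model.onnx" is only a placeholder; any valid ONNX model exhibits the behavior):

```python
# Minimal reproduction sketch; "model.onnx" is a placeholder for any valid model.
import onnxruntime as ort

for provider in ("CUDAExecutionProvider",
                 "TensorrtExecutionProvider",
                 "OpenVINOExecutionProvider"):
    print(f"--- creating session with {provider} ---")
    # On 1.9.1: CUDA logs an error and falls back to CPU, while
    # TensorRT and OpenVINO crash the interpreter with a segfault here.
    sess = ort.InferenceSession("model.onnx", providers=[provider])
    print("loaded providers:", sess.get_providers())
```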

Expected behavior If the first provider in the list fails, a Python warning should be printed and the next provider in the list should be used. If no provider in the list is usable, an exception should be raised. There should be no automatic fallback to CPU unless it is explicitly requested.
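
The requested semantics could be expressed roughly as follows (a user-side sketch only; create_session is a hypothetical helper, not part of the onnxruntime API):

```python
import warnings
import onnxruntime as ort

def create_session(model_path, providers):
    """Hypothetical helper illustrating the requested behavior."""
    for provider in providers:
        try:
            sess = ort.InferenceSession(model_path, providers=[provider])
        except Exception as exc:
            # Requested: a Python warning instead of a low-level log line or a segfault.
            warnings.warn(f"{provider} failed to load ({exc}); trying the next provider")
            continue
        if provider in sess.get_providers():
            return sess
        # Requested: no silent CPU fallback when it was not asked for.
        warnings.warn(f"{provider} was silently replaced; trying the next provider")
    raise RuntimeError(f"None of the requested providers {providers} could be used")
```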

Additional context Error messages for the different providers:

2021-11-12 20:21:14.665899104 [E:onnxruntime:Default, provider_bridge_ort.cc:940 Get] Failed to load library libonnxruntime_providers_cuda.so with error: libcublas.so.11: cannot open shared object file: No such file or directory

2021-11-12 20:45:13.119746126 [E:onnxruntime:Default, provider_bridge_ort.cc:940 Get] Failed to load library libonnxruntime_providers_tensorrt.so with error: libcudart.so.11.0: cannot open shared object file: No such file or directory

2021-11-12 20:58:23.981654616 [E:onnxruntime:Default, provider_bridge_ort.cc:940 Get] Failed to load library libonnxruntime_providers_openvino.so with error: libonnx_importer.so: cannot open shared object file: No such file or directory

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Reactions: 2
  • Comments: 16 (14 by maintainers)

Most upvoted comments

Thanks. @chilo-ms has been investigating and identified the source of the segfault behavior. We will update it to be consistent with the CUDA EP behavior.

We tested the behavior for different configurations on a Raspberry Pi 4 with an additional Intel Movidius VPU stick. Depending on the situation, four different outcomes are possible (success, fallback, exception and segfault). The fallback could be deemed acceptable, since sess.get_providers() can be used to evaluate whether the desired provider was loaded successfully (see the sketch after the table). As already stated, the segfault behavior should be avoided. It is also peculiar that both exceptions and fallbacks can occur when a provider fails; this should probably be unified (either one or the other).

| Provider List | Scenario | Behavior | Note |
| --- | --- | --- | --- |
| CUDA, OpenVINO, CPU | VPU plugged in, ENV initialized | Fallback | CUDA lib warning |
| TensorRT, OpenVINO, CPU | VPU plugged in, ENV initialized | Segfault | TensorRT lib missing |
| OpenVINO, CPU | VPU not plugged in, ENV initialized | Python exception | NC_ERROR |
| OpenVINO, CPU | VPU plugged in, ENV not initialized | Segfault | OpenVINO lib missing |
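
A concrete sketch of the get_providers() check mentioned above (assuming a wheel built with CUDA support running on a machine without the CUDA libraries; "model.onnx" is a placeholder):

```python
import onnxruntime as ort

sess = ort.InferenceSession("model.onnx",
                            providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
# When the CUDA libraries cannot be loaded, onnxruntime logs an error and the
# session silently runs on CPU; get_providers() shows what was actually used.
if "CUDAExecutionProvider" not in sess.get_providers():
    print("Fell back to:", sess.get_providers())
```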