onnxruntime: Python: Inconsistent error behavior when creating sessions for different providers

Describe the bug This issue concerns Python wheels built with support for multiple providers. When creating an inference session, a list of providers can be passed. The behavior when a provider fails to load is not consistent:

  • If the provider can be loaded without issue, the session is created as expected.
  • If the CUDA provider fails, a low-level error message is printed and the CPU provider is used as a fallback (even though it is not in the provider list).
  • If the TensorRT provider fails, a low-level error message is printed and Python crashes with a segfault.
  • If the OpenVINO provider fails, a low-level error message is printed and Python crashes with a segfault.

Urgency Medium

System information

  • OS Platform and Distribution: Debian
  • ONNX Runtime installed from: Source
  • ONNX Runtime version: 1.9.1
  • Python version: 3.8
  • Visual Studio version (if applicable): -
  • GCC/Compiler version (if compiling from source): 9.3
  • CUDA/cuDNN version: 11.4.0/8.2.2.26
  • GPU model and memory: V100@32GB
  • OpenVINO: 2021.4.1
  • TensorRT: 8.0.1.6

To Reproduce

  • Build onnxruntime from tag v1.9.1 for Linux x64 with the CUDA, TensorRT and OpenVINO providers
  • Install onnxruntime in an environment where CUDA, TensorRT and OpenVINO are not available
  • Create a session in Python: sess = ort.InferenceSession("model.onnx", providers=[provider]) where provider is CUDAExecutionProvider, TensorrtExecutionProvider or OpenVINOExecutionProvider (see the sketch below)
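
For reference, a minimal reproduction sketch ("model.onnx" is only a placeholder; any valid ONNX model exhibits the behavior):

```python
# Minimal reproduction sketch; "model.onnx" is a placeholder for any valid model.
import onnxruntime as ort

for provider in ("CUDAExecutionProvider",
                 "TensorrtExecutionProvider",
                 "OpenVINOExecutionProvider"):
    print(f"--- creating session with {provider} ---")
    # On 1.9.1: CUDA logs an error and falls back to CPU, while
    # TensorRT and OpenVINO crash the interpreter with a segfault here.
    sess = ort.InferenceSession("model.onnx", providers=[provider])
    print("loaded providers:", sess.get_providers())
```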

Expected behavior If the first provider in the list fails, a Python warning should be printed and the next provider in the list should be used. If no provider in the list is usable, an exception should be raised. There should be no automatic fallback to CPU unless it is explicitly requested.
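
The requested semantics could be expressed roughly as follows (a user-side sketch only; create_session is a hypothetical helper, not part of the onnxruntime API):

```python
import warnings
import onnxruntime as ort

def create_session(model_path, providers):
    """Hypothetical helper illustrating the requested behavior."""
    for provider in providers:
        try:
            sess = ort.InferenceSession(model_path, providers=[provider])
        except Exception as exc:
            # Requested: a Python warning instead of a low-level log line or a segfault.
            warnings.warn(f"{provider} failed to load ({exc}); trying the next provider")
            continue
        if provider in sess.get_providers():
            return sess
        # Requested: no silent CPU fallback when it was not asked for.
        warnings.warn(f"{provider} was silently replaced; trying the next provider")
    raise RuntimeError(f"None of the requested providers {providers} could be used")
```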

Additional context Error messages for the different providers:

2021-11-12 20:21:14.665899104 [E:onnxruntime:Default, provider_bridge_ort.cc:940 Get] Failed to load library libonnxruntime_providers_cuda.so with error: libcublas.so.11: cannot open shared object file: No such file or directory

2021-11-12 20:45:13.119746126 [E:onnxruntime:Default, provider_bridge_ort.cc:940 Get] Failed to load library libonnxruntime_providers_tensorrt.so with error: libcudart.so.11.0: cannot open shared object file: No such file or directory

2021-11-12 20:58:23.981654616 [E:onnxruntime:Default, provider_bridge_ort.cc:940 Get] Failed to load library libonnxruntime_providers_openvino.so with error: libonnx_importer.so: cannot open shared object file: No such file or directory

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Reactions: 2
  • Comments: 16 (14 by maintainers)

Most upvoted comments

Thanks. @chilo-ms has been investigating and identified the source of the segfault behavior. We will update it to be consistent with the CUDA EP behavior.

We tested the behavior for different configurations on a Raspberry Pi 4 with an additional Intel Movidius VPU stick. Depending on the situation, four different outcomes are possible (success, fallback, exception and segfault). The fallback could be deemed acceptable, since sess.get_providers() can be used to evaluate whether the desired provider was loaded successfully (see the sketch after the table). As already stated, the segfault behavior should be avoided. It is also peculiar that both exceptions and fallbacks can occur when a provider fails; this should probably be unified (either one or the other).

| Provider List | Scenario | Behavior | Note |
| --- | --- | --- | --- |
| CUDA, OpenVINO, CPU | VPU plugged in, ENV initialized | Fallback | CUDA lib warning |
| TensorRT, OpenVINO, CPU | VPU plugged in, ENV initialized | Segfault | TensorRT lib missing |
| OpenVINO, CPU | VPU not plugged in, ENV initialized | Python exception | NC_ERROR |
| OpenVINO, CPU | VPU plugged in, ENV not initialized | Segfault | OpenVINO lib missing |
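
A concrete sketch of the get_providers() check mentioned above (assuming a wheel built with CUDA support running on a machine without the CUDA libraries; "model.onnx" is a placeholder):

```python
import onnxruntime as ort

sess = ort.InferenceSession("model.onnx",
                            providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
# When the CUDA libraries cannot be loaded, onnxruntime logs an error and the
# session silently runs on CPU; get_providers() shows what was actually used.
if "CUDAExecutionProvider" not in sess.get_providers():
    print("Fell back to:", sess.get_providers())
```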