onnxruntime: Python: Inconsistent error behavior when creating sessions for different providers
Describe the bug This issue concerns Python wheels built with support for multiple providers. When creating an inference session, a list of providers can be passed. The behavior when a provider fails to load is not consistent across providers:
- If the provider can be loaded without issue, the session is created as expected.
- If the CUDA provider fails, a low-level error message is printed and CPU is used as a fallback (although it is not in the provider list).
- If the TensorRT provider fails, a low-level error message is printed and Python crashes with a segfault.
- If the OpenVINO provider fails, a low-level error message is printed and Python crashes with a segfault.
Urgency Medium
System information
- OS Platform and Distribution: Debian
- ONNX Runtime installed from: Source
- ONNX Runtime version: 1.9.1
- Python version: 3.8
- Visual Studio version (if applicable): -
- GCC/Compiler version (if compiling from source): 9.3
- CUDA/cuDNN version: 11.4.0/8.2.2.26
- GPU model and memory: V100@32GB
- OpenVINO: 2021.4.1
- TensorRT: 8.0.1.6
To Reproduce
- Build onnxruntime from tag v1.9.1 for Linux x64 with the CUDA, TensorRT and OpenVINO providers
- Install onnxruntime in an environment where CUDA, TensorRT and OpenVINO are not available
- Create session in Python:
sess = ort.InferenceSession("model.onnx", providers=[provider])
where provider is one of CUDAExecutionProvider, TensorrtExecutionProvider or OpenVINOExecutionProvider
Expected behavior A Python warning should be printed if the first provider in the list fails, and the next provider in the list should then be tried. If no provider in the list is usable, an exception should be raised. There should be no automatic fallback to CPU unless CPU is explicitly requested.
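The expected behavior described above can be approximated in user code. The following is only a sketch of a possible workaround, not how onnxruntime behaves or is implemented. The helper name `create_session_strict` and its `session_factory` parameter are hypothetical and exist only for this illustration. It relies on `InferenceSession.get_providers()` to detect a silent fallback. It cannot protect against the segfault cases, since a crash of the interpreter cannot be caught from Python.

```python
import warnings


def create_session_strict(model_path, providers, session_factory=None):
    """Try each provider in order; never accept a provider that was not requested.

    `session_factory` is a hypothetical hook (defaults to ort.InferenceSession)
    so the logic can be exercised without onnxruntime installed.
    """
    if session_factory is None:
        import onnxruntime as ort  # imported lazily on purpose
        session_factory = ort.InferenceSession

    for provider in providers:
        try:
            sess = session_factory(model_path, providers=[provider])
        except Exception as exc:  # provider raised a Python-level error
            warnings.warn(f"{provider} failed: {exc}; trying next provider")
            continue

        active = sess.get_providers()
        if provider in active:
            return sess

        # The session was created, but on different providers (silent fallback).
        warnings.warn(
            f"{provider} was requested but the session uses {active}; "
            "trying next provider"
        )

    raise RuntimeError(f"None of the requested providers is usable: {list(providers)}")
```

With this helper, CPU is only used when `"CPUExecutionProvider"` appears in the list passed by the caller, for example `create_session_strict("model.onnx", ["CUDAExecutionProvider", "CPUExecutionProvider"])`.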
Additional context
Error messages for different providers:
2021-11-12 20:21:14.665899104 [E:onnxruntime:Default, provider_bridge_ort.cc:940 Get] Failed to load library libonnxruntime_providers_cuda.so with error: libcublas.so.11: cannot open shared object file: No such file or directory build session
2021-11-12 20:45:13.119746126 [E:onnxruntime:Default, provider_bridge_ort.cc:940 Get] Failed to load library libonnxruntime_providers_tensorrt.so with error: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-11-12 20:58:23.981654616 [E:onnxruntime:Default, provider_bridge_ort.cc:940 Get] Failed to load library libonnxruntime_providers_openvino.so with error: libonnx_importer.so: cannot open shared object file: No such file or directory
About this issue
- State: closed
- Created 3 years ago
- Reactions: 2
- Comments: 16 (14 by maintainers)
thanks. @chilo-ms has been investigating and identified the source of the segfault behavior. we will update it to be consistent with the CUDA EP behavior.
We tested the behavior for different configurations on a Raspberry Pi 4 with an additional Intel Movidius VPU stick. Depending on the situation, four different outcomes are possible (success, fallback, exception and segfault). The fallback could be deemed acceptable, since sess.get_providers() can be used to check whether the desired provider was loaded successfully. As already stated, the segfault behavior should be avoided. It is a bit peculiar that both exceptions and fallbacks can occur in case of a provider error. This should probably be unified (either one or the other).
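One way to observe all four outcomes without taking down the calling process is to create the session in a child interpreter and inspect its exit code. This is a minimal sketch under stated assumptions, not a tested recipe for any particular onnxruntime build: the names `PROBE_TEMPLATE`, `classify_exit` and `probe_provider` are made up for this example, exit code 2 is an arbitrary convention chosen here for "fallback", and the negative return code for a signal-terminated child applies to POSIX systems only.

```python
import subprocess
import sys

# Script run in the child process. Exit code 0 means the requested provider
# is active; 2 (our own convention) means the session fell back to others.
PROBE_TEMPLATE = """
import sys
import onnxruntime as ort
sess = ort.InferenceSession({model!r}, providers=[{provider!r}])
sys.exit(0 if {provider!r} in sess.get_providers() else 2)
"""


def classify_exit(returncode):
    """Map the child's return code to one of the four outcomes."""
    if returncode == 0:
        return "success"
    if returncode == 2:
        return "fallback"
    if returncode < 0:
        # On POSIX, subprocess reports death by signal N as -N (SIGSEGV is 11).
        return "crash"
    # An uncaught Python exception makes the interpreter exit with code 1.
    return "exception"


def probe_provider(model, provider, timeout=120):
    """Create a session in a child interpreter and classify what happened."""
    code = PROBE_TEMPLATE.format(model=model, provider=provider)
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        timeout=timeout,
    )
    return classify_exit(proc.returncode)
```

A call such as `probe_provider("model.onnx", "TensorrtExecutionProvider")` would then return one of the four strings, and the parent process survives even if the provider library crashes the child.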