onnxruntime: Can't load CUDA provider on Linux due to symbol lookup error
Describe the bug
I am trying to load the ONNX Runtime library with the CUDA provider in a C# application but get a symbol lookup error:
dotnet: symbol lookup error: /home/egortech/TestOnnx/net5.0/runtimes/linux-x64/native/libonnxruntime_providers_cuda.so: undefined symbol: Provider_GetHost
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 20.04.3 LTS
- ONNX Runtime installed from (source or binary): binary, but also trying a build from source
- ONNX Runtime version: 1.10
- Python version: Not Applicable
- Visual Studio version (if applicable): 2019
- GCC/Compiler version (if compiling from source): 9.3.0
- CUDA/cuDNN version: 11.4/8.2.0.6
- GPU model and memory: NVidia GeForce 2070 8GB
To Reproduce
I am trying to preload the ONNX Runtime libraries in C# with this code:
// Requires: using System.Reflection; using System.Runtime.InteropServices;
NativeLibrary.Load(
    "onnxruntime.so",
    Assembly.GetEntryAssembly(),
    DllImportSearchPath.UseDllDirectoryForDependencies);
NativeLibrary.Load(
    "onnxruntime_providers_shared.so",
    Assembly.GetEntryAssembly(),
    DllImportSearchPath.UseDllDirectoryForDependencies);
NativeLibrary.Load(
    "onnxruntime_providers_cuda.so",
    Assembly.GetEntryAssembly(),
    DllImportSearchPath.UseDllDirectoryForDependencies);
Additional context
I inspected the ELF binary and saw that the function Provider_GetHost is undefined and not tied to any library:

libonnxruntime_providers_cuda.so: file format elf64-x86-64
DYNAMIC SYMBOL TABLE:
...
0000000000000000 DF *UND* 0000000000000000 libcudnn.so.8 cudnnCreate
0000000000000000 DF *UND* 0000000000000000 GLIBCXX_3.4.21 _ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE6insertEmPKc
0000000000000000 DF *UND* 0000000000000000 GLIBCXX_3.4.15 _ZNSt8__detail15_List_node_base11_M_transferEPS0_S1_
0000000000000000 D *UND* 0000000000000000 Provider_GetHost
0000000000000000 DF *UND* 0000000000000000 libcudnn.so.8 cudnnFindConvolutionBackwardDataAlgorithmEx
0000000000000000 DF *UND* 0000000000000000 libcudart.so.11.0 cudaGetDeviceProperties
0000000000000000 DF *UND* 0000000000000000 libcudnn.so.8 cudnnSetActivationDescriptor
...
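The check above can be scripted. The following is a minimal Python sketch, not part of the original report, that scans a dynamic symbol listing in the format shown above and returns the undefined symbols that carry no version or library tag, which is how Provider_GetHost appears. The function name unversioned_undefined is made up for illustration, and the column layout is assumed to match the listing above. An untagged undefined symbol is not an error in itself; it only means the symbol must be supplied by something already loaded in the process.

```python
def unversioned_undefined(symbol_listing):
    """Return names of *UND* symbols that have no version/library column."""
    names = []
    for line in symbol_listing.splitlines():
        fields = line.split()
        if "*UND*" not in fields:
            continue
        # Fields after the section are: size, optional version tag, name.
        rest = fields[fields.index("*UND*") + 1:]
        if len(rest) == 2:
            names.append(rest[1])
    return names


# Lines copied from the listing above.
sample = """\
0000000000000000 DF *UND* 0000000000000000 libcudnn.so.8 cudnnCreate
0000000000000000 D *UND* 0000000000000000 Provider_GetHost
0000000000000000 DF *UND* 0000000000000000 libcudart.so.11.0 cudaGetDeviceProperties
"""
print(unversioned_undefined(sample))
```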
Patching the ELF to add a dependency on onnxruntime_providers_shared.so (where I think this function is defined) only gave me a segmentation fault:
1246229: symbol=Provider_GetHost; lookup in file=dotnet [0]
1246229: symbol=Provider_GetHost; lookup in file=/lib64/libpthread.so.0 [0]
1246229: symbol=Provider_GetHost; lookup in file=/lib64/libdl.so.2 [0]
1246229: symbol=Provider_GetHost; lookup in file=/lib64/libstdc++.so.6 [0]
1246229: symbol=Provider_GetHost; lookup in file=/lib64/libm.so.6 [0]
1246229: symbol=Provider_GetHost; lookup in file=/lib64/libgcc_s.so.1 [0]
1246229: symbol=Provider_GetHost; lookup in file=/lib64/libc.so.6 [0]
1246229: symbol=Provider_GetHost; lookup in file=/lib64/ld-linux-x86-64.so.2 [0]
1246229: symbol=Provider_GetHost; lookup in file=/lib64/liblttng-ust-tracepoint.so.0 [0]
1246229: symbol=Provider_GetHost; lookup in file=/lib64/liburcu-bp.so.6 [0]
1246229: symbol=Provider_GetHost; lookup in file=/lib64/liburcu-cds.so.6 [0]
1246229: symbol=Provider_GetHost; lookup in file=/lib64/liburcu-common.so.6 [0]
1246229: symbol=Provider_GetHost; lookup in file=/usr/lib64/dotnet/shared/Microsoft.NETCore.App/5.0.9/libcoreclrtraceptprovider.so [0]
1246229: symbol=Provider_GetHost; lookup in file=/lib64/liblttng-ust.so.0 [0]
1246229: symbol=Provider_GetHost; lookup in file=/lib64/librt.so.1 [0]
1246229: symbol=Provider_GetHost; lookup in file=/home/egortech/TestOnnx/net5.0/runtimes/linux-x64/native/libonnxruntime_providers_cuda.so [0]
1246229: symbol=Provider_GetHost; lookup in file=/home/egortech/TestOnnx/net5.0/runtimes/linux-x64/native/libonnxruntime_providers_shared.so [0]
1246229: binding file /home/egortech/TestOnnx/net5.0/runtimes/linux-x64/native/libonnxruntime_providers_cuda.so [0] to /home/egortech/TestOnnx/net5.0/runtimes/linux-x64/native/libonnxruntime_providers_shared.so [0]: normal symbol `Provider_GetHost'
Segmentation fault (core dumped)
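The trace above looks like glibc loader debug output (the kind produced with the LD_DEBUG environment variable; that is an assumption, since the report does not say how it was captured). A long trace is easier to read once the binding lines are extracted. Here is a small Python sketch that does this; the helper name find_bindings and the shortened /app/native paths in the sample are hypothetical.

```python
import re

# Matches lines such as:
#   binding file A [0] to B [0]: normal symbol `name'
BINDING = re.compile(
    r"binding file (?P<src>\S+) \[\d+\] to (?P<dst>\S+) \[\d+\]: "
    r"normal symbol `(?P<sym>[^']+)'"
)


def find_bindings(trace, symbol):
    """Return (requesting file, providing file) pairs for the given symbol."""
    pairs = []
    for line in trace.splitlines():
        match = BINDING.search(line)
        if match and match.group("sym") == symbol:
            pairs.append((match.group("src"), match.group("dst")))
    return pairs


# Hypothetical, shortened paths; the line format follows the trace above.
trace = (
    "1246229: symbol=Provider_GetHost; lookup in file=dotnet [0]\n"
    "1246229: binding file /app/native/libonnxruntime_providers_cuda.so [0] "
    "to /app/native/libonnxruntime_providers_shared.so [0]: "
    "normal symbol `Provider_GetHost'\n"
)
print(find_bindings(trace, "Provider_GetHost"))
```

In the trace from the report, the symbol does get bound to libonnxruntime_providers_shared.so after the patch, so the segmentation fault happens after symbol resolution rather than because of it.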
About this issue
- State: open
- Created 3 years ago
- Comments: 16 (5 by maintainers)
As the Stack Overflow link suggests, the point might actually be how RPATH propagation works for a compiled extension: RPATH propagation only works when RPATH is set on the executable. For Python extensions, the RPATH is only set in the extension itself (Python executables generally come without any RPATH or RUNPATH set, at least in all distributions I know). This is true for both the onnxruntime Python bindings and my own. The extension is a library, not an executable, so its RPATH will only be used to dlopen its direct dependencies. For the CUDA libraries to be loaded, one thus needs to set the RPATH of onnxruntime_providers_shared (I believe; I didn’t check the dependency tree).
I will try to verify whether this is actually true in practice and, if it works, prepare a PR; I just need to find the time.
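One way to check this kind of claim is to look at the dynamic section of each library and see which NEEDED, RPATH and RUNPATH entries it actually carries. The Python sketch below parses text in the format printed by readelf -d. The helper name dynamic_entries is made up, and the sample text is a hypothetical illustration of the format, not output captured from the real onnxruntime libraries.

```python
import re

# Matches entries such as:
#   0x...01 (NEEDED)    Shared library: [libfoo.so]
#   0x...1d (RUNPATH)   Library runpath: [$ORIGIN]
ENTRY = re.compile(r"\((NEEDED|RPATH|RUNPATH)\)\s+[^\[]*\[([^\]]*)\]")


def dynamic_entries(readelf_output):
    """Collect NEEDED, RPATH and RUNPATH values from `readelf -d` text."""
    entries = {"NEEDED": [], "RPATH": [], "RUNPATH": []}
    for line in readelf_output.splitlines():
        match = ENTRY.search(line)
        if match:
            entries[match.group(1)].append(match.group(2))
    return entries


# Hypothetical sample in readelf -d format (NOT real output from the library).
sample = """\
 0x0000000000000001 (NEEDED)             Shared library: [libcudart.so.11.0]
 0x0000000000000001 (NEEDED)             Shared library: [libcudnn.so.8]
 0x000000000000001d (RUNPATH)            Library runpath: [$ORIGIN]
"""
print(dynamic_entries(sample))
```

Running this over each library in the dependency chain would show whether a given library lists another as NEEDED and which search paths it contributes.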
Thanks @RyanUnderhill, that helped me understand and resolve the underlying issue. Maybe there is a way to highlight that as a solution.
@andrea-cimatoribus-pix4d, the problem is that I was loading the library libonnxruntime_providers_cuda.so and it complained about the unresolved symbol “Provider_GetHost”. Provider_GetHost is defined in libonnxruntime_providers_shared.so. However, libonnxruntime_providers_cuda.so doesn’t declare a dependency on libonnxruntime_providers_shared.so, which could be a bug if loading this library directly is an intended workflow. The intended workflow, as suggested, is to load libonnxruntime.so and append the CUDA provider options so that libonnxruntime.so loads everything required, including libonnxruntime_providers_cuda.so and libonnxruntime_providers_shared.so.

@parajav See my reply above: https://github.com/microsoft/onnxruntime/issues/9309#issuecomment-940606615
You shouldn’t be loading libonnxruntime_providers_cuda.so. Onnxruntime loads this as described above.

Hmm, thank you for debugging further. The process to load it on Linux goes like this:
onnxruntime.so dynamically loads (dlopen) onnxruntime_providers_shared.so with RTLD_GLOBAL. Then onnxruntime.so dynamically loads onnxruntime_providers_cuda.so (with RTLD_LOCAL). (On Windows there’s no global/local distinction; this is Linux-specific.)
The RTLD_GLOBAL should make Provider_GetHost, which is exported by onnxruntime_providers_shared.so, visible to onnxruntime_providers_cuda.so. Can you tell if it’s getting preloaded somehow with the wrong setting?
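To make the role of the two flags concrete, here is a rough Python sketch of the same load order using ctypes, which wraps dlopen on Linux. It only illustrates the sequence described above and is not how an application should load the provider, since onnxruntime does this itself. The names LOAD_PLAN and load_providers are invented for the example, and the file names are the ones used in this thread.

```python
import ctypes

# Load order and dlopen flags as described above. RTLD_GLOBAL adds the
# library's symbols to the global scope, so libraries loaded later can
# resolve against them; RTLD_LOCAL keeps them private to that handle.
LOAD_PLAN = [
    ("libonnxruntime_providers_shared.so", ctypes.RTLD_GLOBAL),
    ("libonnxruntime_providers_cuda.so", ctypes.RTLD_LOCAL),
]


def load_providers(plan=LOAD_PLAN, loader=ctypes.CDLL):
    """dlopen each library in order with its flag and return the handles."""
    return [loader(name, mode=mode) for name, mode in plan]


# Usage (only meaningful where the libraries are on the loader search path):
# handles = load_providers()
```

Note that ctypes.CDLL defaults to RTLD_LOCAL on Linux, so a preload that does not ask for RTLD_GLOBAL explicitly leaves the shared library's symbols out of the global scope. Whether the .NET NativeLibrary.Load call in the report behaves the same way is not established in this thread.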