onnxruntime: onnxruntime.InferenceSession hangs
Describe the issue
Hi! I have exported an ONNX model from PyTorch and am trying to use it for inference, but onnxruntime.InferenceSession hangs without any error. I have tried this on Linux and macOS. When running under strace, nothing further is shown after the last message.
The model can be found here: model
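One way to tell a true hang from a very slow load is to run the load in a child process with a timeout. The following is only a minimal sketch: `run_with_timeout` is a hypothetical helper name, the timeout value is arbitrary, and the commented snippet assumes the model file is named `model_1.onnx` as in the reproduction script.

```python
import subprocess
import sys


def run_with_timeout(code, timeout_s):
    """Run `code` in a fresh Python process.

    Returns (finished, output). `finished` is False if the child did not
    exit within `timeout_s` seconds, in which case it is killed.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return False, ""
    return True, proc.stdout + proc.stderr


# Example (assumes onnxruntime is installed and model_1.onnx exists):
# LOAD = (
#     "import onnxruntime as ort; "
#     "ort.InferenceSession('model_1.onnx', providers=['CPUExecutionProvider'])"
# )
# finished, output = run_with_timeout(LOAD, timeout_s=600)
```

If `finished` comes back `False` even with a generous timeout, the load is more likely stuck than merely slow.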
To reproduce
import onnxruntime as ort
ort.set_default_logger_severity(0)
so = ort.SessionOptions()
print(so.inter_op_num_threads)
print(so.intra_op_num_threads)
print("starting to load")
ort_session = ort.InferenceSession(
"model_1.onnx",
providers=["CPUExecutionProvider"],
)
print("finished loading")
The log gives this:
0
0
starting to load
2024-04-17 14:39:53.167254537 [I:onnxruntime:, inference_session.cc:330 operator()] Flush-to-zero and denormal-as-zero are off
2024-04-17 14:39:53.244762912 [I:onnxruntime:, inference_session.cc:338 ConstructorCommon] Creating and using per session threadpools since use_per_session_threads_ is true
2024-04-17 14:39:53.244791816 [I:onnxruntime:, inference_session.cc:356 ConstructorCommon] Dynamic block base set to 0
2024-04-17 14:39:53.245374968 [V:onnxruntime:Default, env.cc:248 ThreadMain] pthread_setaffinity_np succeed for thread: 1325925, index: 0, mask: {1, }
2024-04-17 14:39:53.245475139 [V:onnxruntime:Default, env.cc:248 ThreadMain] pthread_setaffinity_np succeed for thread: 1325927, index: 2, mask: {3, }
2024-04-17 14:39:53.245640304 [V:onnxruntime:Default, env.cc:248 ThreadMain] pthread_setaffinity_np succeed for thread: 1325930, index: 5, mask: {6, }
2024-04-17 14:39:53.257577164 [V:onnxruntime:Default, env.cc:248 ThreadMain] pthread_setaffinity_np succeed for thread: 1325926, index: 1, mask: {2, }
2024-04-17 14:39:53.257664768 [V:onnxruntime:Default, env.cc:248 ThreadMain] pthread_setaffinity_np succeed for thread: 1325928, index: 3, mask: {4, }
2024-04-17 14:39:53.257809630 [V:onnxruntime:Default, env.cc:248 ThreadMain] pthread_setaffinity_np succeed for thread: 1325929, index: 4, mask: {5, }
2024-04-17 14:39:53.260755085 [V:onnxruntime:Default, env.cc:248 ThreadMain] pthread_setaffinity_np succeed for thread: 1325932, index: 7, mask: {8, }
2024-04-17 14:39:53.260733432 [V:onnxruntime:Default, env.cc:248 ThreadMain] pthread_setaffinity_np succeed for thread: 1325931, index: 6, mask: {7, }
2024-04-17 14:39:53.264674323 [V:onnxruntime:Default, env.cc:248 ThreadMain] pthread_setaffinity_np succeed for thread: 1325933, index: 8, mask: {9, }
2024-04-17 14:40:02.464322875 [I:onnxruntime:, inference_session.cc:1402 Initialize] Initializing session.
2024-04-17 14:40:02.464374810 [I:onnxruntime:Default, bfc_arena.cc:29 BFCArena] Creating BFCArena for Cpu with following configs: initial_chunk_size_bytes: 1048576 max_dead_bytes_per_chunk: 134217728 initial_growth_chunk_size_bytes: 2097152 max_power_of_two_extend_bytes: 1073741824 memory limit: 18446744073709551615 arena_extend_strategy: 0
2024-04-17 14:40:02.495584676 [V:onnxruntime:Default, bfc_arena.cc:66 BFCArena] Creating 21 bins of max chunk size 256 to 268435456
2024-04-17 14:40:08.501566408 [I:onnxruntime:, constant_sharing.cc:256 ApplyImpl] Total shared scalar initializer count: 7799
2024-04-17 14:40:23.650234892 [I:onnxruntime:, graph.cc:3556 CleanUnusedInitializersAndNodeArgs] Removing initializer '_val_149'. It is no longer used by any node.
2024-04-17 14:40:23.650378373 [I:onnxruntime:, graph.cc:3556 CleanUnusedInitializersAndNodeArgs] Removing initializer '_val_142'. It is no longer used by any node.
2024-04-17 14:40:23.650385274 [I:onnxruntime:, graph.cc:3556 CleanUnusedInitializersAndNodeArgs] Removing initializer '_val_139'. It is no longer used by any node.
2024-04-17 14:40:23.650391115 [I:onnxruntime:, graph.cc:3556 CleanUnusedInitializersAndNodeArgs] Removing initializer '_val_137'. It is no longer used by any node.
2024-04-17 14:40:23.650403024 [I:onnxruntime:, graph.cc:3556 CleanUnusedInitializersAndNodeArgs] Removing initializer '_val_144'. It is no longer used by any node.
2024-04-17 14:40:23.650410618 [I:onnxruntime:, graph.cc:3556 CleanUnusedInitializersAndNodeArgs] Removing initializer '_val_146'. It is no longer used by any node.
2024-04-17 14:40:23.650428728 [I:onnxruntime:, graph.cc:3556 CleanUnusedInitializersAndNodeArgs] Removing initializer 'ortshared_7_0_1_7'. It is no longer used by any node.
2024-04-17 14:40:23.650434231 [I:onnxruntime:, graph.cc:3556 CleanUnusedInitializersAndNodeArgs] Removing initializer 'ortshared_7_0_1_6'. It is no longer used by any node.
2024-04-17 14:40:23.650441269 [I:onnxruntime:, graph.cc:3556 CleanUnusedInitializersAndNodeArgs] Removing initializer 'ortshared_7_0_1_4'. It is no longer used by any node.
2024-04-17 14:40:23.650446846 [I:onnxruntime:, graph.cc:3556 CleanUnusedInitializersAndNodeArgs] Removing initializer 'ortshared_7_0_1_3'. It is no longer used by any node.
2024-04-17 14:40:23.650456481 [I:onnxruntime:, graph.cc:3556 CleanUnusedInitializersAndNodeArgs] Removing initializer 'ortshared_7_1_1_1'. It is no longer used by any node.
2024-04-17 14:40:23.650463897 [I:onnxruntime:, graph.cc:3556 CleanUnusedInitializersAndNodeArgs] Removing initializer 'ortshared_7_0_1_1'. It is no longer used by any node.
2024-04-17 14:40:23.650473228 [I:onnxruntime:, graph.cc:3556 CleanUnusedInitializersAndNodeArgs] Removing initializer 'ortshared_7_0_1_0'. It is no longer used by any node.
2024-04-17 14:40:23.652591674 [I:onnxruntime:, graph.cc:3556 CleanUnusedInitializersAndNodeArgs] Removing initializer '_val_148'. It is no longer used by any node.
2024-04-17 14:40:23.654060774 [I:onnxruntime:, graph.cc:3556 CleanUnusedInitializersAndNodeArgs] Removing initializer 'ortshared_7_0_1_5'. It is no longer used by any node.
2024-04-17 14:40:23.654923282 [I:onnxruntime:, graph.cc:3556 CleanUnusedInitializersAndNodeArgs] Removing initializer 'ortshared_7_0_1_2'. It is no longer used by any node.
2024-04-17 14:40:27.459510017 [I:onnxruntime:, constant_sharing.cc:256 ApplyImpl] Total shared scalar initializer count: 2
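The timestamps in the log above already hint at where the time goes: roughly 9 s pass before "Initializing session", and roughly 15 s pass between the first constant sharing message and the first "Removing initializer" line. A small script can find such gaps automatically. This is a sketch only: `parse_stamp` and `largest_gaps` are hypothetical helper names, and it assumes every log line of interest starts with a `YYYY-MM-DD HH:MM:SS.fraction` timestamp as it does here.

```python
import re
from datetime import datetime

# Leading "YYYY-MM-DD HH:MM:SS.fffffffff" stamp of a log line.
STAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.(\d+)")
EPOCH = datetime(1970, 1, 1)


def parse_stamp(line):
    """Return the line's timestamp in seconds as a float, or None if absent."""
    m = STAMP.match(line)
    if m is None:
        return None
    base = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
    return (base - EPOCH).total_seconds() + float("0." + m.group(2))


def largest_gaps(lines, top=3):
    """Return the `top` largest gaps as (seconds, line_before, line_after)."""
    stamped = [(parse_stamp(line), line) for line in lines]
    stamped = [(t, line) for t, line in stamped if t is not None]
    gaps = [(b[0] - a[0], a[1], b[1]) for a, b in zip(stamped, stamped[1:])]
    return sorted(gaps, key=lambda g: g[0], reverse=True)[:top]
```

Lines without a timestamp (such as "starting to load") are skipped, so the captured output can be passed in unchanged.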
Urgency
This is a key step to use our model in production
Platform
Linux
OS Version
Linux 9.3
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.16.3
ONNX Runtime API
Python
Architecture
X86_64
Execution Provider
Default CPU
Execution Provider Library Version
No response
About this issue
- Original URL
- State: closed
- Created 2 months ago
- Comments: 17 (9 by maintainers)
The graph had many constants that were created by the model inside functions; I now initialize those with the model instead. There were also some conversion errors, for example:
x[..., index_list] is not converted well and has to be rewritten to use torch.index_select. However, operations like einsum do not seem to be dynamic with respect to the input shape (this is for a GNN-like architecture), so that is problematic.
Hi @justinchuby, I am wondering if this could be related to the introduction of new transformations 124160. Do you think that could be the case? (Sorry to bother you again.)
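For reference, the indexing rewrite mentioned in the comments can be sketched as follows. This is only an illustration, not the actual model code: `select_last_dim` is a hypothetical helper name, and it assumes `index_list` is a flat list of integer positions along the last dimension.

```python
import torch


def select_last_dim(x, index_list):
    """Equivalent of x[..., index_list], written with torch.index_select."""
    # index_select requires a 1-D integer tensor on the same device as x.
    index = torch.as_tensor(index_list, dtype=torch.long, device=x.device)
    return torch.index_select(x, dim=-1, index=index)
```

Building the index tensor once (for example in the module's `__init__`) instead of on every forward pass would also fit the change described above of initializing constants with the model.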