onnx-mlir: Segmentation fault when a model is compiled with -O0 or -O1
I got a segmentation fault with many models when they were compiled with -O0 or -O1, e.g. with resnet50:
ONNX_MLIR_HOME=/home/tungld/dl/onnx-mlir/build/Debug python ../utils/RunONNXModel.py resnet50.onnx --compile_args="-mcpu=z14 -O0"
Temporary directory has been created at /tmp/tmpjf_k8lqx
Generating random inputs ...
- 1st input's shape (1, 3, 224, 224)
done.
Compiling the model ...
Shared library /tmp/tmpjf_k8lqx/model.so has been compiled.
took 57.782129378058016 seconds.
Running inference ...
Segmentation fault (core dumped)
There was no issue when using -O2 or -O3.
About this issue
- State: closed
- Created 3 years ago
- Comments: 24
Verified that resnet50-v1-7.onnx can now compile and run successfully at -O0, -O1, -O2, and -O3 on LoP. @tungld Please reopen if there are other models that still need to be looked at.
@tungld Thanks for the suggestion! I found a way to avoid the problem by changing how the alloca instructions are created in the first place.
I have narrowed down the runtime problem at -O0 to an alloca and its users inside a multi-level nested loop, which can cause a segfault when the program runs out of stack space. At -O1 and above, InstCombine in opt is able to remove such allocas and all their users, as they are actually dead code.
(When I move the alloca to the beginning of the function, the model can run successfully.)
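The stack-growth pattern described above can be illustrated with a minimal C sketch. This is not the code onnx-mlir generates. It assumes a platform that provides the non-standard `alloca` through `<alloca.h>` (e.g. glibc), and the function names `sum_alloca_in_loop` and `sum_alloca_hoisted` are hypothetical. Memory obtained with `alloca` is released only when the enclosing function returns, so calling it inside a loop nest makes stack usage grow with the total number of iterations. Allocating once at the start of the function keeps stack usage constant.

```c
#include <alloca.h> /* non-standard header; assumed available (e.g. glibc) */
#include <stddef.h>

/* Problematic pattern (hypothetical helper): a fresh stack buffer on every
   inner iteration. alloca memory is freed only when this function returns,
   so stack usage grows roughly with outer * inner * n * sizeof(int) bytes.
   Requires n >= 1. */
static long sum_alloca_in_loop(int outer, int inner, size_t n) {
    long total = 0;
    for (int i = 0; i < outer; ++i) {
        for (int j = 0; j < inner; ++j) {
            int *buf = alloca(n * sizeof *buf);
            for (size_t k = 0; k < n; ++k) {
                buf[k] = i + j;
            }
            total += buf[n - 1];
        }
    }
    return total;
}

/* Hoisted pattern (hypothetical helper): allocate once at function entry and
   reuse the buffer, so stack usage stays at n * sizeof(int) bytes no matter
   how many iterations run. Requires n >= 1. */
static long sum_alloca_hoisted(int outer, int inner, size_t n) {
    int *buf = alloca(n * sizeof *buf);
    long total = 0;
    for (int i = 0; i < outer; ++i) {
        for (int j = 0; j < inner; ++j) {
            for (size_t k = 0; k < n; ++k) {
                buf[k] = i + j;
            }
            total += buf[n - 1];
        }
    }
    return total;
}
```

For example, `sum_alloca_in_loop(2, 3, 4)` and `sum_alloca_hoisted(2, 3, 4)` both return 9. With large loop bounds, however, the stack usage of the first function grows with `outer * inner`, while the second allocates its buffer only once.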