onnxruntime: Segmentation fault when running onnxruntime inside docker with cpuset restrictions

Describe the bug

ONNX Runtime crashes with a segmentation fault when I run it inside Docker with CPU limitations specified by "--cpuset-cpus". The crash doesn't happen when running Docker without the "--cpuset-cpus" arg, or when "--cpuset-cpus" grants a large number of CPU cores.

Urgency: none specified.

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 20.04
  • ONNX Runtime installed from (source or binary): pip
  • ONNX Runtime version: 1.7.0
  • Python version: 3.8
  • CUDA/cuDNN version: 11.0.3 / cuDNN 8 (from the nvidia/cuda:11.0.3-cudnn8-devel-ubuntu20.04 base image)
  • GPU model and memory: 4x RTX 2080 Ti

To Reproduce


Hardware: 32-core AMD CPU (64 threads), 4x RTX 2080 Ti GPUs.

The crash doesn't happen when I provision many cores, such as "--cpuset-cpus 0-31".

docker run --rm -it --gpus all --cpuset-cpus 0-15 nvidia/cuda:11.0.3-cudnn8-devel-ubuntu20.04

Then, inside the Docker container:

apt update
apt install python3-pip wget
pip3 install onnxruntime
wget https://github.com/onnx/models/blob/master/vision/classification/mnist/model/mnist-7.onnx?raw=true -O mnist.onnx
python3

Then, inside python3:

import onnxruntime as ort
ort.InferenceSession('mnist.onnx') # crash!
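
A quick way to see why the cpuset size might matter, from inside the same container: on Linux, os.cpu_count() reports the machine's full core count, while os.sched_getaffinity(0) reports only the CPUs granted by --cpuset-cpus, so a thread pool sized from the former can exceed what the cpuset allows. This is a diagnostic sketch of that mismatch, not a confirmed root cause:

import os
# Full core count of the machine (does not honor cpuset restrictions)
print('os.cpu_count():', os.cpu_count())
# CPUs this process is actually allowed to run on under --cpuset-cpus
print('allowed CPUs:', len(os.sched_getaffinity(0)))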

Expected behavior

ort.InferenceSession('mnist.onnx') loads the model and returns a usable session instead of crashing.

About this issue

  • State: open
  • Created 3 years ago
  • Comments: 15 (3 by maintainers)

Most upvoted comments

I've seen this problem as well. A workaround that worked for me was to set intra_op_num_threads to match the number of cores actually available to the process:

import onnxruntime as ort
sess_options = ort.SessionOptions()
# Cap the intra-op thread pool at the number of cores the container can actually use
sess_options.intra_op_num_threads = 8
sess = ort.InferenceSession('some_model.onnx', sess_options=sess_options)
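
If hard-coding 8 is undesirable, the thread count can be derived from the cpuset itself. A minimal sketch using only the standard library (os.sched_getaffinity is Linux-only):

import os
import onnxruntime as ort

sess_options = ort.SessionOptions()
# Size the intra-op pool from the CPUs this process may actually run on
sess_options.intra_op_num_threads = len(os.sched_getaffinity(0))
sess = ort.InferenceSession('some_model.onnx', sess_options=sess_options)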

There's no stack trace, just a one-line message saying the core was dumped.
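
For anyone hitting the same silent crash: the standard faulthandler module, enabled before creating the session, will at least dump the Python-side traceback on SIGSEGV (it cannot show native ONNX Runtime frames). A small sketch:

import faulthandler
faulthandler.enable()  # dump the Python traceback if the process receives SIGSEGV

import onnxruntime as ort
sess = ort.InferenceSession('mnist.onnx')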