onnxruntime: System memory leak on cuda GPU backend.
Describe the bug: System memory keeps increasing while using the CUDA GPU backend.
Urgency: very urgent
System information
- OS Platform and Distribution : Linux Ubuntu 16.04
- ONNX Runtime installed from (source or binary): pip install onnxruntime-gpu==1.8
- ONNX Runtime version: 1.8
- Python version: 3.7.10
- Visual Studio version (if applicable): No
- GCC/Compiler version (if compiling from source): -
- CUDA/cuDNN version: 11.1
- GPU model and memory: A30, 24GB
To Reproduce
Please download the detection model from https://1drv.ms/u/s!AswpsDO2toNKsTYUYsyy9kdSZSfe?e=KPHWCL (OneDrive link), then use the following code to reproduce:
import numpy as np
import onnxruntime
import cv2
model_file = 'scrfd_10g_bnkps.onnx'
session = onnxruntime.InferenceSession(model_file, None)
input_cfg = session.get_inputs()[0]
input_shape = input_cfg.shape
input_name = input_cfg.name
outputs = session.get_outputs()
output_names = []
for o in outputs:
    output_names.append(o.name)
img = np.random.randint(0, 255, size=(640,640,3), dtype=np.uint8)
input_std = 128.0
input_mean = 127.5
blob = cv2.dnn.blobFromImage(img, 1.0/input_std, (640, 640), (input_mean, input_mean, input_mean), swapRB=True)
for _ in range(1000000):
    net_outs = session.run(output_names, {input_name: blob})
    pred = net_outs[0]
The leak happens at pred = net_outs[0]; if we omit this line, there is no memory leak.
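A minimal, standard-library way to verify this behavior is to record the process's peak resident set size before and after the inference loop; the helper below is a sketch of my own (the name rss_kb and the measurement approach are not part of the original report):

```python
# Hypothetical helper to confirm the leak: record the process's peak
# resident set size (RSS) before and after the inference loop.
# Standard library only; on Linux, ru_maxrss is reported in kilobytes.
import resource

def rss_kb() -> int:
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

before = rss_kb()
# ... run the session.run(...) loop from the snippet above here ...
after = rss_kb()
print(f"Peak RSS grew by {after - before} KiB")
```

Running this once with pred = net_outs[0] present and once with it removed should show steadily growing RSS only in the first case.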
Also:
- If we use the CPU backend via session.set_providers(['CPUExecutionProvider']), there is no memory leak.
- If we use CUDA 10.2 and onnxruntime-gpu==1.6, there is no memory leak.
Expected behavior: System memory usage remains stable.
About this issue
- Original URL
- State: open
- Created 3 years ago
- Comments: 15 (5 by maintainers)
I can confirm this too. I'm using a detection model with a ResNet backbone, on Windows Server 2016. Here are the CUDA versions I tested:
- ONNX Runtime 1.9.0 CUDA EP, CUDA 11.1.1 + cuDNN 8.0.5.39 == Memory Leak
- ONNX Runtime 1.9.0 CUDA EP, CUDA 11.4.3 + cuDNN 8.2.4.15 == No Memory Leak
FYI - I ran into this same memory issue when running CUDA inference for a set of deep learning models (RetinaFace + ArcFace + Age Estimation + Custom Classifier + YoloV4) with ORT 1.8.1 (C# API). The memory leak takes a number of runs before it really starts to rear its head. I originally went down the rabbit hole of tracing memory leaks from items not being disposed, but it turned out to be the ORT + CUDA version combination (the issue does not occur with the CPU EP). After upgrading CUDA to 11.4.1 and the latest cuDNN, life appears to be good now.
TL;DR:
- ONNX Runtime 1.8.1 CPU EP == No Memory Leak
- ONNX Runtime 1.8.1 CUDA EP, CUDA 11.1 + cuDNN 8.0.4.30 == Memory Leak
- ONNX Runtime 1.8.1 CUDA EP, CUDA 11.4.1 + cuDNN 8.2.2.26 == No Memory Leak
I have not tested extensively on other CUDA or ORT builds. Hope this helps and saves someone time in the future!
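Taken together, the combinations above suggest CUDA 11.4 as the threshold below which the leak appears. A small sketch for checking a dotted CUDA version string against that threshold (the helper names cuda_at_least and version_tuple are my own, not from ONNX Runtime):

```python
# Hypothetical helper: compare dotted version strings numerically, to check
# whether an installed CUDA version is at or above 11.4 (the version the
# comments above report as leak-free). Standard library only.
def version_tuple(v: str) -> tuple:
    return tuple(int(part) for part in v.split('.'))

def cuda_at_least(installed: str, minimum: str = '11.4') -> bool:
    return version_tuple(installed) >= version_tuple(minimum)

print(cuda_at_least('11.1'))    # False: leaking combination
print(cuda_at_least('11.4.1'))  # True: reported leak-free
```

Note the tuple comparison avoids the pitfall of comparing versions as plain strings, where '11.10' would sort before '11.4'.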
@yuslepukhin Yes. I don't believe that is the key problem; version 1.6 works well with this piece of code.