onnxruntime: System memory leak on cuda GPU backend.
Describe the bug: System memory keeps increasing while using the CUDA GPU backend.
Urgency: very urgent
System information
- OS Platform and Distribution : Linux Ubuntu 16.04
- ONNX Runtime installed from (source or binary): pip install onnxruntime-gpu==1.8
- ONNX Runtime version: 1.8
- Python version: 3.7.10
- Visual Studio version (if applicable): No
- GCC/Compiler version (if compiling from source): -
- CUDA/cuDNN version: 11.1
- GPU model and memory: A30, 24GB
To Reproduce
Please download the detection model from https://1drv.ms/u/s!AswpsDO2toNKsTYUYsyy9kdSZSfe?e=KPHWCL (OneDrive link), then use the following code to reproduce:
import numpy as np
import onnxruntime
import cv2
model_file = 'scrfd_10g_bnkps.onnx'
session = onnxruntime.InferenceSession(model_file, None)
input_cfg = session.get_inputs()[0]
input_shape = input_cfg.shape
input_name = input_cfg.name
outputs = session.get_outputs()
output_names = []
for o in outputs:
    output_names.append(o.name)
img = np.random.randint(0, 255, size=(640,640,3), dtype=np.uint8)
input_std = 128.0
input_mean = 127.5
blob = cv2.dnn.blobFromImage(img, 1.0/input_std, (640, 640), (input_mean, input_mean, input_mean), swapRB=True)
for _ in range(1000000):
    net_outs = session.run(output_names, {input_name: blob})
    pred = net_outs[0]
The leak happens at pred = net_outs[0]; if we omit this line, there is no memory leak.
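A minimal, standard-library way to verify this behavior is to record the process's peak resident set size before and after the inference loop; the helper below is a sketch of my own (the name rss_kb and the measurement approach are not part of the original report):

```python
# Hypothetical helper to confirm the leak: record the process's peak
# resident set size (RSS) before and after the inference loop.
# Standard library only; on Linux, ru_maxrss is reported in kilobytes.
import resource

def rss_kb() -> int:
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

before = rss_kb()
# ... run the session.run(...) loop from the snippet above here ...
after = rss_kb()
print(f"Peak RSS grew by {after - before} KiB")
```

Running this once with pred = net_outs[0] present and once with it removed should show steadily growing RSS only in the first case.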
Also:
- If we use the CPU backend via session.set_providers(['CPUExecutionProvider']), there is no memory leak.
- If we use CUDA 10.2 and onnxruntime-gpu==1.6, there is no memory leak.
Expected behavior: System memory usage remains stable.
About this issue
- Original URL
- State: open
- Created 3 years ago
- Comments: 15 (5 by maintainers)
I can confirm this too. I'm using a detection model with a ResNet backbone, on Windows Server 2016. Here are the CUDA versions I tested:
- ONNX Runtime 1.9.0 CUDA EP, CUDA 11.1.1 + cuDNN 8.0.5.39 == Memory Leak
- ONNX Runtime 1.9.0 CUDA EP, CUDA 11.4.3 + cuDNN 8.2.4.15 == No Memory Leak
FYI - I ran into this same memory issue when running CUDA inference for a set of deep learning models (RetinaFace + ArcFace + Age Estimation + Custom Classifier + YoloV4) with ORT 1.8.1 (C# API). The memory leak takes a number of runs before it really starts to rear its head. I originally went down the rabbit hole of tracing memory leaks from items not being disposed, but it turned out to be the ORT + CUDA version combination (the issue does not occur with the CPU EP). After upgrading CUDA to 11.4.1 and the latest cuDNN, life appears to be good now.
TL;DR:
- ONNX Runtime 1.8.1 CPU EP == No Memory Leak
- ONNX Runtime 1.8.1 CUDA EP, CUDA 11.1 + cuDNN 8.0.4.30 == Memory Leak
- ONNX Runtime 1.8.1 CUDA EP, CUDA 11.4.1 + cuDNN 8.2.2.26 == No Memory Leak
I have not tested extensively on other CUDA or ORT builds. Hope this helps and saves someone time in the future!
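Taken together, the combinations above suggest CUDA 11.4 as the threshold below which the leak appears. A small sketch for checking a dotted CUDA version string against that threshold (the helper names cuda_at_least and version_tuple are my own, not from ONNX Runtime):

```python
# Hypothetical helper: compare dotted version strings numerically, to check
# whether an installed CUDA version is at or above 11.4 (the version the
# comments above report as leak-free). Standard library only.
def version_tuple(v: str) -> tuple:
    return tuple(int(part) for part in v.split('.'))

def cuda_at_least(installed: str, minimum: str = '11.4') -> bool:
    return version_tuple(installed) >= version_tuple(minimum)

print(cuda_at_least('11.1'))    # False: leaking combination
print(cuda_at_least('11.4.1'))  # True: reported leak-free
```

Note the tuple comparison avoids the pitfall of comparing versions as plain strings, where '11.10' would sort before '11.4'.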
@yuslepukhin Yes. I don't believe that is the key problem; version 1.6 works well with this piece of code.