TensorRT: Cuda Error in allocate: 2 (out of memory) - GPU Memory Leak?
Description
I am running a simple benchmarking script with a function that takes a PyTorch model, converts it to TensorRT (via ONNX), then runs inference on it multiple times and measures the inference time.
The function is called from main inside a loop over different models.
What I've noticed is that after each model finishes running, its GPU memory isn't fully released, and eventually I start getting the following errors:
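A stripped-down sketch of the script's structure (the model names and input shape here are placeholders, not my actual models):

```python
import torchvision


def benchmark_model(name, model, input_shape, n_runs=100):
    """Export `model` to ONNX, build a TensorRT engine, then time inference."""
    # (ONNX export + TensorRT build + timed inference happen here; see the
    #  build and cleanup sketches further down.)
    ...


def main():
    # Illustrative models only -- the real script loops over my own models.
    models = {
        "resnet18": torchvision.models.resnet18(pretrained=True).eval(),
        "resnet50": torchvision.models.resnet50(pretrained=True).eval(),
    }
    for name, model in models.items():
        benchmark_model(name, model, input_shape=(1, 3, 224, 224))


if __name__ == "__main__":
    main()
```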
[TensorRT] WARNING: /usr/src/tensorrt/onnx-tensorrt/onnx2trt_utils.cpp:232: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[TensorRT] WARNING: /usr/src/tensorrt/onnx-tensorrt/onnx2trt_utils.cpp:232: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[TensorRT] ERROR: ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
[TensorRT] WARNING: GPU memory allocation error during getBestTactic: Conv_3 + Relu_5
[TensorRT] ERROR: ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
[TensorRT] WARNING: GPU memory allocation error during getBestTactic: Conv_3 + Relu_5
[TensorRT] ERROR: ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
[TensorRT] WARNING: GPU memory allocation error during getBestTactic: Conv_3 + Relu_5
[TensorRT] ERROR: ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
[TensorRT] WARNING: GPU memory allocation error during getBestTactic: Conv_3 + Relu_5
[TensorRT] ERROR: ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
[TensorRT] WARNING: GPU memory allocation error during getBestTactic: Conv_3 + Relu_5
[TensorRT] ERROR: ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
[TensorRT] WARNING: GPU memory allocation error during getBestTactic: Conv_3 + Relu_5
[TensorRT] ERROR: Try increasing the workspace size with IBuilderConfig::setMaxWorkspaceSize() if using IBuilder::buildEngineWithConfig, or IBuilder::setMaxWorkspaceSize() if using IBuilder::buildCudaEngine.
[TensorRT] ERROR: ../builder/tacticOptimizer.cpp (1715) - TRTInternal Error in computeCosts: 0 (Could not find any implementation for node Conv_3 + Relu_5.)
[TensorRT] ERROR: ../builder/tacticOptimizer.cpp (1715) - TRTInternal Error in computeCosts: 0 (Could not find any implementation for node Conv_3 + Relu_5.)
Exception: 'NoneType' object has no attribute 'serialize' - Couldnt convert model to TRT
The conversion and inference are run using code based on @rmccorm4's GitHub repo, with dynamic batching and max_workspace_size = 2 << 30.
When I convert only a single model, there is never a problem, which leads me to believe that GPU memory isn't being freed at the end of each conversion.
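For context, the build step looks roughly like this (a minimal sketch assuming a 3x224x224 input and illustrative min/opt/max batch sizes, not the exact code from the repo). When the build fails, build_engine() returns None, which is what produces the "'NoneType' object has no attribute 'serialize'" exception above:

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
EXPLICIT_BATCH = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)


def build_engine(onnx_path, max_batch_size=32):
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(EXPLICIT_BATCH)
    parser = trt.OnnxParser(network, TRT_LOGGER)

    # Parse the ONNX file; bail out if the parser reports errors.
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            return None

    config = builder.create_builder_config()
    config.max_workspace_size = 2 << 30  # 2 GiB

    # Optimization profile for the dynamic batch dimension
    # (input shape is an assumption here).
    profile = builder.create_optimization_profile()
    inp = network.get_input(0)
    profile.set_shape(
        inp.name,
        (1, 3, 224, 224),                   # min
        (max_batch_size // 2, 3, 224, 224),  # opt
        (max_batch_size, 3, 224, 224),       # max
    )
    config.add_optimization_profile(profile)

    # Returns None if the build fails, which triggers the
    # "'NoneType' object has no attribute 'serialize'" exception in my script.
    return builder.build_engine(network, config)
```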
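If it is a cleanup problem, I would expect something like the following to be needed at the end of each conversion (a minimal sketch; export_to_onnx and run_inference are placeholder helpers, not my actual code):

```python
import gc

import torch


def benchmark_model(name, model, input_shape, n_runs=100):
    engine = None
    context = None
    try:
        onnx_path = export_to_onnx(model, input_shape)  # placeholder helper
        engine = build_engine(onnx_path)
        if engine is None:
            raise RuntimeError(f"Couldn't convert {name} to TRT")
        context = engine.create_execution_context()
        run_inference(context, input_shape, n_runs)      # placeholder helper
    finally:
        # Explicitly drop the TensorRT objects before the next model is built,
        # so their device memory doesn't outlive this call.
        del context
        del engine
        gc.collect()
        torch.cuda.empty_cache()  # release PyTorch's cached GPU allocations
```

Even with this explicit cleanup I'm not sure every allocation is returned to the driver, which is why I suspect something is leaking across conversions.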
Environment
TensorRT Version: 7.1.3.4
GPU Type: 2080 Ti
Nvidia Driver Version: 450.66
CUDA Version: 11.0
CUDNN Version: 8.0.3
Operating System + Version: Ubuntu 18.04.5 LTS
Python Version (if applicable): 3.7.5
PyTorch Version (if applicable): 1.6
About this issue
- State: closed
- Created 4 years ago
- Comments: 16
OK, I will try, thanks for your help.
Thanks, I will try and report back with the result later.