TensorRT: Cuda Error in allocate: 2 (out of memory) - GPU Memory Leak?
Description
I am running a simple benchmarking script with a function that takes a PyTorch model, converts it to TensorRT (via ONNX), then runs inference on it multiple times and measures the inference time.
The function is called from main inside a loop over different models.
What I've noticed is that after each model finishes running, its GPU memory isn't fully released, and eventually I start getting the following errors:
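A stripped-down sketch of the script's structure (the model names and input shape here are placeholders, not my actual models):

```python
import torchvision


def benchmark_model(name, model, input_shape, n_runs=100):
    """Export `model` to ONNX, build a TensorRT engine, then time inference."""
    # (ONNX export + TensorRT build + timed inference happen here; see the
    #  build and cleanup sketches further down.)
    ...


def main():
    # Illustrative models only -- the real script loops over my own models.
    models = {
        "resnet18": torchvision.models.resnet18(pretrained=True).eval(),
        "resnet50": torchvision.models.resnet50(pretrained=True).eval(),
    }
    for name, model in models.items():
        benchmark_model(name, model, input_shape=(1, 3, 224, 224))


if __name__ == "__main__":
    main()
```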
[TensorRT] WARNING: /usr/src/tensorrt/onnx-tensorrt/onnx2trt_utils.cpp:232: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[TensorRT] WARNING: /usr/src/tensorrt/onnx-tensorrt/onnx2trt_utils.cpp:232: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[TensorRT] ERROR: ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
[TensorRT] WARNING: GPU memory allocation error during getBestTactic: Conv_3 + Relu_5
[TensorRT] ERROR: ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
[TensorRT] WARNING: GPU memory allocation error during getBestTactic: Conv_3 + Relu_5
[TensorRT] ERROR: ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
[TensorRT] WARNING: GPU memory allocation error during getBestTactic: Conv_3 + Relu_5
[TensorRT] ERROR: ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
[TensorRT] WARNING: GPU memory allocation error during getBestTactic: Conv_3 + Relu_5
[TensorRT] ERROR: ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
[TensorRT] WARNING: GPU memory allocation error during getBestTactic: Conv_3 + Relu_5
[TensorRT] ERROR: ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
[TensorRT] WARNING: GPU memory allocation error during getBestTactic: Conv_3 + Relu_5
[TensorRT] ERROR: Try increasing the workspace size with IBuilderConfig::setMaxWorkspaceSize() if using IBuilder::buildEngineWithConfig, or IBuilder::setMaxWorkspaceSize() if using IBuilder::buildCudaEngine.
[TensorRT] ERROR: ../builder/tacticOptimizer.cpp (1715) - TRTInternal Error in computeCosts: 0 (Could not find any implementation for node Conv_3 + Relu_5.)
[TensorRT] ERROR: ../builder/tacticOptimizer.cpp (1715) - TRTInternal Error in computeCosts: 0 (Could not find any implementation for node Conv_3 + Relu_5.)
Exception: 'NoneType' object has no attribute 'serialize' - Couldnt convert model to TRT
The conversion and inference are run using code based on @rmccorm4's GitHub repo, with dynamic batching and max_workspace_size = 2 << 30.
When I convert only a single model, there is never a problem, which leads me to believe that GPU memory isn't being freed at the end of each conversion.
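For context, the build step looks roughly like this (a minimal sketch assuming a 3x224x224 input and illustrative min/opt/max batch sizes, not the exact code from the repo). When the build fails, build_engine() returns None, which is what produces the "'NoneType' object has no attribute 'serialize'" exception above:

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
EXPLICIT_BATCH = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)


def build_engine(onnx_path, max_batch_size=32):
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(EXPLICIT_BATCH)
    parser = trt.OnnxParser(network, TRT_LOGGER)

    # Parse the ONNX file; bail out if the parser reports errors.
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            return None

    config = builder.create_builder_config()
    config.max_workspace_size = 2 << 30  # 2 GiB

    # Optimization profile for the dynamic batch dimension
    # (input shape is an assumption here).
    profile = builder.create_optimization_profile()
    inp = network.get_input(0)
    profile.set_shape(
        inp.name,
        (1, 3, 224, 224),                   # min
        (max_batch_size // 2, 3, 224, 224),  # opt
        (max_batch_size, 3, 224, 224),       # max
    )
    config.add_optimization_profile(profile)

    # Returns None if the build fails, which triggers the
    # "'NoneType' object has no attribute 'serialize'" exception in my script.
    return builder.build_engine(network, config)
```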
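If it is a cleanup problem, I would expect something like the following to be needed at the end of each conversion (a minimal sketch; export_to_onnx and run_inference are placeholder helpers, not my actual code):

```python
import gc

import torch


def benchmark_model(name, model, input_shape, n_runs=100):
    engine = None
    context = None
    try:
        onnx_path = export_to_onnx(model, input_shape)  # placeholder helper
        engine = build_engine(onnx_path)
        if engine is None:
            raise RuntimeError(f"Couldn't convert {name} to TRT")
        context = engine.create_execution_context()
        run_inference(context, input_shape, n_runs)      # placeholder helper
    finally:
        # Explicitly drop the TensorRT objects before the next model is built,
        # so their device memory doesn't outlive this call.
        del context
        del engine
        gc.collect()
        torch.cuda.empty_cache()  # release PyTorch's cached GPU allocations
```

Even with this explicit cleanup I'm not sure every allocation is returned to the driver, which is why I suspect something is leaking across conversions.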
Environment
TensorRT Version: 7.1.3.4
GPU Type: 2080 Ti
Nvidia Driver Version: 450.66
CUDA Version: 11.0
CUDNN Version: 8.0.3
Operating System + Version: Ubuntu 18.04.5 LTS
Python Version (if applicable): 3.7.5
PyTorch Version (if applicable): 1.6
About this issue
- State: closed
- Created 4 years ago
- Comments: 16
OK, I will try, thanks for your help.
Thanks, I will try and report back with the result later.