transformer-deploy: Out of memory error for batch size greater than 1 for T5 models.
Hey, first of all, thanks for creating this amazing library!
I'm following your T5 implementation with TensorRT: https://github.com/ELS-RD/transformer-deploy/blob/b52850dce004212225edcaa7b80fccc311398038/t5.py#L222
I'm trying to convert the ONNX version of the T5 model to a TensorRT engine using your build_engine method:
https://github.com/ELS-RD/transformer-deploy/blob/1f2d2c1d8d0239fca7679f8c550a954ea1445cfa/src/transformer_deploy/backends/trt_utils.py#L64
It works fine for a batch size of 1, but for batch sizes > 1 the build takes much longer (almost an hour just for the t5-small encoder) and then fails with the following error:
```
[03/18/2022-12:51:55] [TRT] [E] 2: [virtualMemoryBuffer.cpp::resizePhysical::161] Error Code 2: OutOfMemory (no further information)
[03/18/2022-12:51:55] [TRT] [E] 2: [virtualMemoryBuffer.cpp::resizePhysical::161] Error Code 2: OutOfMemory (no further information)
[03/18/2022-12:51:55] [TRT] [E] 10: [optimizer.cpp::computeCosts::2011] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[encoder.embed_tokens.weight...Mul_406]}.)
[03/18/2022-12:51:55] [TRT] [E] 2: [builder.cpp::buildSerializedNetwork::609] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed. )
```
```
Traceback (most recent call last):
  File "export_onnx_to_trt.py", line 100, in <module>
    build_t5_engine(onnx_encoder_path, trt_encoder_path, [input_id_shape])
  File "export_onnx_to_trt.py", line 86, in build_t5_engine
    engine: ICudaEngine = build_engine(
  File "/app/utils.py", line 209, in build_engine
    engine: ICudaEngine = runtime.deserialize_cuda_engine(trt_engine)
TypeError: deserialize_cuda_engine(): incompatible function arguments. The following argument types are supported:
    1. (self: tensorrt.tensorrt.Runtime, serialized_engine: buffer) -> tensorrt.tensorrt.ICudaEngine
Invoked with: <tensorrt.tensorrt.Runtime object at 0x7f380bbf8930>, None
```
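For what it's worth, the TypeError at the end looks like a consequence of the failed build rather than a separate bug: in the TensorRT Python API, build_serialized_network() returns None on failure instead of raising, and that None then reaches deserialize_cuda_engine(). A minimal sketch of a guard (assuming build_engine uses the standard TensorRT builder/network/config/runtime objects; names are illustrative):

```python
# build_serialized_network() returns None on failure rather than raising,
# which is what later triggers the deserialize_cuda_engine() TypeError above
serialized_engine = builder.build_serialized_network(network, config)
if serialized_engine is None:
    raise RuntimeError("TensorRT engine build failed; see the TRT error log (likely OOM)")
engine = runtime.deserialize_cuda_engine(serialized_engine)
```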
Some system info, in case it helps:
- TensorRT + CUDA: 8.2.1-1+cuda11.4
- OS: Ubuntu 20.04.3
- GPU: T4 with 15 GB memory
The errors say I need more GPU memory, so I was wondering: how much GPU memory did you use for a batch size of 5? Or maybe I'm missing something?
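In case it helps to compare, here is a minimal sketch of the builder settings I would expect to matter for memory (illustrative only: max_workspace_size is the TensorRT 8.2 API, and the input name "input_ids" and the shape ranges are assumptions, not values from trt_utils.py):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
config = builder.create_builder_config()

# Cap the tactic workspace so the builder skips tactics that would not fit on a 15 GB T4
config.max_workspace_size = 8 * 1024 ** 3  # 8 GiB

# With dynamic shapes, the optimization profile must cover the batch sizes used at runtime;
# a wide min/opt/max range increases both build time and memory pressure
profile = builder.create_optimization_profile()
profile.set_shape("input_ids", min=(1, 1), opt=(5, 128), max=(5, 512))
config.add_optimization_profile(profile)
```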
I would really appreciate any help, thank you!
About this issue
- Original URL
- State: open
- Created 2 years ago
- Reactions: 1
- Comments: 15 (9 by maintainers)
T5 work requires good support for the ONNX `If` node, which was recently added to ONNX Runtime (master branch only). Triton support will be added once ONNX Runtime 1.12 is released (sometime in June) and a Triton build with the ONNX Runtime 1.12 engine becomes available.
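A quick way to check whether an exported graph actually contains `If` nodes is to scan its node list with the `onnx` package (a sketch; the file path is a placeholder):

```python
import onnx

# Placeholder path: point this at the exported decoder graph
model = onnx.load("t5-decoder.onnx")
if_nodes = [node for node in model.graph.node if node.op_type == "If"]
print(f"If nodes in graph: {len(if_nodes)}")
```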