transformer-deploy: Out of memory error for batch size greater than 1 for T5 models.
Hey, first of all, thanks for creating this amazing library!
I'm following your T5 implementation with TensorRT: https://github.com/ELS-RD/transformer-deploy/blob/b52850dce004212225edcaa7b80fccc311398038/t5.py#L222
I'm trying to convert the ONNX version of the T5 model to a TensorRT engine using your build_engine method:
https://github.com/ELS-RD/transformer-deploy/blob/1f2d2c1d8d0239fca7679f8c550a954ea1445cfa/src/transformer_deploy/backends/trt_utils.py#L64
It works fine for a batch size of 1, but for batch sizes > 1 the build takes much longer (almost an hour just for the t5-small encoder) and then fails with the following error:
```
[03/18/2022-12:51:55] [TRT] [E] 2: [virtualMemoryBuffer.cpp::resizePhysical::161] Error Code 2: OutOfMemory (no further information)
[03/18/2022-12:51:55] [TRT] [E] 2: [virtualMemoryBuffer.cpp::resizePhysical::161] Error Code 2: OutOfMemory (no further information)
[03/18/2022-12:51:55] [TRT] [E] 10: [optimizer.cpp::computeCosts::2011] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[encoder.embed_tokens.weight...Mul_406]}.)
[03/18/2022-12:51:55] [TRT] [E] 2: [builder.cpp::buildSerializedNetwork::609] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed. )
```
```
Traceback (most recent call last):
  File "export_onnx_to_trt.py", line 100, in <module>
    build_t5_engine(onnx_encoder_path, trt_encoder_path, [input_id_shape])
  File "export_onnx_to_trt.py", line 86, in build_t5_engine
    engine: ICudaEngine = build_engine(
  File "/app/utils.py", line 209, in build_engine
    engine: ICudaEngine = runtime.deserialize_cuda_engine(trt_engine)
TypeError: deserialize_cuda_engine(): incompatible function arguments. The following argument types are supported:
    1. (self: tensorrt.tensorrt.Runtime, serialized_engine: buffer) -> tensorrt.tensorrt.ICudaEngine
Invoked with: <tensorrt.tensorrt.Runtime object at 0x7f380bbf8930>, None
```
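For what it's worth, the TypeError at the end looks like a consequence of the failed build rather than a separate bug: in the TensorRT Python API, build_serialized_network() returns None on failure instead of raising, and that None then reaches deserialize_cuda_engine(). A minimal sketch of a guard (assuming build_engine uses the standard TensorRT builder/network/config/runtime objects; names are illustrative):

```python
# build_serialized_network() returns None on failure rather than raising,
# which is what later triggers the deserialize_cuda_engine() TypeError above
serialized_engine = builder.build_serialized_network(network, config)
if serialized_engine is None:
    raise RuntimeError("TensorRT engine build failed; see the TRT error log (likely OOM)")
engine = runtime.deserialize_cuda_engine(serialized_engine)
```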
Some system info, in case it helps:
- TensorRT + CUDA: 8.2.1-1+cuda11.4
- OS: Ubuntu 20.04.3
- GPU: T4 with 15 GB memory
The errors say I need more GPU memory, so I was wondering: how much GPU memory did you use for a batch size of 5? Or maybe I'm missing something?
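In case it helps to compare, here is a minimal sketch of the builder settings I would expect to matter for memory (illustrative only: max_workspace_size is the TensorRT 8.2 API, and the input name "input_ids" and the shape ranges are assumptions, not values from trt_utils.py):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
config = builder.create_builder_config()

# Cap the tactic workspace so the builder skips tactics that would not fit on a 15 GB T4
config.max_workspace_size = 8 * 1024 ** 3  # 8 GiB

# With dynamic shapes, the optimization profile must cover the batch sizes used at runtime;
# a wide min/opt/max range increases both build time and memory pressure
profile = builder.create_optimization_profile()
profile.set_shape("input_ids", min=(1, 1), opt=(5, 128), max=(5, 512))
config.add_optimization_profile(profile)
```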
I would really appreciate any help, thank you!
About this issue
- Original URL
- State: open
- Created 2 years ago
- Reactions: 1
- Comments: 15 (9 by maintainers)
T5 work requires good support for the ONNX `If` node, which was recently added to ONNX Runtime (master branch only). Triton support will be added once ONNX Runtime 1.12 is released (sometime in June) and a Triton build with the ONNX Runtime 1.12 engine becomes available.
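A quick way to check whether an exported graph actually contains `If` nodes is to scan its node list with the `onnx` package (a sketch; the file path is a placeholder):

```python
import onnx

# Placeholder path: point this at the exported decoder graph
model = onnx.load("t5-decoder.onnx")
if_nodes = [node for node in model.graph.node if node.op_type == "If"]
print(f"If nodes in graph: {len(if_nodes)}")
```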