onnxruntime: [Performance] 1.14RC1 TensorRT Regression

Describe the issue

I have been testing 1.14.0RC1 and am seeing a significant performance regression versus 1.13.1 when using the C API. GPU usage is also lower (both power draw and GPU utilization), which suggests that a bottleneck has been introduced.

onnxruntime 1.14.0

real    2m48.415s
user    5m15.012s
sys     0m4.494s

|===============================+======================+======================|
|   0  NVIDIA A2           Off  | 00000000:51:00.0 Off |                    0 |
|  0%   65C    P0    57W /  60W |   1242MiB / 15356MiB |     90%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

onnxruntime 1.13.1

real    2m30.369s
user    4m46.815s
sys     0m5.421s

|===============================+======================+======================|
|   0  NVIDIA A2           Off  | 00000000:51:00.0 Off |                    0 |
|  0%   63C    P0    60W /  60W |   1010MiB / 15356MiB |     97%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
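To put a number on the regression, the two `time` outputs above can be compared directly. The following is a minimal sketch; the helper names `parse_time` and `slowdown_percent` are made up for illustration, and the input strings are copied from the measurements in this report.

```python
import re


def parse_time(text):
    """Convert a `time`-style duration such as '2m48.415s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def slowdown_percent(new, old):
    """Relative increase of `new` over `old`, in percent."""
    old_seconds = parse_time(old)
    return (parse_time(new) - old_seconds) / old_seconds * 100.0


# Values taken from the two runs above (1.14.0 vs 1.13.1).
real_pct = slowdown_percent("2m48.415s", "2m30.369s")
user_pct = slowdown_percent("5m15.012s", "4m46.815s")
print(f"real: +{real_pct:.1f}%  user: +{user_pct:.1f}%")
```

On these figures the wall-clock time is roughly 12% longer on 1.14.0 and the user CPU time roughly 10% longer.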

To reproduce

This process is doing three things in parallel:

  • copying data from cpu to gpu
  • executing a preprocessing model (resizing images)
  • executing an ML model against those images
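The shape of such a workload can be sketched as a three-stage pipeline connected by bounded queues. This is not the code from the report: `copy_to_gpu`, `preprocess`, and `infer` are hypothetical stand-ins that only tag their input, and the real process uses the ONNX Runtime C API rather than Python threads. The sketch only illustrates why a slowdown in any one stage can throttle the whole run.

```python
import queue
import threading

SENTINEL = None  # marks the end of the input stream


def stage(work, inbox, outbox):
    """Apply `work` to every item from `inbox` and forward results to `outbox`."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # propagate shutdown to the next stage
            break
        outbox.put(work(item))


# Hypothetical stand-ins for the three activities listed above.
def copy_to_gpu(frame):
    return ("gpu", frame)


def preprocess(buffer):
    return ("resized", buffer[1])


def infer(image):
    return ("result", image[1])


def run_pipeline(frames, depth=4):
    """Run the three stages concurrently; bounded queues apply back-pressure."""
    queues = [queue.Queue(maxsize=depth) for _ in range(4)]
    works = [copy_to_gpu, preprocess, infer]
    threads = [
        threading.Thread(target=stage, args=(work, queues[i], queues[i + 1]))
        for i, work in enumerate(works)
    ]

    def feed():
        for frame in frames:
            queues[0].put(frame)
        queues[0].put(SENTINEL)

    threads.append(threading.Thread(target=feed))
    for thread in threads:
        thread.start()

    # Drain the final queue in the caller so the last stage never blocks forever.
    results = []
    while True:
        item = queues[3].get()
        if item is SENTINEL:
            break
        results.append(item)

    for thread in threads:
        thread.join()
    return results


print(run_pipeline(range(3)))
```

Because each queue is bounded, a stage that becomes slower makes the stages feeding it block on `put`, which is one way lower GPU utilization could show up alongside a longer wall-clock time.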

Urgency

No response

Platform

Linux

OS Version

Ubuntu 20.04.5 LTS

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

rel-1.14.0

ONNX Runtime API

C

Architecture

X64

Execution Provider

TensorRT

Execution Provider Library Version

CUDA 12.0

Model File

No response

Is this a quantized model?

No

About this issue

  • State: closed
  • Created a year ago
  • Comments: 29 (15 by maintainers)

Most upvoted comments

@noujaimc I have been working with @souptc and @jslhcl to reproduce the issue. It appears to be related to io_binding, and they created a PR a few minutes ago that will hopefully fix it.

Thank you very much @souptc and @jslhcl for your excellent support.