mmdeploy: [Bug] mmaction2 recognition task inference speed is slow
Checklist
- I have searched related issues but cannot get the expected help.
- I have read the FAQ documentation but cannot get the expected help.
- The bug has not been fixed in the latest version.
Describe the bug
After converting the CSN model to TensorRT format, inference is much slower than with the original PyTorch model.
Reproduction
I tried to convert the CSN model to TensorRT format with the following deploy config:
```python
_base_ = ['./video-recognition_static.py']

onnx_config = dict(
    type='onnx',
    export_params=True,
    keep_initializers_as_inputs=False,
    opset_version=11,
    save_file='end2end.onnx',
    input_names=['input'],
    output_names=['output'],
    input_shape=[256, 256],
    optimize=True,
    dynamic_axes=dict(
        input=dict({
            1: 'num_crops * num_segs',
        }),
    ))

codebase_config = dict(type='mmaction', task='VideoRecognition')

backend_config = dict(
    type='tensorrt',
    common_config=dict(fp16_mode=False, max_workspace_size=99073741824),
    model_inputs=[
        dict(
            input_shapes=dict(
                input=dict(
                    min_shape=[1, 1, 3, 32, 256, 256],
                    opt_shape=[1, 6, 3, 32, 256, 256],
                    max_shape=[1, 30, 3, 32, 256, 256])))
    ])
```
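To compare the two backends fairly, a small timing harness helps; the sketch below is hypothetical (`run_inference` is a placeholder for whatever call invokes the TensorRT or PyTorch model, and any GPU work should synchronize inside it so the measured wall time is honest):

```python
import time


def benchmark(run_inference, warmup=10, iters=50):
    """Return the mean latency of `run_inference` in milliseconds.

    `run_inference` stands in for the actual model call (TensorRT
    backend model or the original PyTorch model). For GPU inference,
    synchronize the device inside the callable before it returns.
    """
    for _ in range(warmup):  # let the runtime reach a steady state first
        run_inference()
    start = time.perf_counter()
    for _ in range(iters):
        run_inference()
    return (time.perf_counter() - start) / iters * 1000.0


# Usage sketch with a dummy CPU workload standing in for the model:
latency_ms = benchmark(lambda: sum(range(10000)))
print(f'mean latency: {latency_ms:.3f} ms')
```

Running the same harness against both the converted engine and the PyTorch model isolates whether the slowdown is in the model forward pass or in pre/post-processing.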
I believe the conversion succeeded, since it reported no errors, but inference is very slow. How can I locate the problem, and can it be solved?
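One thing worth checking: the config above allows a wide dynamic range (1 to 30 crops), and TensorRT tunes its kernels for `opt_shape`, so runtime shapes far from it can be legal yet slow. A minimal sanity check (the profile values are copied from the deploy config; the helper itself is hypothetical):

```python
def within_profile(shape, min_shape, opt_shape, max_shape):
    """Check a runtime input shape against a TensorRT optimization profile.

    Returns (fits_profile, matches_opt_shape). A shape can fit the
    min/max bounds yet sit far from opt_shape, where kernels were tuned.
    """
    fits = all(lo <= s <= hi for s, lo, hi in zip(shape, min_shape, max_shape))
    at_opt = list(shape) == list(opt_shape)
    return fits, at_opt


# Profile taken from the deploy config above.
min_s = [1, 1, 3, 32, 256, 256]
opt_s = [1, 6, 3, 32, 256, 256]
max_s = [1, 30, 3, 32, 256, 256]

ok, at_opt = within_profile([1, 30, 3, 32, 256, 256], min_s, opt_s, max_s)
print(ok, at_opt)  # legal shape, but far from opt_shape
```

If inference always uses a fixed number of crops, narrowing the profile (or building a static engine) may let TensorRT pick faster tactics.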
Environment
03/28 21:53:33 - mmengine - INFO - **********Environmental information**********
03/28 21:53:33 - mmengine - INFO - sys.platform: linux
03/28 21:53:33 - mmengine - INFO - Python: 3.7.16 (default, Jan 17 2023, 22:20:44) [GCC 11.2.0]
03/28 21:53:33 - mmengine - INFO - CUDA available: True
03/28 21:53:33 - mmengine - INFO - numpy_random_seed: 2147483648
03/28 21:53:33 - mmengine - INFO - GPU 0: NVIDIA GeForce RTX 3090
03/28 21:53:33 - mmengine - INFO - CUDA_HOME: /usr/local/cuda
03/28 21:53:33 - mmengine - INFO - NVCC: Cuda compilation tools, release 11.7, V11.7.64
03/28 21:53:33 - mmengine - INFO - GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
03/28 21:53:33 - mmengine - INFO - PyTorch: 1.13.1+cu117
03/28 21:53:33 - mmengine - INFO - PyTorch compiling details: PyTorch built with:
- GCC 9.3
- C++ Version: 201402
- Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- LAPACK is enabled (usually provided by MKL)
- NNPACK is enabled
- CPU capability usage: AVX2
- CUDA Runtime 11.7
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
- CuDNN 8.8.1 (built against CUDA 11.8)
- Built with CuDNN 8.5
- Magma 2.6.1
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.7, CUDNN_VERSION=8.5.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.13.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,
03/28 21:53:33 - mmengine - INFO - TorchVision: 0.14.1+cu117
03/28 21:53:33 - mmengine - INFO - OpenCV: 4.5.5
03/28 21:53:33 - mmengine - INFO - MMEngine: 0.6.0
03/28 21:53:33 - mmengine - INFO - MMCV: 2.0.0rc4
03/28 21:53:33 - mmengine - INFO - MMCV Compiler: GCC 9.3
03/28 21:53:33 - mmengine - INFO - MMCV CUDA Compiler: 11.7
03/28 21:53:33 - mmengine - INFO - MMDeploy: 1.0.0rc3+032ce75
03/28 21:53:33 - mmengine - INFO -
03/28 21:53:33 - mmengine - INFO - **********Backend information**********
03/28 21:53:33 - mmengine - INFO - tensorrt: 8.4.2.4
03/28 21:53:33 - mmengine - INFO - tensorrt custom ops: Available
03/28 21:53:33 - mmengine - INFO - ONNXRuntime: None
03/28 21:53:33 - mmengine - INFO - ONNXRuntime-gpu: 1.8.1
03/28 21:53:33 - mmengine - INFO - ONNXRuntime custom ops: Available
03/28 21:53:33 - mmengine - INFO - pplnn: None
03/28 21:53:33 - mmengine - INFO - ncnn: None
03/28 21:53:33 - mmengine - INFO - snpe: None
03/28 21:53:33 - mmengine - INFO - openvino: 2022.3.0
03/28 21:53:33 - mmengine - INFO - torchscript: 1.13.1
03/28 21:53:33 - mmengine - INFO - torchscript custom ops: NotAvailable
03/28 21:53:33 - mmengine - INFO - rknn-toolkit: None
03/28 21:53:33 - mmengine - INFO - rknn-toolkit2: None
03/28 21:53:33 - mmengine - INFO - ascend: None
03/28 21:53:33 - mmengine - INFO - coreml: None
03/28 21:53:33 - mmengine - INFO - tvm: None
03/28 21:53:33 - mmengine - INFO - vacc: None
03/28 21:53:33 - mmengine - INFO -
03/28 21:53:33 - mmengine - INFO - **********Codebase information**********
03/28 21:53:33 - mmengine - INFO - mmdet: 3.0.0rc6
03/28 21:53:33 - mmengine - INFO - mmseg: None
03/28 21:53:33 - mmengine - INFO - mmcls: None
03/28 21:53:33 - mmengine - INFO - mmocr: None
03/28 21:53:33 - mmengine - INFO - mmedit: None
03/28 21:53:33 - mmengine - INFO - mmdet3d: None
03/28 21:53:33 - mmengine - INFO - mmpose: 1.0.0rc1
03/28 21:53:33 - mmengine - INFO - mmrotate: None
03/28 21:53:33 - mmengine - INFO - mmaction: 1.0.0rc3
03/28 21:53:33 - mmengine - INFO - mmrazor: None
Error traceback
No response
About this issue
- Original URL
- State: closed
- Created a year ago
- Comments: 17
It works! Thanks for your help. This is the first time I have realized that `tolist()` can consume so many resources.
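For readers hitting the same wall: calling `tolist()` on a large tensor boxes every element into a Python object, and on a GPU tensor it additionally forces a device-to-host copy and a synchronization. A small NumPy sketch of the pattern (NumPy used here only as a CPU stand-in for the output tensor; shapes are illustrative):

```python
import time

import numpy as np

# Illustrative stand-in for recognition output: per-crop class scores.
scores = np.random.rand(30, 400).astype(np.float32)

# Cheap pattern: reduce in the array library first, convert only the result.
t0 = time.perf_counter()
top = int(scores.mean(axis=0).argmax())
fast_s = time.perf_counter() - t0

# Expensive pattern: tolist() creates one Python float per element
# (and on a GPU tensor would also force a device-to-host copy + sync).
t0 = time.perf_counter()
as_list = scores.tolist()
slow_s = time.perf_counter() - t0

print(type(as_list).__name__, top)
```

Keeping post-processing inside the array library and converting only the final scalar or label avoids the per-element overhead entirely.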