TensorRT: CopyPackedKernel is taking too long, and how to optimize it
Description
I have a model that uses a slice operator for feature crossing. Profiling shows that the slice operator ends up running CopyPackedKernel, which consumes almost all of the GPU time. I also re-implemented the slice operator myself, but the result was the same. I don't know when CopyPackedKernel runs or how to optimize it.
nsys profile -o test --stats=true python infer.py -e test.plan
output:
Time(%) Total Time Instances Average Minimum Maximum Name
------- -------------- ---------- -------------- -------------- -------------- --------------------------------------------------------------------------------------------------------------------
99.9 9863089 2835 3479.0 3423 3872 void genericReformat::copyPackedKernel<float, float, true, true, genericReformat::IdentityCoordMapper<4>, 4>(unsigned int, unsigned int, void const*, genericReformat::ArrayN<4>, genericReformat::ArrayNWithReducedDivisors<4>, genericReformat::ArrayN<4>, int, int, int, float const*, void*, genericReformat::ArrayN<4>, genericReformat:
0.1 7264 3 2421.3 2304 2656 slice(float const*, float*, int, int, int, int)
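As a quick sanity check on the numbers above, here is a minimal sketch that recomputes the per-launch average and the time share from the two rows. It assumes the times are in nanoseconds (the nsys default; the header shown here does not state a unit), and the `summarize` helper is hypothetical, written only for this illustration.

```python
# Rows copied from the nsys summary: name -> (total time, instances).
# Assumption: times are in nanoseconds; the header above gives no unit.
rows = {
    "copyPackedKernel": (9863089, 2835),
    "slice": (7264, 3),
}


def summarize(rows):
    """Recompute average time per launch and share of total GPU kernel time."""
    grand_total = sum(total for total, _ in rows.values())
    summary = {}
    for name, (total, instances) in rows.items():
        summary[name] = {
            "avg_ns": round(total / instances, 1),
            "share_pct": round(100.0 * total / grand_total, 1),
            "total_ms": total / 1e6,
        }
    return summary


summary = summarize(rows)
# Number of copyPackedKernel launches per launch of the custom slice kernel.
launch_ratio = rows["copyPackedKernel"][1] // rows["slice"][1]

for name, stats in summary.items():
    print(name, stats)
print("copyPackedKernel launches per slice launch:", launch_ratio)
```

Under that assumption each copyPackedKernel launch takes only about 3.5 microseconds, and the total of roughly 9.9 ms comes from 2835 launches. That suggests, though the log alone does not confirm it, that the cost is dominated by many small copies rather than a few large ones.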
Environment
TensorRT Version: 7.2.2.1
NVIDIA GPU: T4
NVIDIA Driver Version: 450.51.06
CUDA Version: 11.1
CUDNN Version:
Operating System:
Python Version (if applicable): 3.8
Tensorflow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if so, version): Container 20.12
I need help, thank you very much.
About this issue
- Original URL
- State: open
- Created 3 years ago
- Comments: 36
Hello @zhaohb, I just checked: those are not really reformat kernels, they are the slice implementation. Slice is memory bound, so to optimize this we need to consider other approaches. I have two questions:
- [...] slice in your real network? Are they constant? If so, we can preprocess these slice results first and replace them with constant.
- [...] slice in your real network? If they are plugins, can we adjust the plugin implementation, for example by calculating the address/offset inside the plugin, so that we can remove the slice?
Thanks!
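The second suggestion, calculating the address/offset inside the consumer instead of materializing the slice, can be illustrated with a minimal sketch in plain Python. This is not TensorRT or plugin code: the dot-product "cross" operation, the function names, and the flat `features` buffer are all hypothetical stand-ins chosen for this example. In a real plugin the same idea would mean passing the start offsets to the kernel and indexing the original input directly.

```python
def slice_copy(src, start, length):
    """Materialize a slice into a new buffer (the extra memory-bound copy)."""
    return [src[start + i] for i in range(length)]


def cross_on_copies(a_slice, b_slice):
    """Hypothetical feature cross (a dot product) over pre-copied slices."""
    return sum(x * y for x, y in zip(a_slice, b_slice))


def cross_with_offsets(src, a_start, b_start, length):
    """Same cross, reading both operands from src via computed offsets.

    No intermediate slice buffer is created, so the separate copy step
    disappears entirely.
    """
    return sum(src[a_start + i] * src[b_start + i] for i in range(length))


# Hypothetical flat feature buffer holding two fields of length 3.
features = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

via_copies = cross_on_copies(slice_copy(features, 0, 3), slice_copy(features, 3, 3))
via_offsets = cross_with_offsets(features, 0, 3, 3)

print(via_copies, via_offsets)
```

Both paths compute the same value; the offset version simply skips the two intermediate copies, which is the part a memory-bound slice spends its time on.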