tvm: [Performance regression] Revamp IntSet #3272 causing GluonCV SSD performance issue
https://github.com/dmlc/tvm/pull/3272 is causing an issue similar to the one previously reported in https://github.com/dmlc/tvm/issues/3097
The operator fused_strided_slice_greater_cast_strided_slice_zeros_like_add_add_add_add_add_ad_11203150218747419416_ is much slower because the mod operation is no longer simplified:
placeholder[((((ax0.ax1.fused*4) + ax2) + -466036) % 16)]
Before this PR, the generated index was:
placeholder[((((ax0.ax1.fused*4) + ax2) + -4) % 16)]
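The two constants differ by a multiple of 16, so the two index expressions address the same element whenever the modulo is taken with floor semantics. The sketch below checks that in plain Python; it is not TVM code. The helper `index_expr` and the loop bounds are hypothetical, and Python's `%` is a floor modulo, which may not match the division mode TVM used for this expression.

```python
# Constants taken from the two index expressions in the report.
OLD_OFFSET = -4        # offset before PR #3272
NEW_OFFSET = -466036   # offset after PR #3272
MODULUS = 16


def index_expr(fused, ax2, offset, modulus=MODULUS):
    """Hypothetical stand-in for ((fused*4 + ax2) + offset) % modulus.

    Uses Python's % operator, i.e. floor modulo.
    """
    return ((fused * 4) + ax2 + offset) % modulus


# The offsets are congruent modulo 16: their difference is -466032 = -29127 * 16.
offsets_congruent = (NEW_OFFSET - OLD_OFFSET) % MODULUS == 0

# Compare both forms over an arbitrary (assumed) range of loop variables.
all_indices_match = all(
    index_expr(fused, ax2, OLD_OFFSET) == index_expr(fused, ax2, NEW_OFFSET)
    for fused in range(0, 200_000, 37)
    for ax2 in range(4)
)
```

Under these assumptions the larger constant carries no extra information, which is why folding it down to -4 is the form one would hope the simplifier produces.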
About this issue
- State: closed
- Created 5 years ago
- Comments: 19 (19 by maintainers)
@kevinthesun please check again now that the integer simplification infra has landed
Yes, introducing floordiv/mod might improve the perf further, but it will need a few more PRs to change the division mode before we can take advantage of that. I would encourage us to treat that as a separate issue. If you can still isolate things that can be improved, we can dig further here.
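As a rough illustration of why the division mode matters for this kind of rewrite, the sketch below compares a C-style truncated remainder with a floor remainder on the two offsets from the report. The helpers `truncdiv`, `truncmod`, and `floormod` are hypothetical names written for this example, not TVM APIs, and the example assumes the expression was evaluated with truncated division before the floordiv/mod change.

```python
def truncdiv(a, b):
    """Integer division rounding toward zero (C-style)."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def truncmod(a, b):
    """Remainder paired with truncdiv; takes the sign of the dividend."""
    return a - b * truncdiv(a, b)


def floormod(a, b):
    """Remainder paired with floor division; Python's % already does this."""
    return a % b


x = 10  # an arbitrary value of (ax0.ax1.fused*4) + ax2, chosen for illustration

# Floor modulo: adding any multiple of 16 never changes the result,
# so the constant can be folded without knowing anything about x.
floor_small = floormod(x + -4, 16)
floor_large = floormod(x + -466036, 16)

# Truncated modulo: the result depends on the sign of the dividend,
# so the same folding is only valid when bounds on x are known.
trunc_small = truncmod(x + -4, 16)
trunc_large = truncmod(x + -466036, 16)
```

With floor semantics both forms agree for every `x`, whereas with truncated semantics a simplifier has to prove sign information first, which is one plausible reason a change to the bound analysis could stop the constant from being reduced.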
https://github.com/dmlc/tvm/pull/3467