iree: RV32 code size regression from #11576 LLVM integrate on 2022-12-16

Suspected LLVM integration: #11576

https://perf.iree.dev/serie?IREE?PersonDetect [int8] (TFLite) CPU-RV32-Generic full-inference%2Cdefault-flags [compilation%3Amodule%3Acomponent-size%3Atotal-dispatch-size]

A direct local repro is nontrivial because the MLIR IR format has changed since that time; it requires an iree-import-tflite from that timeframe. To make it easier, I'm attaching the already-imported file here: person_detect.zip

cmake --build . --target iree-compile && \
tools/iree-compile \
  --iree-hal-target-backends=llvm-cpu \
  --iree-input-type=tosa \
  --iree-llvm-target-abi=ilp32 \
  --iree-llvm-target-cpu-features=+m,+a,+f,+zvl512b,+zve32x \
  --iree-llvm-target-cpu=generic-rv32 \
  --iree-llvm-target-triple=riscv32-pc-linux-elf \
  --riscv-v-fixed-length-vector-lmul-max=8 \
  --riscv-v-vector-bits-min=512 \
  benchmark_suites/TFLite/person_detect.tflite.mlir \
  -o /tmp/a.vmfb \
  --iree-llvm-keep-linker-artifacts 2>&1 \
  | grep -o '/.*\.so' | xargs size -A | grep '^\.text' | awk '{print $2}'

This prints the .text section size in bytes:

IREE commit (after git submodule update)      .text size (bytes)
79b90d32d1723b0650b33bc5584dccb4828e5421      871912
7b4688272e40e939dc02053c7178b111a21eadd5      180752

About this issue

  • State: closed
  • Created a year ago
  • Comments: 28 (25 by maintainers)

Most upvoted comments

Bisect results coming soon (~ 5 bisection steps remaining)

I confirm that #12241 fixes the test case here. Results using the repro from this issue's description (with the current iree-import-tflite to generate person_detect.tflite.mlir):

commit          .text size (bytes)
Current main    905036
With #12241     185720

Thanks a lot @kuhar for the effective fix!

If possible, I would really appreciate having this included in main, as our project pulls IREE's release candidates for iree-compile instead of building from source. I'm okay with hiding this behind a compile flag if we don't want the behavior to be the default.

The original IR generates two mul ops: from the first, only the high part of the product is used; from the second, only the low part. The backend is able to match these patterns and generate the corresponding hi/lo multiplies available in zve32x. If we instead generate a single widening mul and extract both the high and low parts from it, the backend will try to legalize that single 64-bit mul and will end up scalarizing it.
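As an illustrative sketch of the two IR shapes described above (written in C rather than the actual MLIR; the function names are made up for this example), the difference looks like this:

```c
#include <stdint.h>

/* Shape the backend can match: two separate multiplies, one used only
 * for the high 32 bits of the product and one only for the low 32
 * bits. On RISC-V these map to the dedicated high/low multiply
 * instructions (mulh/mul, or vmulh.vv/vmul.vv when vectorized under
 * zve32x), so no 64-bit value ever materializes. */
void mul_two_ops(const int32_t *a, const int32_t *b,
                 int32_t *hi, uint32_t *lo, int n) {
  for (int i = 0; i < n; ++i) {
    hi[i] = (int32_t)(((int64_t)a[i] * b[i]) >> 32); /* high half only */
    lo[i] = (uint32_t)((int64_t)a[i] * b[i]);        /* low half only  */
  }
}

/* Problematic shape: a single widening multiply whose 64-bit result
 * feeds both extracts. Under zve32x (ELEN = 32) there are no 64-bit
 * vector elements, so the single i64 mul cannot be vectorized and
 * gets scalarized. Both functions compute the same values. */
void mul_one_op(const int32_t *a, const int32_t *b,
                int32_t *hi, uint32_t *lo, int n) {
  for (int i = 0; i < n; ++i) {
    int64_t wide = (int64_t)a[i] * b[i];
    hi[i] = (int32_t)(wide >> 32);
    lo[i] = (uint32_t)wide;
  }
}
```

At the C source level a compiler may canonicalize both functions to the same IR; the point is the shape of the IR that reaches the RISC-V backend, where the two-op form is the one that pattern-matches to the hi/lo multiply instructions.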