iree: Bad quantized model compilation time for RISC-V target
This is the main bug for tracking the compilation time regression on RISC-V targets.
We used to compile the (quantized) person_detection.tflite model in 3 seconds. After https://github.com/google/iree/pull/8409 landed, quantized_matmul ops are lowered to matmul ops, which kicks in matmul vectorization; compiling the model now takes 13 minutes.
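To confirm the new lowering path, MLIR's standard IR-printing instrumentation can dump the module between passes, assuming iree-translate registers the usual MLIR options (the dump goes to stderr, hence the redirect):

# Dump the IR after every pass to observe the quantized_matmul -> matmul
# lowering (standard MLIR flag; the output is very large for this model).
iree-translate <same flags as the repro command below> -mlir-print-ir-after-all 2> ir-dump.txt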
With the fixes from https://github.com/google/iree/issues/8210, we’re able to compile the model in 150 seconds (which is still bad). After profiling the pass timing, I found that most of the time is spent in HAL-related passes:
===-------------------------------------------------------------------------===
... Execution time report ...
===-------------------------------------------------------------------------===
Total Execution Time: 140.5211 seconds
----User Time---- ----Wall Time---- ----Name----
297.5860 (196.1%) 276.3979 (196.7%) 'hal.executable' Pipeline
151.7494 (100.0%) 140.5211 (100.0%) root
149.1610 ( 98.3%) 138.2025 ( 98.3%) Pipeline Collection : ['hal.executable']
137.6607 ( 90.7%) 137.6607 ( 98.0%) mlir::iree_compiler::IREE::HAL::SerializeExecutablesPass
137.6586 ( 90.7%) 137.6586 ( 98.0%) mlir::iree_compiler::IREE::HAL::SerializeTargetExecutablesPass
25.8591 ( 17.0%) 25.8591 ( 18.4%) 'hal.executable.variant' Pipeline
11.0800 ( 7.3%) 11.0800 ( 7.9%) Pipeline Collection : ['hal.executable.variant']
10.9722 ( 7.2%) 10.9722 ( 7.8%) mlir::iree_compiler::IREE::HAL::TranslateTargetExecutableVariantsPass
8.1665 ( 5.4%) 8.1665 ( 5.8%) Pipeline Collection : ['builtin.module']
6.4248 ( 4.2%) 6.4248 ( 4.6%) LLVMCPULowerExecutableTarget
5.1236 ( 3.4%) 5.1236 ( 3.6%) 'builtin.module' Pipeline
3.3519 ( 2.2%) 3.3519 ( 2.4%) Pipeline Collection : ['builtin.func']
1.3615 ( 0.9%) 1.3615 ( 1.0%) 'builtin.func' Pipeline
1.2922 ( 0.9%) 1.2502 ( 0.9%) Canonicalizer
1.2305 ( 0.8%) 1.2305 ( 0.9%) ConvertToLLVM
0.7574 ( 0.5%) 0.7574 ( 0.5%) mlir::iree_compiler::IREE::Util::(anonymous namespace)::FoldGlobalsPass
11.1462 ( 7.3%) 0.5405 ( 0.4%) mlir::iree_compiler::IREE::HAL::TranslateExecutablesPass
0.3919 ( 0.3%) 0.3919 ( 0.3%) mlir::iree_compiler::IREE::Util::ApplyPatternsPass
0.3122 ( 0.2%) 0.2910 ( 0.2%) CSE
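For reference, the report above comes from MLIR's pass timing instrumentation. Assuming the tool exposes the standard flag (spelled -mlir-timing in recent MLIR; older builds used -pass-timing), appending it to the repro invocation regenerates a similar report:

# Check iree-translate --help for the exact flag name in your build; the
# spelling below is an assumption based on upstream MLIR.
iree-translate <same flags as the repro command below> -mlir-timing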
We haven’t invested much effort in vectorizing quantized models, so I’m not surprised to see such an issue. A temporary solution could be to fall back to the non-vectorization pipeline when the target is RISC-V.
To repro:
- Download the tflite file: https://github.com/google/iree/blob/main/integrations/tensorflow/test/python/iree_tfl_tests/person_detect_test.py#L14
- Import it into MLIR with iree-import-tosa (a minimal invocation is sketched after this list)
- Run the translation targeting RISC-V:
iree-translate -iree-mlir-to-vm-bytecode-module --iree-hal-target-backends=dylib-llvm-aot --iree-llvm-link-embedded=true -iree-input-type=tosa ../tosa.mlir -o /tmp/a.vmfb -iree-llvm-target-triple=riscv64 -iree-llvm-target-cpu=generic-rv64 -iree-llvm-target-abi=lp64d -iree-llvm-target-cpu-features=+m,+a,+f,+d,+c
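For the import step, a minimal invocation could look like the sketch below. The tool name is an assumption on my part: current IREE ships the importer as iree-import-tflite (which emits TOSA-based MLIR), so substitute whatever importer your build provides and adjust file paths to your layout.

# Hedged sketch: convert the downloaded TFLite flatbuffer into TOSA MLIR.
# person_detect.tflite is the file from the first repro step above.
iree-import-tflite person_detect.tflite -o ../tosa.mlir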
About this issue
- State: closed
- Created 2 years ago
- Comments: 21 (20 by maintainers)
I’m also running into long compilation times for quantized models on x86 and ARM. Some example MLIRs for quantized MobileBERT and DeepLab are attached here. My Buildkite runs don’t complete even after 11 hours.
@hcindyl The patch has landed in IREE in https://github.com/google/iree/commit/6c182f591dd666222589115a5edf0aa5e556ed2b