iree: Bad quantized model compilation time for RISC-V target

This is the bug for tracking the compilation time regression on RISC-V targets.

We used to compile the (quantized) person_detection.tflite model in 3 seconds. After landing https://github.com/google/iree/pull/8409, quantized_matmul ops are lowered to matmul ops, which triggers matmul vectorization. Compiling the model now takes 13 minutes.

With the fixes in https://github.com/google/iree/issues/8210, we’re able to compile the model in 150 seconds (which is still bad). After profiling the pass timing, I found that most of the time is spent in HAL-related passes:

===-------------------------------------------------------------------------===
                         ... Execution time report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 140.5211 seconds

  ----User Time----  ----Wall Time----  ----Name----
  297.5860 (196.1%)  276.3979 (196.7%)  'hal.executable' Pipeline
  151.7494 (100.0%)  140.5211 (100.0%)  root
  149.1610 ( 98.3%)  138.2025 ( 98.3%)  Pipeline Collection : ['hal.executable']
  137.6607 ( 90.7%)  137.6607 ( 98.0%)  mlir::iree_compiler::IREE::HAL::SerializeExecutablesPass
  137.6586 ( 90.7%)  137.6586 ( 98.0%)  mlir::iree_compiler::IREE::HAL::SerializeTargetExecutablesPass
   25.8591 ( 17.0%)   25.8591 ( 18.4%)  'hal.executable.variant' Pipeline
   11.0800 (  7.3%)   11.0800 (  7.9%)  Pipeline Collection : ['hal.executable.variant']
   10.9722 (  7.2%)   10.9722 (  7.8%)  mlir::iree_compiler::IREE::HAL::TranslateTargetExecutableVariantsPass
    8.1665 (  5.4%)    8.1665 (  5.8%)  Pipeline Collection : ['builtin.module']
    6.4248 (  4.2%)    6.4248 (  4.6%)  LLVMCPULowerExecutableTarget
    5.1236 (  3.4%)    5.1236 (  3.6%)  'builtin.module' Pipeline
    3.3519 (  2.2%)    3.3519 (  2.4%)  Pipeline Collection : ['builtin.func']
    1.3615 (  0.9%)    1.3615 (  1.0%)  'builtin.func' Pipeline
    1.2922 (  0.9%)    1.2502 (  0.9%)  Canonicalizer
    1.2305 (  0.8%)    1.2305 (  0.9%)  ConvertToLLVM
    0.7574 (  0.5%)    0.7574 (  0.5%)  mlir::iree_compiler::IREE::Util::(anonymous namespace)::FoldGlobalsPass
   11.1462 (  7.3%)    0.5405 (  0.4%)  mlir::iree_compiler::IREE::HAL::TranslateExecutablesPass
    0.3919 (  0.3%)    0.3919 (  0.3%)  mlir::iree_compiler::IREE::Util::ApplyPatternsPass
    0.3122 (  0.2%)    0.2910 (  0.2%)  CSE
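
For context, a report like the one above comes from MLIR’s pass-manager timing instrumentation. Below is a minimal standalone sketch of how such a report is produced; it uses generic upstream MLIR APIs and a placeholder empty module, and is not IREE’s actual driver code:

  // Minimal sketch: enable MLIR's pass timing instrumentation, which
  // prints an execution time report like the one above on teardown.
  // Generic MLIR API usage only; not IREE's actual driver code.
  #include "mlir/IR/BuiltinOps.h"
  #include "mlir/IR/Location.h"
  #include "mlir/IR/MLIRContext.h"
  #include "mlir/IR/OwningOpRef.h"
  #include "mlir/Pass/PassManager.h"
  #include "mlir/Transforms/Passes.h"

  int main() {
    mlir::MLIRContext context;
    mlir::PassManager pm(&context);
    pm.enableTiming();  // attach the timing instrumentation
    pm.addPass(mlir::createCanonicalizerPass());

    // Placeholder: run the pipeline on an empty module.
    mlir::OwningOpRef<mlir::ModuleOp> module(
        mlir::ModuleOp::create(mlir::UnknownLoc::get(&context)));
    (void)pm.run(*module);
    // The per-pass report is printed when the default timing manager
    // is destroyed, i.e. when `pm` goes out of scope.
    return 0;
  }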

We haven’t invested much effort in vectorization for quantized models, so I’m not surprised to see such an issue. A temporary solution could be to fall back to the non-vectorization pipeline when the target is RISC-V, as sketched below.
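
A hypothetical sketch of that fallback (the enum and function names here are illustrative, not actual IREE code): dispatch on the target triple and choose the non-vectorized pipeline whenever the triple is RISC-V.

  // Hypothetical sketch of the suggested workaround; names are
  // illustrative and this is not actual IREE code.
  #include "llvm/ADT/StringRef.h"
  #include "llvm/ADT/Triple.h"

  enum class LoweringPipeline { Vectorized, NonVectorized };

  // Fall back to the non-vectorized pipeline on RISC-V until quantized
  // matmul vectorization compiles in reasonable time there.
  LoweringPipeline selectPipeline(llvm::StringRef targetTriple) {
    llvm::Triple triple(targetTriple);  // e.g. "riscv64-unknown-elf"
    return triple.isRISCV() ? LoweringPipeline::NonVectorized
                            : LoweringPipeline::Vectorized;
  }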

To repro:

  1. Download the tflite file: https://github.com/google/iree/blob/main/integrations/tensorflow/test/python/iree_tfl_tests/person_detect_test.py#L14
  2. Import it into MLIR with iree-import-tosa.
  3. Run the translation targeting RISC-V:

       iree-translate -iree-mlir-to-vm-bytecode-module \
         --iree-hal-target-backends=dylib-llvm-aot \
         --iree-llvm-link-embedded=true \
         -iree-input-type=tosa ../tosa.mlir -o /tmp/a.vmfb \
         -iree-llvm-target-triple=riscv64 \
         -iree-llvm-target-cpu=generic-rv64 \
         -iree-llvm-target-abi=lp64d \
         -iree-llvm-target-cpu-features=+m,+a,+f,+d,+c

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 21 (20 by maintainers)

Most upvoted comments

I’m also running into long compilation times for quantized models on x86 and ARM. Some example MLIR files for quantized MobileBERT and DeepLab are attached here. My Buildkite runs don’t complete even after 11 hours.