iree: Bad quantized model compilation time for RISC-V target

This is the bug for tracking the compilation time regression on RISC-V targets.

We used to compile the (quantized) person_detection.tflite model in 3 seconds. After landing https://github.com/google/iree/pull/8409, quantized_matmul ops are lowered to matmul ops, which triggers matmul vectorization. Compiling the model now takes 13 minutes.

With the fixes in https://github.com/google/iree/issues/8210, we’re able to compile the model in 150 seconds (which is still bad). After profiling the pass timing, I found that most of the time is spent in HAL-related passes:

===-------------------------------------------------------------------------===
                         ... Execution time report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 140.5211 seconds

  ----User Time----  ----Wall Time----  ----Name----
  297.5860 (196.1%)  276.3979 (196.7%)  'hal.executable' Pipeline
  151.7494 (100.0%)  140.5211 (100.0%)  root
  149.1610 ( 98.3%)  138.2025 ( 98.3%)  Pipeline Collection : ['hal.executable']
  137.6607 ( 90.7%)  137.6607 ( 98.0%)  mlir::iree_compiler::IREE::HAL::SerializeExecutablesPass
  137.6586 ( 90.7%)  137.6586 ( 98.0%)  mlir::iree_compiler::IREE::HAL::SerializeTargetExecutablesPass
   25.8591 ( 17.0%)   25.8591 ( 18.4%)  'hal.executable.variant' Pipeline
   11.0800 (  7.3%)   11.0800 (  7.9%)  Pipeline Collection : ['hal.executable.variant']
   10.9722 (  7.2%)   10.9722 (  7.8%)  mlir::iree_compiler::IREE::HAL::TranslateTargetExecutableVariantsPass
    8.1665 (  5.4%)    8.1665 (  5.8%)  Pipeline Collection : ['builtin.module']
    6.4248 (  4.2%)    6.4248 (  4.6%)  LLVMCPULowerExecutableTarget
    5.1236 (  3.4%)    5.1236 (  3.6%)  'builtin.module' Pipeline
    3.3519 (  2.2%)    3.3519 (  2.4%)  Pipeline Collection : ['builtin.func']
    1.3615 (  0.9%)    1.3615 (  1.0%)  'builtin.func' Pipeline
    1.2922 (  0.9%)    1.2502 (  0.9%)  Canonicalizer
    1.2305 (  0.8%)    1.2305 (  0.9%)  ConvertToLLVM
    0.7574 (  0.5%)    0.7574 (  0.5%)  mlir::iree_compiler::IREE::Util::(anonymous namespace)::FoldGlobalsPass
   11.1462 (  7.3%)    0.5405 (  0.4%)  mlir::iree_compiler::IREE::HAL::TranslateExecutablesPass
    0.3919 (  0.3%)    0.3919 (  0.3%)  mlir::iree_compiler::IREE::Util::ApplyPatternsPass
    0.3122 (  0.2%)    0.2910 (  0.2%)  CSE
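
For context, a report like the one above comes from MLIR’s pass-manager timing instrumentation. Below is a minimal standalone sketch of how such a report is produced; it uses generic upstream MLIR APIs and a placeholder empty module, and is not IREE’s actual driver code:

  // Minimal sketch: enable MLIR's pass timing instrumentation, which
  // prints an execution time report like the one above on teardown.
  // Generic MLIR API usage only; not IREE's actual driver code.
  #include "mlir/IR/BuiltinOps.h"
  #include "mlir/IR/Location.h"
  #include "mlir/IR/MLIRContext.h"
  #include "mlir/IR/OwningOpRef.h"
  #include "mlir/Pass/PassManager.h"
  #include "mlir/Transforms/Passes.h"

  int main() {
    mlir::MLIRContext context;
    mlir::PassManager pm(&context);
    pm.enableTiming();  // attach the timing instrumentation
    pm.addPass(mlir::createCanonicalizerPass());

    // Placeholder: run the pipeline on an empty module.
    mlir::OwningOpRef<mlir::ModuleOp> module(
        mlir::ModuleOp::create(mlir::UnknownLoc::get(&context)));
    (void)pm.run(*module);
    // The per-pass report is printed when the default timing manager
    // is destroyed, i.e. when `pm` goes out of scope.
    return 0;
  }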

We haven’t invested much effort in vectorization for quantized models, so I’m not surprised to see such an issue. A temporary solution could be to fall back to the non-vectorization pipeline when the target is RISC-V, as sketched below.
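
A hypothetical sketch of that fallback (the enum and function names here are illustrative, not actual IREE code): dispatch on the target triple and choose the non-vectorized pipeline whenever the triple is RISC-V.

  // Hypothetical sketch of the suggested workaround; names are
  // illustrative and this is not actual IREE code.
  #include "llvm/ADT/StringRef.h"
  #include "llvm/ADT/Triple.h"

  enum class LoweringPipeline { Vectorized, NonVectorized };

  // Fall back to the non-vectorized pipeline on RISC-V until quantized
  // matmul vectorization compiles in reasonable time there.
  LoweringPipeline selectPipeline(llvm::StringRef targetTriple) {
    llvm::Triple triple(targetTriple);  // e.g. "riscv64-unknown-elf"
    return triple.isRISCV() ? LoweringPipeline::NonVectorized
                            : LoweringPipeline::Vectorized;
  }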

To repro:

  1. Download the tflite file: https://github.com/google/iree/blob/main/integrations/tensorflow/test/python/iree_tfl_tests/person_detect_test.py#L14
  2. Import it into MLIR with iree-import-tosa.
  3. Run the translation targeting RISC-V:

       iree-translate -iree-mlir-to-vm-bytecode-module \
         --iree-hal-target-backends=dylib-llvm-aot \
         --iree-llvm-link-embedded=true \
         -iree-input-type=tosa ../tosa.mlir -o /tmp/a.vmfb \
         -iree-llvm-target-triple=riscv64 \
         -iree-llvm-target-cpu=generic-rv64 \
         -iree-llvm-target-abi=lp64d \
         -iree-llvm-target-cpu-features=+m,+a,+f,+d,+c

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 21 (20 by maintainers)

Most upvoted comments

I’m also running into long compilation times for quantized models on x86 and ARM. Some example MLIR files for quantized MobileBERT and DeepLab are attached here. My Buildkite runs don’t complete even after 11 hours.