iree: Bufferization is particularly slow for large programs

Part of https://github.com/iree-org/iree/issues/11994

When tracing the compiler, the EliminateEmptyTensors and IREEComprehensiveBufferize passes (sources in IREEComprehensiveBufferizePass.cpp) show up as taking a long time when compiling programs like the one linked at https://github.com/iree-org/iree/issues/11994#issuecomment-1409231149 (at least for Vulkan):

image

image

Note from the trace (Debug mode, but probably applicable to Release) that

  • executable translation is running in parallel, so this isn’t necessarily a bottleneck
  • the timing distribution has a few spikes (60ms, 5 seconds) and several outliers (1m40s)

It would be nice to include function names or locations in compiler traces for easier correlation between trace zones and IR dumps. I could guess at which executables are slow based on data sizes or the number of ops…


We also discussed this a bit here on Discord. @MaheshRavishankar @benvanik

About this issue

  • State: closed
  • Created a year ago
  • Comments: 19 (8 by maintainers)

Most upvoted comments

Yeah. 77 as the innermost dimension size would trigger unrolling to 1. Right now tiling pretty much only tries powers of two. Would need to adjust tiling a bit to materialize a loop for these odd numbers. I’ll look into handling this a bit later…
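
To illustrate why an odd size like 77 degenerates, here is a minimal sketch. It assumes the tiling heuristic picks the largest power-of-two candidate that evenly divides the dimension, which is a simplification for illustration and not IREE's actual code. The function name `largest_pow2_tile` and the `max_tile` cap are hypothetical.

```python
def largest_pow2_tile(dim_size: int, max_tile: int = 64) -> int:
    """Return the largest power of two <= max_tile that evenly divides dim_size.

    Hypothetical stand-in for a power-of-two-only tiling heuristic;
    this is NOT IREE's real implementation.
    """
    if dim_size <= 0 or max_tile <= 0:
        raise ValueError("dim_size and max_tile must be positive")
    tile = 1
    # Keep doubling while the next power of two still fits under the cap
    # and still divides the dimension evenly.
    while tile * 2 <= max_tile and dim_size % (tile * 2) == 0:
        tile *= 2
    return tile


if __name__ == "__main__":
    for size in (64, 96, 77):
        print(size, "->", largest_pow2_tile(size))
```

Under this assumption, an even size such as 64 or 96 gets a usable tile, while 77 has no power-of-two divisor other than 1 and falls back to a tile of 1. Supporting such sizes would need a candidate set that is not limited to exact power-of-two divisors, for example a loop with a remainder.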

Here’s one of the executables that is slow to bufferize (compiled with --iree-hal-target-backends=vulkan --iree-stream-resource-index-bits=64 --iree-vm-target-index-bits=64 --iree-util-zero-fill-elided-attrs --iree-vulkan-target-triple=rdna3-unknown-unknown):

https://gist.github.com/ScottTodd/9c6d0bc780762634c907df3666852f59

image