iree: Bufferization is particularly slow for large programs
Part of https://github.com/iree-org/iree/issues/11994
When tracing the compiler, the EliminateEmptyTensors pass and the IREEComprehensiveBufferize pass (sources in IREEComprehensiveBufferizePass.cpp) show up as taking a long time when compiling programs like the one linked at https://github.com/iree-org/iree/issues/11994#issuecomment-1409231149 (at least for Vulkan):
Note from the trace (captured in Debug mode, but probably applicable to Release) that:
- executable translation is running in parallel, so this isn’t necessarily a bottleneck
- the timing distribution has a few spikes (60ms, 5 seconds) and several outliers (1m40s)
It would be nice to include function names or locations in compiler traces for easier correlation between trace zones and IR dumps. I could guess at which executables are slow based on data sizes or the number of ops…
We also discussed this a bit here on Discord. @MaheshRavishankar @benvanik
About this issue
- State: closed
- Created a year ago
- Comments: 19 (8 by maintainers)
Yeah. 77 as the innermost dimension size would trigger unrolling to 1. Right now tiling pretty much only tries powers of two. It would need some adjustment to materialize a loop for odd sizes like this. I'll look into handling this a bit later…
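To illustrate why an odd size like 77 degenerates to a tile of 1, here is a minimal sketch. It assumes a tile-size heuristic that only accepts powers of two that evenly divide the dimension. The function names and the max_tile cap are hypothetical and are not IREE's actual tiling code. The second helper shows the alternative mentioned above: keep a power-of-two tile and materialize a loop over full tiles plus a remainder.

```python
def pick_pow2_tile(dim: int, max_tile: int = 64) -> int:
    """Hypothetical heuristic: largest power of two <= max_tile that evenly divides dim."""
    if dim <= 0:
        raise ValueError("dimension size must be positive")
    tile = 1
    # Keep doubling while the next power of two still divides dim exactly.
    while tile * 2 <= max_tile and dim % (tile * 2) == 0:
        tile *= 2
    return tile


def split_with_remainder(dim: int, tile: int) -> tuple[int, int]:
    """Hypothetical alternative: number of full tiles in a loop, plus a leftover remainder."""
    return dim // tile, dim % tile


if __name__ == "__main__":
    print(pick_pow2_tile(77))           # 1: 77 is odd, so no power of two > 1 divides it
    print(pick_pow2_tile(96))           # 32: 64 does not divide 96
    print(split_with_remainder(77, 4))  # (19, 1): 19 full tiles of 4, then 1 leftover element
```

Under this assumed heuristic, 77 only admits a tile size of 1, so the innermost dimension gets fully unrolled into many small ops. A loop with a remainder keeps the IR size roughly independent of the dimension size.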
Here's one of the executables that is slow to bufferize (compiled with --iree-hal-target-backends=vulkan --iree-stream-resource-index-bits=64 --iree-vm-target-index-bits=64 --iree-util-zero-fill-elided-attrs -iree-vulkan-target-triple=rdna3-unknown-unknown): https://gist.github.com/ScottTodd/9c6d0bc780762634c907df3666852f59