iree: Crash when compiling mhlo.dynamic_iota.

This is from experiments with JAX+IREE and dynamic shapes. This bug is about mhlo.dynamic_iota.

This may be related to #7888 about mhlo.dynamic_reshape.

Repro (put the code below in iree_repro.py and run python iree_repro.py):

from iree.compiler import compile_str

CODE = """
module @jit_f.4  {
  func public @main(%arg0: tensor<?xf32> {mhlo.is_same_data_across_replicas} loc(unknown)) -> tensor<?xi32> {
    %0 = "mhlo.get_dimension_size"(%arg0) {dimension = 0 : i64} : (tensor<?xf32>) -> tensor<i32>
    %3 = "mhlo.broadcast_in_dim"(%0) {broadcast_dimensions = dense<> : tensor<0xi64>} : (tensor<i32>) -> tensor<1xi32>
    %4 = "mhlo.concatenate"(%3) {dimension = 0 : i64} : (tensor<1xi32>) -> tensor<1xi32>
    %5 = "mhlo.dynamic_iota"(%4) {iota_dimension = 0 : i64} : (tensor<1xi32>) -> tensor<?xi32>
    return %5 : tensor<?xi32>
  }
}
"""

compiled_flatbuffer = compile_str(CODE, target_backends=["dylib"], input_type="mhlo")

Error:

$ python tests/iree_repro.py
Traceback (most recent call last):
  File "/Users/necula/Source/jax/tests/iree_repro.py", line 15, in <module>
    compiled_flatbuffer = compile_str(CODE, target_backends=["dylib"], input_type="mhlo")
  File "/Users/necula/.pyenv/versions/jax39/lib/python3.9/site-packages/iree/compiler/tools/core.py", line 262, in compile_str
    result = invoke_immediate(cl, immediate_input=input_bytes)
  File "/Users/necula/.pyenv/versions/jax39/lib/python3.9/site-packages/iree/compiler/tools/binaries.py", line 201, in invoke_immediate
    raise CompilerToolError(process)
iree.compiler.tools.binaries.CompilerToolError: Error invoking IREE compiler tool ireec
Diagnostics:
LLVM ERROR: SmallVector unable to grow. Requested capacity (140704432398336) is larger than maximum value for size type (4294967295)
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.
Stack dump:
0.	Program arguments: /Users/necula/.pyenv/versions/jax39/lib/python3.9/site-packages/iree/compiler/tools/../_mlir_libs/ireec - --iree-input-type=mhlo --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-hal-target-backends=dylib --iree-mlir-to-vm-bytecode-module --iree-llvm-embedded-linker-path=/Users/necula/.pyenv/versions/jax39/lib/python3.9/site-packages/iree/compiler/tools/../_mlir_libs/iree-lld --mlir-print-debuginfo --mlir-print-op-on-diagnostic=false
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
0  libIREECompilerAggregateCAPI.dylib 0x00000001116a29d7 mlir::TypeID mlir::detail::TypeIDExported::get<mlir::iree_compiler::IREE::VM::YieldOp>() + 4715143
1  libIREECompilerAggregateCAPI.dylib 0x00000001116a18f8 mlir::TypeID mlir::detail::TypeIDExported::get<mlir::iree_compiler::IREE::VM::YieldOp>() + 4710824
2  libIREECompilerAggregateCAPI.dylib 0x00000001116a2ff0 mlir::TypeID mlir::detail::TypeIDExported::get<mlir::iree_compiler::IREE::VM::YieldOp>() + 4716704
3  libsystem_platform.dylib           0x00007ff80c349e2d _sigtramp + 29
4  libIREECompilerAggregateCAPI.dylib 0x0000000114c6eea8 llvm::Module::getDarwinTargetVariantSDKVersion() const + 722760
5  libsystem_c.dylib                  0x00007ff80c280d10 abort + 123
6  libIREECompilerAggregateCAPI.dylib 0x000000011160a2fd mlir::TypeID mlir::detail::TypeIDExported::get<mlir::iree_compiler::IREE::VM::YieldOp>() + 4090797
7  libIREECompilerAggregateCAPI.dylib 0x0000000111654ae2 mlir::TypeID mlir::detail::TypeIDExported::get<mlir::iree_compiler::IREE::VM::YieldOp>() + 4395922
8  libIREECompilerAggregateCAPI.dylib 0x0000000111654911 mlir::TypeID mlir::detail::TypeIDExported::get<mlir::iree_compiler::IREE::VM::YieldOp>() + 4395457
9  libIREECompilerAggregateCAPI.dylib 0x00000001125a1c61 mlir::TypeID mlir::detail::TypeIDExported::get<mlir::iree_compiler::IREE::Check::ExpectTrueOp>() + 1613489
10 libIREECompilerAggregateCAPI.dylib 0x00000001125a158d mlir::TypeID mlir::detail::TypeIDExported::get<mlir::iree_compiler::IREE::Check::ExpectTrueOp>() + 1611741
11 libIREECompilerAggregateCAPI.dylib 0x0000000114a2519f llvm::Loop::getLocRange() const + 3376191
12 libIREECompilerAggregateCAPI.dylib 0x0000000114c54f31 llvm::Module::getDarwinTargetVariantSDKVersion() const + 616401
13 libIREECompilerAggregateCAPI.dylib 0x0000000114a2fe01 llvm::Loop::getLocRange() const + 3420321
14 libIREECompilerAggregateCAPI.dylib 0x0000000114a28e4d llvm::Loop::getLocRange() const + 3391725
15 libIREECompilerAggregateCAPI.dylib 0x0000000114a2bfab llvm::Loop::getLocRange() const + 3404363
16 libIREECompilerAggregateCAPI.dylib 0x000000011253e4b2 mlir::TypeID mlir::detail::TypeIDExported::get<mlir::iree_compiler::IREE::Check::ExpectTrueOp>() + 1206018
17 libIREECompilerAggregateCAPI.dylib 0x00000001116d2f1f mlir::TypeID mlir::detail::TypeIDExported::get<mlir::iree_compiler::IREE::VM::YieldOp>() + 4913103
18 libIREECompilerAggregateCAPI.dylib 0x00000001116d3273 mlir::TypeID mlir::detail::TypeIDExported::get<mlir::iree_compiler::IREE::VM::YieldOp>() + 4913955
19 libIREECompilerAggregateCAPI.dylib 0x00000001116d4814 mlir::TypeID mlir::detail::TypeIDExported::get<mlir::iree_compiler::IREE::VM::YieldOp>() + 4919492
20 libIREECompilerAggregateCAPI.dylib 0x00000001112383f3 mlir::TypeID mlir::detail::TypeIDExported::get<mlir::iree_compiler::IREE::VM::YieldOp>() + 85155
21 libIREECompilerAggregateCAPI.dylib 0x00000001112364b0 mlir::TypeID mlir::detail::TypeIDExported::get<mlir::iree_compiler::IREE::VM::YieldOp>() + 77152
22 libIREECompilerAggregateCAPI.dylib 0x00000001145682ef llvm::MachineFunction::verify(llvm::Pass*, char const*, bool) const + 10589887
23 libIREECompilerAggregateCAPI.dylib 0x00000001112396ff mlir::TypeID mlir::detail::TypeIDExported::get<mlir::iree_compiler::IREE::VM::YieldOp>() + 90031
24 libIREECompilerAggregateCAPI.dylib 0x00000001112390d3 mlir::TypeID mlir::detail::TypeIDExported::get<mlir::iree_compiler::IREE::VM::YieldOp>() + 88451
25 dyld                               0x000000011b4834fe start + 462


Invoked with:
 ireec /Users/necula/.pyenv/versions/jax39/lib/python3.9/site-packages/iree/compiler/tools/../_mlir_libs/ireec - --iree-input-type=mhlo --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-hal-target-backends=dylib --iree-mlir-to-vm-bytecode-module --iree-llvm-embedded-linker-path=/Users/necula/.pyenv/versions/jax39/lib/python3.9/site-packages/iree/compiler/tools/../_mlir_libs/iree-lld --mlir-print-debuginfo --mlir-print-op-on-diagnostic=false
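
The requested capacity in the LLVM error is easier to read in hexadecimal. The sketch below uses only the two numbers from the diagnostic above and shows how far the request exceeds the 32-bit size limit. The variable names are illustrative, and the remark that the value resembles an address is speculation, not a diagnosis.

```python
# Numbers copied from the "SmallVector unable to grow" diagnostic.
requested = 140704432398336
max_size = 4294967295

print(hex(requested))          # 0x7ff84db62000
print(max_size == 2**32 - 1)   # True: the limit is the largest 32-bit value
print(requested > max_size)    # True

# Speculation only: the upper bits match the 0x00007ff8... addresses of the
# libsystem frames in the stack dump, so the "capacity" may be a garbage or
# address-like value rather than a real element count.
looks_like_address = (requested >> 32) == 0x7FF8
print(looks_like_address)      # True
```

If that guess holds, the crash would come from a bogus size being computed during compilation rather than from a genuinely huge tensor, but nothing in the log confirms it.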

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 23 (9 by maintainers)

Most upvoted comments

Yeah, totally. This is just detensoring, though. So it already happens to some extent. I’d like to make sure this is valid IR before going down the path of “needs to be fixed in linalg”.

I think we may want them on device at some point depending on how the frontends/users use them - but am fine saying today they shouldn’t be. Maybe that is a stepping stone for detensoring as well: detensor all the index tensors only. I’m adding a verifier to flow.dispatch.workgroups so we’ll at least get good diagnostics on this there.

This ends up getting pretty far for me and dying on a TODO - it looks like some additional data-dependent shape handling is required. We really need to be detensoring these things - this is still computing the shape on device and reading it back. @ScottTodd did detensoring die?

That’s a 😢 backtrace - no clue why it has some symbols but they are all wrong - if it were stripped I’d not expect to see any of those. @stellaraccident any ideas?

First I’m seeing the symbol issue, but I note that this is on osx without the symbolizer. It happens that those symbols are some of the only ones being emitted with default visibility so the last chance stack dumper is likely just choosing a close match, and those are the closest.
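
The "closest match" explanation can be illustrated with a small nearest-symbol lookup. This is only a sketch of the general technique, not the actual LLVM stack dumper: the two-entry symbol table is hypothetical, the names are abbreviated, and each start address is derived from one frame of the dump above (frame address minus the printed offset).

```python
import bisect

# Hypothetical table of the few exported symbols, sorted by start address.
# Start addresses are inferred from the stack dump, not read from the binary.
SYMBOLS = sorted([
    (0x1116A29D7 - 4715143, "TypeIDExported::get<VM::YieldOp>"),
    (0x1125A1C61 - 1613489, "TypeIDExported::get<Check::ExpectTrueOp>"),
])
STARTS = [addr for addr, _ in SYMBOLS]


def resolve(frame_addr):
    """Return (name, offset) for the nearest symbol at or below frame_addr."""
    i = bisect.bisect_right(STARTS, frame_addr) - 1
    if i < 0:
        return None  # address lies below every known symbol
    start, name = SYMBOLS[i]
    return name, frame_addr - start


# Frames 0, 20, and 10 from the stack dump.
for frame in (0x1116A29D7, 0x1112383F3, 0x1125A158D):
    print(hex(frame), resolve(frame))
```

Frames 0 and 20 resolve to the same symbol with offsets of 4715143 and 85155 bytes, which fits the idea that unrelated functions are being attributed to whichever visible symbol happens to precede them.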

Should make a separate issue for generating release binaries with at least some minimal symbols. Probably worth the cost.

(pytorch doesn’t)
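
In the meantime, the stack dump itself suggests having llvm-symbolizer on PATH or setting `LLVM_SYMBOLIZER_PATH`. A minimal sketch for checking this before re-running the repro is shown below. The helper name `find_symbolizer` is made up for illustration, and whether a symbolizer would actually recover useful names from the release binary is not established in this thread.

```python
import os
import shutil


def find_symbolizer():
    """Return a path to llvm-symbolizer, or None if it cannot be found."""
    # In this sketch an explicit LLVM_SYMBOLIZER_PATH takes priority over PATH.
    explicit = os.environ.get("LLVM_SYMBOLIZER_PATH")
    if explicit and os.path.isfile(explicit):
        return explicit
    return shutil.which("llvm-symbolizer")


path = find_symbolizer()
if path is None:
    print("llvm-symbolizer not found; stack dumps may show misleading symbol names")
else:
    print("using", path)
```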