iree: [Regression] Large models taking more than an hour to build
What happened?
Between f3d0369 and 967ab3b, the Benchmark Large workflow started timing out. Several JAX models now take more than an hour to build on CUDA, whereas they previously built in at most 2 minutes.
Steps to reproduce your issue
Download any one of:
- https://storage.googleapis.com/iree-model-artifacts/jax/jax_models_0.4.10_1684396752/T5_LARGE/batch_16/stablehlo.mlirbc
- https://storage.googleapis.com/iree-model-artifacts/jax/jax_models_0.4.10_1684396752/T5_LARGE/batch_24/stablehlo.mlirbc
- https://storage.googleapis.com/iree-model-artifacts/jax/jax_models_0.4.10_1684396752/T5_LARGE/batch_32/stablehlo.mlirbc
- https://storage.googleapis.com/iree-model-artifacts/jax/jax_models_0.4.10_1684396752/T5_LARGE/batch_48/stablehlo.mlirbc
Compile:
iree-compile --output-format=vm-bytecode --mlir-print-op-on-diagnostic=false --iree-hal-target-backends=cuda --iree-input-type=stablehlo --iree-hal-cuda-llvm-target-arch=sm_80 <path to mlirbc> -o /tmp/module.vmfb
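To tell a normal build from the regression without waiting an hour, the reproduction command can be wrapped in a wall-clock timeout. The following is a minimal sketch, assuming `iree-compile` is on `PATH`. The helper names `build_compile_cmd` and `timed_run` and the 600-second budget are hypothetical choices, not part of IREE. The flags are copied from the command above.

```python
import subprocess
import time

# Flags copied verbatim from the reproduction command in this issue.
COMPILE_FLAGS = [
    "--output-format=vm-bytecode",
    "--mlir-print-op-on-diagnostic=false",
    "--iree-hal-target-backends=cuda",
    "--iree-input-type=stablehlo",
    "--iree-hal-cuda-llvm-target-arch=sm_80",
]


def build_compile_cmd(mlirbc_path, output_path="/tmp/module.vmfb",
                      compiler="iree-compile"):
    """Return the iree-compile argument list for one downloaded model (hypothetical helper)."""
    return [compiler, *COMPILE_FLAGS, mlirbc_path, "-o", output_path]


def timed_run(cmd, timeout_s):
    """Run cmd and return (status, elapsed_seconds).

    status is "ok" (exit code 0), "failed" (nonzero exit code),
    or "timeout" (subprocess.run kills the child when the budget expires).
    """
    start = time.monotonic()
    try:
        proc = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "timeout", time.monotonic() - start
    elapsed = time.monotonic() - start
    return ("ok" if proc.returncode == 0 else "failed"), elapsed
```

For example, `timed_run(build_compile_cmd("stablehlo.mlirbc"), timeout_s=600)` gives a budget well above the roughly 2-minute baseline and well below the hour-plus regression, so a `"timeout"` status marks the slow behaviour.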
Version information
Started failing between f3d0369 and 967ab3b.
About this issue
- State: closed
- Created a year ago
- Comments: 26 (14 by maintainers)
Commits related to this issue
- Revert NVPTX buggy commit in LLVM (#14125) This PR reverts the buggy NVPTX commit in LLVM that causes a compile time issue. See https://github.com/openxla/iree/issues/14067 — committed to iree-org/iree by dcaballe a year ago
- Revert NVPTX buggy commit in LLVM (#14125) This PR reverts the buggy NVPTX commit in LLVM that causes a compile time issue. See https://github.com/openxla/iree/issues/14067 — committed to plaidml/iree by dcaballe a year ago
I’m going to start bisecting to find the culprit commit.
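Bisecting a compile-time regression can be automated with `git bisect run`. That command treats exit code 0 as good and 125 as skip, and most other codes from 1 to 127 (here, 1) as bad. The following is a minimal sketch of such a predicate under stated assumptions. The function `bisect_step`, the build command you pass in, and the timeout value are all hypothetical. Commits that fail to build or fail to compile are skipped rather than marked bad, because those failures are separate from the slowdown.

```python
import subprocess

# Exit codes interpreted by `git bisect run`: 0 marks the commit good,
# 125 tells bisect to skip it, and 1 marks it bad.
GOOD, BAD, SKIP = 0, 1, 125


def bisect_step(build_cmd, compile_cmd, timeout_s):
    """Build the compiler at the current commit, then time one compile.

    Returns an exit code suitable for `git bisect run` (hypothetical helper).
    """
    build = subprocess.run(build_cmd, capture_output=True)
    if build.returncode != 0:
        return SKIP  # this commit does not build; neither good nor bad
    try:
        result = subprocess.run(compile_cmd, capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return BAD  # compile exceeded the budget: regression present
    # A compile error is a different problem from the slowdown, so skip it.
    return GOOD if result.returncode == 0 else SKIP
```

A script that ends with `sys.exit(bisect_step(...))` could then be driven by `git bisect start 967ab3b f3d0369` followed by `git bisect run python3 <script>`. This only narrows the range of IREE commits. A regression introduced through an LLVM bump, like the NVPTX commit reverted above, would still need a second bisection inside LLVM.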