iree: Vulkan/CUDA runtime error: failed to wait on timepoint
What happened?
The Unet model failed on some AMD RDNA2/RDNA3 and NVIDIA A100 devices with the error:
<vm>:0: OK; failed to wait on timepoint;
[ 0] bytecode module.forward:131934 [
<eval_with_key>.9:5042:14,
<eval_with_key>.9:5039:15,
<eval_with_key>.9:5038:15,
However, all the dumped dispatches ran without any problem.
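For context on the error message: IREE's HAL semaphores act like timeline semaphores, where a waiter blocks until a monotonically increasing payload reaches a target "timepoint", and "failed to wait on timepoint" means such a wait did not complete. The sketch below is a toy illustration of that idea only, not IREE's actual implementation:

```python
import threading

class TimelineSemaphore:
    """Toy timeline semaphore: a monotonically increasing payload value.

    Waiters block until the payload reaches their target "timepoint".
    Illustrative only -- not IREE's HAL semaphore implementation.
    """

    def __init__(self, initial=0):
        self._value = initial
        self._cond = threading.Condition()

    def signal(self, value):
        with self._cond:
            # Timeline values only move forward.
            if value < self._value:
                raise ValueError("timeline may not decrease")
            self._value = value
            self._cond.notify_all()

    def wait(self, timepoint, timeout=None):
        """Block until payload >= timepoint; False on timeout (a failed wait)."""
        with self._cond:
            return self._cond.wait_for(lambda: self._value >= timepoint,
                                       timeout=timeout)

sem = TimelineSemaphore()
threading.Timer(0.05, sem.signal, args=(3,)).start()
ok = sem.wait(3, timeout=1.0)    # the timepoint is reached: True
bad = sem.wait(10, timeout=0.1)  # never signaled: the wait fails, False
```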
Version information
- Download the latest Unet model
- Compile command for Vulkan
iree-compile --iree-input-type=none --iree-hal-target-backends=vulkan --iree-stream-resource-index-bits=64 --iree-vm-target-index-bits=64 --iree-util-zero-fill-elided-attrs --iree-vulkan-target-triple=rdna2-7900-linux --iree-preprocessing-pass-pipeline='builtin.module(func.func(iree-flow-detach-elementwise-from-named-ops,iree-flow-convert-1x1-filter-conv2d-to-matmul,iree-preprocessing-convert-conv2d-to-img2col,iree-preprocessing-pad-linalg-ops{pad-size=32}))' unet_1_64_512_512_fp16_stable-diffusion-2-1-base_vulkan/unet_1_64_512_512_fp16_stable-diffusion-2-1-base_vulkan_torch.mlir -o unet.vmfb
- Benchmark command for Vulkan
iree-benchmark-module --module=unet.vmfb --function=forward --device=vulkan --input=1x4x64x64xf16 --input=1xf16 --input=2x64x1024xf16 --input=f32=1.0
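For reference, the `pad-size=32` option in the preprocessing pipeline above pads linalg op dimensions up to the next multiple of 32. The rounding arithmetic itself is simply (a standalone sketch, not the pass implementation):

```python
def pad_to_multiple(n: int, multiple: int = 32) -> int:
    """Round n up to the next multiple (the arithmetic behind pad-size=32)."""
    return ((n + multiple - 1) // multiple) * multiple

# e.g. a dimension of 77 would be padded to 96, while 64 is already aligned.
```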
Additional context
Also failed on A100; the commands for the CUDA path are as follows:
iree-compile --iree-input-type=none --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-hal-target-backends=cuda --iree-llvmcpu-target-cpu-features=host --iree-hal-cuda-disable-loop-nounroll-wa --iree-hal-cuda-llvm-target-arch=sm_80 --iree-stream-resource-index-bits=64 --iree-vm-target-index-bits=64 --iree-util-zero-fill-elided-attrs --iree-preprocessing-pass-pipeline='builtin.module(func.func(iree-flow-detach-elementwise-from-named-ops,iree-flow-convert-1x1-filter-conv2d-to-matmul,iree-preprocessing-convert-conv2d-to-img2col,iree-preprocessing-pad-linalg-ops{pad-size=32}))' unet_1_64_512_512_fp16_stable-diffusion-2-1-base_cuda/unet_1_64_512_512_fp16_stable-diffusion-2-1-base_cuda_torch.mlir -o unet.vmfb
iree-benchmark-module --module=unet.vmfb --device=cuda --function=forward --input=1x4x64x64xf16 --input=1xf16 --input=2x64x1024xf16 --input=f32=1.0
About this issue
- Original URL
- State: closed
- Created a year ago
- Comments: 22 (12 by maintainers)
Commits related to this issue
- Refreshing local stack state after VM import calls. Prior to the work adding wait frames it was not possible for the native imports to grow the VM stack. The bytecode dispatch loop was exploiting this... — committed to iree-org/iree by benvanik a year ago
- Refreshing local stack state after VM import calls. (#12809) Prior to the work adding wait frames it was not possible for the native imports to grow the VM stack. The bytecode dispatch loop was expl... — committed to iree-org/iree by benvanik a year ago
- Refreshing local stack state after VM import calls. (#12809) Prior to the work adding wait frames it was not possible for the native imports to grow the VM stack. The bytecode dispatch loop was expl... — committed to NatashaKnk/iree by benvanik a year ago
@dan-garvey, can you please take this over and verify whether the problem is resolved after Ben's fix? Lei seemed to notice NaNs after this fix, but let's verify on ToM IREE.
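To screen a run for the NaNs mentioned above, the output can be checked with numpy. This is a generic sketch; `result` here is a hypothetical stand-in for the tensor an actual Unet invocation would return:

```python
import numpy as np

# Hypothetical output tensor standing in for a real Unet result; in practice
# this would be the array obtained from the runtime after calling forward.
result = np.array([0.1, float("nan"), 0.3], dtype=np.float16)

nan_mask = np.isnan(result)        # elementwise NaN check
has_nans = bool(nan_mask.any())    # any NaN at all?
num_nans = int(nan_mask.sum())     # how many elements are NaN
```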
With the latest Python package (iree-compiler 20230328.472), I'm now getting a new error on Unet.
Not sure whether it's related to the old error, but it shows up as a different message.