iree: Stack allocation failure for Stable Diffusion on CPU [on Windows]
What happened?
iree.compiler.tools.binaries.CompilerToolError: Error invoking IREE compiler tool iree-compile.exe
Diagnostics:
xyz.str:1437:12: error: 'builtin.module' op expected total size of stack allocation is not greater than 32768 bytes, but got 4259840 bytes
xyz.str:26:3: note: called from
xyz.str:1437:12: error: failed to run translation of source executable to target executable for backend #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu_features = "-avx512pf,-tsxldtrk,+cx16,+sahf,-tbm,-avx512ifma,+sha,+crc32,-fma4,-vpclmulqdq,+prfchw,+bmi2,-cldemote,+fsgsbase,-avx512bf16,-amx-tile,-raoint,-uintr,-gfni,+popcnt,-ptwrite,+aes,-avx512bitalg,-movdiri,-widekl,+xsaves,-avx512er,-avxvnni,-avx512fp16,-avx512vnni,-amx-bf16,-avxvnniint8,-avx512vpopcntdq,-pconfig,+clwb,-cmpccxadd,-avx512f,+xsavec,+clzero,-pku,-amx-fp16,+mmx,-lwp,+rdpid,-xop,+rdseed,-waitpkg,-prefetchi,-kl,-movdir64b,+sse4a,-avx512bw,-avxneconvert,+clflushopt,+xsave,-avx512vbmi2,+64bit,-avx512vl,-serialize,-hreset,-invpcid,-avx512cd,+avx,-vaes,-amx-int8,+cx8,+fma,-rtm,+bmi,-enqcmd,+rdrnd,+mwaitx,+sse4.1,+sse4.2,+avx2,+fxsr,+wbnoinvd,+sse,+lzcnt,+pclmul,+rdpru,-avxifma,+f16c,+ssse3,-sgx,-prefetchwt1,+cmov,-avx512vbmi,-shstk,+movbe,-avx512vp2intersect,+xsaveopt,-avx512dq,+sse2,+adx,+sse3", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 32 : index, target_triple = "x86_64-unknown-unknown-eabi-elf"}>
xyz.str:26:3: note: called from
xyz.str:1437:12: error: failed to serialize executables
xyz.str:26:3: note: called from
xyz.str:3232:12: error: 'builtin.module' op expected total size of stack allocation is not greater than 32768 bytes, but got 1064960 bytes
xyz.str:26:3: note: called from
xyz.str:3232:12: error: failed to run translation of source executable to target executable for backend #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu_features = "-avx512pf,-tsxldtrk,+cx16,+sahf,-tbm,-avx512ifma,+sha,+crc32,-fma4,-vpclmulqdq,+prfchw,+bmi2,-cldemote,+fsgsbase,-avx512bf16,-amx-tile,-raoint,-uintr,-gfni,+popcnt,-ptwrite,+aes,-avx512bitalg,-movdiri,-widekl,+xsaves,-avx512er,-avxvnni,-avx512fp16,-avx512vnni,-amx-bf16,-avxvnniint8,-avx512vpopcntdq,-pconfig,+clwb,-cmpccxadd,-avx512f,+xsavec,+clzero,-pku,-amx-fp16,+mmx,-lwp,+rdpid,-xop,+rdseed,-waitpkg,-prefetchi,-kl,-movdir64b,+sse4a,-avx512bw,-avxneconvert,+clflushopt,+xsave,-avx512vbmi2,+64bit,-avx512vl,-serialize,-hreset,-invpcid,-avx512cd,+avx,-vaes,-amx-int8,+cx8,+fma,-rtm,+bmi,-enqcmd,+rdrnd,+mwaitx,+sse4.1,+sse4.2,+avx2,+fxsr,+wbnoinvd,+sse,+lzcnt,+pclmul,+rdpru,-avxifma,+f16c,+ssse3,-sgx,-prefetchwt1,+cmov,-avx512vbmi,-shstk,+movbe,-avx512vp2intersect,+xsaveopt,-avx512dq,+sse2,+adx,+sse3", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 32 : index, target_triple = "x86_64-unknown-unknown-eabi-elf"}>
xyz.str:26:3: note: called from
xyz.str:3232:12: error: failed to serialize executables
xyz.str:26:3: note: called from
xyz.str:5024:13: error: 'builtin.module' op expected total size of stack allocation is not greater than 32768 bytes, but got 135168 bytes
xyz.str:26:3: note: called from
xyz.str:5024:13: error: failed to run translation of source executable to target executable for backend #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu_features = "-avx512pf,-tsxldtrk,+cx16,+sahf,-tbm,-avx512ifma,+sha,+crc32,-fma4,-vpclmulqdq,+prfchw,+bmi2,-cldemote,+fsgsbase,-avx512bf16,-amx-tile,-raoint,-uintr,-gfni,+popcnt,-ptwrite,+aes,-avx512bitalg,-movdiri,-widekl,+xsaves,-avx512er,-avxvnni,-avx512fp16,-avx512vnni,-amx-bf16,-avxvnniint8,-avx512vpopcntdq,-pconfig,+clwb,-cmpccxadd,-avx512f,+xsavec,+clzero,-pku,-amx-fp16,+mmx,-lwp,+rdpid,-xop,+rdseed,-waitpkg,-prefetchi,-kl,-movdir64b,+sse4a,-avx512bw,-avxneconvert,+clflushopt,+xsave,-avx512vbmi2,+64bit,-avx512vl,-serialize,-hreset,-invpcid,-avx512cd,+avx,-vaes,-amx-int8,+cx8,+fma,-rtm,+bmi,-enqcmd,+rdrnd,+mwaitx,+sse4.1,+sse4.2,+avx2,+fxsr,+wbnoinvd,+sse,+lzcnt,+pclmul,+rdpru,-avxifma,+f16c,+ssse3,-sgx,-prefetchwt1,+cmov,-avx512vbmi,-shstk,+movbe,-avx512vp2intersect,+xsaveopt,-avx512dq,+sse2,+adx,+sse3", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 32 : index, target_triple = "x86_64-unknown-unknown-eabi-elf"}>
xyz.str:26:3: note: called from
xyz.str:5024:13: error: failed to serialize executables
xyz.str:26:3: note: called from
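To compare the three failures at a glance, the diagnostics above can be reduced to one row per stack-allocation error. The following is a minimal sketch, not part of IREE: `summarize_stack_errors` and `SAMPLE` are hypothetical names, and the regular expression assumes the message wording stays exactly as it appears in this log.

```python
import re

# Matches the stack-allocation diagnostic as worded in the log above.
# Assumption: the wording "not greater than N bytes, but got M bytes" is stable.
STACK_ERROR = re.compile(
    r"^(?P<loc>\S+?):(?P<line>\d+):(?P<col>\d+): error: .*?"
    r"not greater than (?P<limit>\d+) bytes, but got (?P<got>\d+) bytes"
)


def summarize_stack_errors(log_text):
    """Return (location, limit, requested, requested / limit) per matching error."""
    rows = []
    for raw in log_text.splitlines():
        m = STACK_ERROR.match(raw.strip())
        if m:
            limit = int(m["limit"])
            got = int(m["got"])
            location = f'{m["loc"]}:{m["line"]}:{m["col"]}'
            rows.append((location, limit, got, got / limit))
    return rows


# The three stack-allocation errors copied from the diagnostics above.
SAMPLE = """\
xyz.str:1437:12: error: 'builtin.module' op expected total size of stack allocation is not greater than 32768 bytes, but got 4259840 bytes
xyz.str:3232:12: error: 'builtin.module' op expected total size of stack allocation is not greater than 32768 bytes, but got 1064960 bytes
xyz.str:5024:13: error: 'builtin.module' op expected total size of stack allocation is not greater than 32768 bytes, but got 135168 bytes
"""

for location, limit, got, ratio in summarize_stack_errors(SAMPLE):
    print(f"{location}: requested {got} bytes, {ratio:g}x the {limit}-byte limit")
```

For this log, the three dispatches request 130x, 32.5x, and 4.125x the 32768-byte limit, respectively.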
Invoked with:
iree-compile.exe C:\g\shark\shark.venv\lib\site-packages\iree\compiler\tools\..\_mlir_libs\iree-compile.exe - --iree-input-type=none --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-hal-target-backends=llvm-cpu --iree-llvm-embedded-linker-path=C:\g\shark\shark.venv\lib\site-packages\iree\compiler\tools\..\_mlir_libs\iree-lld.exe --mlir-print-debuginfo --mlir-print-op-on-diagnostic=false --iree-llvm-target-cpu-features=host -iree-llvm-target-triple=x86_64-pc-windows-msvc --iree-stream-resource-index-bits=64 --iree-vm-target-index-bits=64 --iree-util-zero-fill-elided-attrs --iree-flow-enable-padding-linalg-ops --iree-flow-linalg-ops-padding-size=32 --iree-flow-enable-conv-img2col-transform
Need more information? Set IREE_SAVE_TEMPS=/some/dir in your environment to save all artifacts and reproducers.
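The hint above can be followed from Python by making sure IREE_SAVE_TEMPS is set in the environment of whichever process invokes the compiler. The following is a minimal sketch, not an IREE API: `env_with_saved_temps` is a hypothetical helper and the `iree_temps` directory name is an arbitrary choice. It only prepares the environment and does not call IREE itself.

```python
import os
import tempfile


def env_with_saved_temps(save_dir, base_env=None):
    """Return a copy of the environment with IREE_SAVE_TEMPS pointing at save_dir."""
    env = dict(os.environ if base_env is None else base_env)
    os.makedirs(save_dir, exist_ok=True)  # create the target directory if missing
    env["IREE_SAVE_TEMPS"] = str(save_dir)
    return env


# Hypothetical location for the saved artifacts and reproducers.
save_dir = os.path.join(tempfile.gettempdir(), "iree_temps")
env = env_with_saved_temps(save_dir)
print(env["IREE_SAVE_TEMPS"])
```

The returned mapping can be passed as `env=` to `subprocess.run` when launching the failing script. Alternatively, assigning `os.environ["IREE_SAVE_TEMPS"]` in the same process before the compile call should have the same effect, on the assumption that the variable is read at invocation time.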
What component(s) does this issue relate to?
Compiler
Version information
No response
Additional context
No response
About this issue
- State: closed
- Created 2 years ago
- Comments: 26 (21 by maintainers)
Commits related to this issue
- Enable PyTorch UNet CPU benchmark (#12697) Now that https://github.com/openxla/iree/issues/11447 is fixed, we can enable UNet CPU benchmarks. benchmarks: x86_64, cuda — committed to iree-org/iree by mariecwhite a year ago
- Enable PyTorch UNet CPU benchmark (#12697) Now that https://github.com/openxla/iree/issues/11447 is fixed, we can enable UNet CPU benchmarks. benchmarks: x86_64, cuda — committed to qedawkins/iree by mariecwhite a year ago
- Enable PyTorch UNet CPU benchmark (#12697) Now that https://github.com/openxla/iree/issues/11447 is fixed, we can enable UNet CPU benchmarks. benchmarks: x86_64, cuda — committed to NatashaKnk/iree by mariecwhite a year ago
A related article: https://futhark-lang.org/blog/2021-08-05-half-precision-floats.html