iree: Stack Overflow on ARM for dylib
Running MobileNet V3 UINT8 on Pixel 6, driver dylib, over a prolonged period results in a stack overflow. Note that dylib-sync works fine on the same model. MobileNet V3 float also works fine.
AddressSanitizerAddressSanitizer:DEADLYSIGNAL
:DEADLYSIGNAL
=================================================================
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer:DEADLYSIGNAL
==7097==ERROR: AddressSanitizer: stack-overflow on address 0x00713c600000 (pc 0x00713f89ed90 bp 0x00713c6fb750 sp 0x00713c5fb670 T2)
AddressSanitizer:DEADLYSIGNAL
libunwind: Unsupported .eh_frame_hdr version
AddressSanitizer:DEADLYSIGNAL
#0 0x713f89ed90 (<unknown module>)
#1 0x7141b44a04 (/system/lib64/libclang_rt.asan-aarch64-android.so+0xb0a04)
#2 0x7141b447c4 (/system/lib64/libclang_rt.asan-aarch64-android.so+0xb07c4)
#3 0x7141b44040 (/system/lib64/libclang_rt.asan-aarch64-android.so+0xb0040)
#4 0x7146d02878 ([vdso]+0x878)
#0 0x7141b31588 (/system/lib64/libclang_rt.asan-aarch64-android.so+0x9d588)
#1 0x6ea8ebba80 (/data/local/tmp/libireebackend_dbg.so+0x96a80)
#2 0x6ea8eb9ab4 (/data/local/tmp/libireebackend_dbg.so+0x94ab4)
#3 0x6ea8eb0a34 (/data/local/tmp/libireebackend_dbg.so+0x8ba34)
#4 0x6ea8eb0458 (/data/local/tmp/libireebackend_dbg.so+0x8b458)
#5 0x6ea8ea2238 (/data/local/tmp/libireebackend_dbg.so+0x7d238)
#6 0x6ea8fadf3c (/data/local/tmp/libireebackend_dbg.so+0x188f3c)
#7 0x6ea8ea0a20 (/data/local/tmp/libireebackend_dbg.so+0x7ba20)
#8 0x6ea8e97620 (/data/local/tmp/libireebackend_dbg.so+0x72620)
#9 0x62930e5400 (/data/local/tmp/mlperf_main_dbg+0xbaf400)
#10 0x6293080194 (/data/local/tmp/mlperf_main_dbg+0xb4a194)
#11 0x6293083eb8 (/data/local/tmp/mlperf_main_dbg+0xb4deb8)
#12 0x713ff135dc (/apex/com.android.runtime/lib64/bionic/libc.so+0x485dc)
#13 0x629307eb48 (/data/local/tmp/mlperf_main_dbg+0xb48b48)
#14 0x7146d3c074 (/data/local/tmp/mlperf_main_dbg+0x39074)
==7097==ABORTING
Aborted
About this issue
- State: closed
- Created 2 years ago
- Comments: 41 (33 by maintainers)
Commits related to this issue
- Do not fuse with elementwise operations that cant bufferize in-place. Currently the backend cannot bufferize in-place dispatch regions that contain operations where the root operation like conv, etc.... — committed to MaheshRavishankar/iree by deleted user 2 years ago
- Do not fuse with elementwise operations that cant bufferize in-place. (#8526) Currently the backend cannot bufferize in-place dispatch regions that contain operations where the root operation like c... — committed to iree-org/iree by MaheshRavishankar 2 years ago
- Drop the elision of dead results from dispatch regions. The elision of dead results from dispatch regions causes stack allocation in certain cases. The reason is down to how some ops are represented ... — committed to MaheshRavishankar/iree by deleted user 2 years ago
- Handle values dead outside dispatches in RewriteDestructiveUpdate. Current handling of destructive update rewrites does not handle cases like argmax, where the max value itself might not have uses ou... — committed to MaheshRavishankar/iree by deleted user 2 years ago
- Handle values dead outside dispatches in RewriteDestructiveUpdate. (#8960) Current handling of destructive update rewrites does not handle cases like argmax, where the max value itself might not hav... — committed to iree-org/iree by MaheshRavishankar 2 years ago
- Handle values dead outside dispatches in RewriteDestructiveUpdate. (#8960) Current handling of destructive update rewrites does not handle cases like argmax, where the max value itself might not hav... — committed to mariecwhite/iree by MaheshRavishankar 2 years ago
I checked this recently. Now all stack allocations are gone for this model.
So long as we can provably get to a point where we have a bounded size and a single alloca outside of loops, and then add a verifier for that, we’ll be OK. We will need to pick a reasonable limit on the size (32-256KB), ideally derived from the processor-local cache size: as soon as you have N processors with a shared cache all using all of that cache for scratch work, you’re going to have the same problems we have in CUDA with using too much shared memory and limiting utilization. We’ll eventually want to tune that limit dynamically, the same way we’ll want to with GPUs, but that’s not wired up yet.
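To make the proposed check concrete, here is a minimal sketch of what such a verifier could test: at most one stack allocation per dispatch, placed outside all loops, with a static size under a fixed limit. This is not IREE code. The `Alloca` record, the `verify_stack_allocations` function, and the 32 KB default are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical default; the comment above suggests a limit in the 32-256 KB range.
MAX_STACK_BYTES = 32 * 1024


@dataclass
class Alloca:
    """Hypothetical summary of one stack allocation found in a dispatch."""
    name: str
    size_bytes: Optional[int]  # None means the size is dynamic (not provably bounded)
    loop_depth: int            # 0 means the alloca sits outside all loops


def verify_stack_allocations(allocas: List[Alloca],
                             limit_bytes: int = MAX_STACK_BYTES) -> List[str]:
    """Return a list of violations; an empty list means the dispatch passes."""
    errors = []
    # Rule 1: a single alloca per dispatch.
    if len(allocas) > 1:
        errors.append(f"expected at most one alloca, found {len(allocas)}")
    for a in allocas:
        # Rule 2: the alloca must not be inside a loop.
        if a.loop_depth > 0:
            errors.append(f"{a.name}: alloca inside a loop (depth {a.loop_depth})")
        # Rule 3: the size must be static and within the limit.
        if a.size_bytes is None:
            errors.append(f"{a.name}: alloca size is not statically bounded")
        elif a.size_bytes > limit_bytes:
            errors.append(
                f"{a.name}: {a.size_bytes} bytes exceeds limit of {limit_bytes} bytes")
    return errors
```

For example, `verify_stack_allocations([Alloca("scratch", 16 * 1024, 0)])` returns an empty list, while a 512 KB alloca inside a loop is reported twice: once for its placement and once for its size.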
I can repro it on desktop with ulimit -s 1024. Yeah, there are alloca ops. It crashes in dispatch_6, which contains conv ops. In LinalgBufferize, some alloca ops are generated. Let me take a look at what’s happening in conv op lowering.
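The desktop repro above can also be scripted. The sketch below assumes a Unix host. It lowers the soft stack limit for a child process to 1024 KiB, the same value that ulimit -s 1024 sets, and then runs a command under that limit. The helper names `kib_to_bytes` and `run_with_stack_limit` are hypothetical, and the command to run is left to the caller.

```python
import resource
import subprocess
from typing import List


def kib_to_bytes(kib: int) -> int:
    """ulimit -s takes KiB; setrlimit takes bytes."""
    return kib * 1024


def run_with_stack_limit(cmd: List[str],
                         stack_kib: int = 1024) -> subprocess.CompletedProcess:
    """Run cmd in a child process whose soft stack limit is stack_kib KiB."""
    soft = kib_to_bytes(stack_kib)

    def limit_stack() -> None:
        # Runs in the child just before exec, so the parent's limits are unchanged.
        _, hard = resource.getrlimit(resource.RLIMIT_STACK)
        new_soft = soft
        # The soft limit may not exceed the hard limit.
        if hard != resource.RLIM_INFINITY and new_soft > hard:
            new_soft = hard
        resource.setrlimit(resource.RLIMIT_STACK, (new_soft, hard))

    return subprocess.run(cmd, preexec_fn=limit_stack,
                          capture_output=True, text=True)
```

A call such as `run_with_stack_limit(["./your_benchmark_binary"], 1024)` (the binary path is a placeholder) returns a `CompletedProcess`. A negative `returncode` means the child was killed by a signal, which is how a stack overflow typically shows up.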