iree: Stack Overflow on ARM for dylib

Running MobileNet V3 UINT8 on a Pixel 6 with the dylib driver for a prolonged period results in a stack overflow. Note that the dylib-sync driver works fine with the same model, and MobileNet V3 float also works fine.

AddressSanitizerAddressSanitizer:DEADLYSIGNAL
:DEADLYSIGNAL
=================================================================
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer:DEADLYSIGNAL
==7097==ERROR: AddressSanitizer: stack-overflow on address 0x00713c600000 (pc 0x00713f89ed90 bp 0x00713c6fb750 sp 0x00713c5fb670 T2)
AddressSanitizer:DEADLYSIGNAL
libunwind: Unsupported .eh_frame_hdr version
AddressSanitizer:DEADLYSIGNAL
    #0 0x713f89ed90  (<unknown module>)
    #1 0x7141b44a04  (/system/lib64/libclang_rt.asan-aarch64-android.so+0xb0a04)
    #2 0x7141b447c4  (/system/lib64/libclang_rt.asan-aarch64-android.so+0xb07c4)
    #3 0x7141b44040  (/system/lib64/libclang_rt.asan-aarch64-android.so+0xb0040)
    #4 0x7146d02878  ([vdso]+0x878)

    #0 0x7141b31588  (/system/lib64/libclang_rt.asan-aarch64-android.so+0x9d588)
    #1 0x6ea8ebba80  (/data/local/tmp/libireebackend_dbg.so+0x96a80)
    #2 0x6ea8eb9ab4  (/data/local/tmp/libireebackend_dbg.so+0x94ab4)
    #3 0x6ea8eb0a34  (/data/local/tmp/libireebackend_dbg.so+0x8ba34)
    #4 0x6ea8eb0458  (/data/local/tmp/libireebackend_dbg.so+0x8b458)
    #5 0x6ea8ea2238  (/data/local/tmp/libireebackend_dbg.so+0x7d238)
    #6 0x6ea8fadf3c  (/data/local/tmp/libireebackend_dbg.so+0x188f3c)
    #7 0x6ea8ea0a20  (/data/local/tmp/libireebackend_dbg.so+0x7ba20)
    #8 0x6ea8e97620  (/data/local/tmp/libireebackend_dbg.so+0x72620)
    #9 0x62930e5400  (/data/local/tmp/mlperf_main_dbg+0xbaf400)
    #10 0x6293080194  (/data/local/tmp/mlperf_main_dbg+0xb4a194)
    #11 0x6293083eb8  (/data/local/tmp/mlperf_main_dbg+0xb4deb8)
    #12 0x713ff135dc  (/apex/com.android.runtime/lib64/bionic/libc.so+0x485dc)
    #13 0x629307eb48  (/data/local/tmp/mlperf_main_dbg+0xb48b48)
    #14 0x7146d3c074  (/data/local/tmp/mlperf_main_dbg+0x39074)

==7097==ABORTING
Aborted

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 41 (33 by maintainers)

Most upvoted comments

I checked this recently. Now all stack allocations are gone for this model.

As long as we can provably get to a point where we have a bounded size and a single alloca outside of loops, and then add a verifier for that, we’ll be OK. We will need to pick a reasonable limit on the size (32-256KB), ideally derived from the processor-local cache size: as soon as N processors with a shared cache all use all of that cache for scratch work, you’re going to have the same problems we have in CUDA with using too much shared memory and limiting utilization. We’ll eventually want to tune that limit dynamically, the same way we’ll want to with GPUs, but that’s not wired up yet.
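
To make the proposed check concrete, here is a minimal sketch of such a verifier over a toy op tree. It is not IREE or MLIR code: the Op class, the op names ("func", "loop", "alloca"), and the 32KB default (the low end of the range suggested above) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed default limit: the low end of the 32-256KB range discussed above.
MAX_STACK_BYTES = 32 * 1024


@dataclass
class Op:
    """Hypothetical stand-in for an IR operation; not an IREE/MLIR type."""
    name: str                            # e.g. "func", "loop", "alloca", "conv"
    alloca_bytes: Optional[int] = None   # static size of an "alloca"; None = dynamic
    body: List["Op"] = field(default_factory=list)


def verify_stack_usage(func: Op, limit: int = MAX_STACK_BYTES) -> List[str]:
    """Return a list of violations; an empty list means the function passes."""
    errors: List[str] = []
    allocas: List[Op] = []

    def walk(op: Op, in_loop: bool) -> None:
        if op.name == "alloca":
            allocas.append(op)
            if in_loop:
                errors.append("alloca inside a loop")
            if op.alloca_bytes is None:
                errors.append("alloca with dynamic (unbounded) size")
        # Everything nested under a "loop" op counts as being inside a loop.
        for child in op.body:
            walk(child, in_loop or op.name == "loop")

    walk(func, False)

    if len(allocas) > 1:
        errors.append(f"expected at most one alloca, found {len(allocas)}")
    total = sum(a.alloca_bytes for a in allocas if a.alloca_bytes is not None)
    if total > limit:
        errors.append(f"stack usage {total} bytes exceeds limit {limit} bytes")
    return errors


# A single bounded alloca hoisted out of the loop passes.
good = Op("func", body=[Op("alloca", 16 * 1024), Op("loop", body=[Op("conv")])])
# An alloca inside the loop body is rejected.
bad = Op("func", body=[Op("loop", body=[Op("alloca", 1024)])])
```

Here verify_stack_usage(good) returns an empty list, while verify_stack_usage(bad) reports the alloca inside the loop. A real verifier would run on the compiler IR after bufferization; this only shows the shape of the three conditions (single alloca, outside loops, bounded total size).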

I can repro it on desktop with ulimit -s 1024. Yeah, there are alloca ops.
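
For scripting the same repro, a rough sketch of a helper that lowers the soft stack limit for a child process, similar in effect to running ulimit -s 1024 in the shell first. The helper name run_with_stack_limit is made up for this example, it assumes a POSIX system, and the command to run is left to the caller.

```python
import resource
import subprocess


def run_with_stack_limit(cmd, stack_kib=1024):
    """Run cmd with its soft stack limit set to stack_kib KiB (POSIX only)."""

    def lower_stack_limit():
        # Runs in the child just before exec, so the parent limit is unchanged.
        _, hard = resource.getrlimit(resource.RLIMIT_STACK)
        soft = stack_kib * 1024  # setrlimit takes bytes, ulimit -s takes KiB
        if hard != resource.RLIM_INFINITY:
            soft = min(soft, hard)  # the soft limit may not exceed the hard limit
        resource.setrlimit(resource.RLIMIT_STACK, (soft, hard))

    return subprocess.run(
        cmd, preexec_fn=lower_stack_limit, capture_output=True, text=True
    )


# Sanity check: ask a shell in the child what its stack limit is.
result = run_with_stack_limit(["sh", "-c", "ulimit -s"])
print(result.stdout.strip())
```

Passing the benchmark binary as cmd instead of the shell command would run it under the reduced limit.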

It crashes in dispatch_6, which contains conv ops. Some alloca ops are generated in LinalgBufferize. Let me take a look at what’s happening in the conv op lowering.