tensorflow: Metal delegate Crash with C++ interface
Hello, I am trying to compare the performance of TFLite delegates on iOS devices. This issue is related to the comments on 60c4c3e.
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Not really
- Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy): iPhone 6, iPhone SE
- TensorFlow installed from (source or binary): binary
- TensorFlow version (use command below): r2.3
- Python version: 3.8.3
- Bazel version (if compiling from source): 3.1.0
- GCC/Compiler version (if compiling from source): Apple Clang version 12.0.0
Describe the current behavior
I build an iOS static framework that includes the code below and link it with the camera demo application (iOS) to run an image classification task.
/* Same as the TensorFlow Lite internal usage */
using TfLiteDelegatePtr = std::unique_ptr<TfLiteDelegate, void (*)(TfLiteDelegate*)>;

/* For the Metal delegate */
TFLGpuDelegateOptions gpu_opts = {};
gpu_opts.allow_precision_loss = true;
gpu_opts.enable_quantization = true;
gpu_delegate = TfLiteDelegatePtr(TFLGpuDelegateCreate(&gpu_opts), TFLGpuDelegateDelete);

/* The error occurs in ModifyGraphWithDelegate() */
if (interpreter_->ModifyGraphWithDelegate(std::move(gpu_delegate)) != kTfLiteOk) {
  LOGE(TAG, "%s. failed to delegate to GPU, fallback to CPU", __func__);
  return;
}
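For reference, `interpreter_` in the snippet above is a `std::unique_ptr<tflite::Interpreter>` built in the usual way. A minimal sketch, assuming a standard FlatBuffer model file (the helper name and model-path handling are illustrative, not the framework's actual code):

#include <memory>
#include <string>

#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model.h"

/* Illustrative helper: builds the interpreter that the delegate code above modifies. */
std::unique_ptr<tflite::Interpreter> BuildInterpreter(const std::string& model_path) {
  auto model = tflite::FlatBufferModel::BuildFromFile(model_path.c_str());
  if (model == nullptr) return nullptr;

  tflite::ops::builtin::BuiltinOpResolver resolver;
  std::unique_ptr<tflite::Interpreter> interpreter;
  if (tflite::InterpreterBuilder(*model, resolver)(&interpreter) != kTfLiteOk) {
    return nullptr;
  }
  return interpreter;
}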
Then I run the TFLite interpreter with a full-integer quantized TFLite model.
- In Xcode, `interpreter_->ModifyGraphWithDelegate()` raises `EXC_BAD_ACCESS (code=1, address=0x11261ca70)` (the address varies between runs).
- The XNNPack delegate does not break on the same quantized model (a sketch of how it is attached follows this list).
- The GPU (Metal) delegate does not break on a float16 model.
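For the XNNPack comparison, the delegate is attached through the same `TfLiteDelegatePtr` path. A minimal sketch, assuming default options (the exact options used in the benchmark may differ):

#include "tensorflow/lite/delegates/xnnpack/xnnpack_delegate.h"

/* Attach the XNNPack delegate the same way as the Metal delegate above
   (default options assumed; the real benchmark may tune num_threads). */
TfLiteXNNPackDelegateOptions xnn_opts = TfLiteXNNPackDelegateOptionsDefault();
auto xnn_delegate = TfLiteDelegatePtr(TfLiteXNNPackDelegateCreate(&xnn_opts),
                                      TfLiteXNNPackDelegateDelete);
if (interpreter_->ModifyGraphWithDelegate(std::move(xnn_delegate)) != kTfLiteOk) {
  LOGE(TAG, "%s. failed to delegate to XNNPack, fallback to CPU", __func__);
}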
Describe the expected behavior
The `interpreter_->ModifyGraphWithDelegate()` call (with the Metal delegate on a full-integer quantized model) should not crash; at worst it should return an error code (`!= kTfLiteOk`).
Xcode logs
P.S. HyperFast is the name of the iOS static framework.
To build the framework, I defined a custom Bazel rule that builds a static library (.a) containing the TFLite runtime plus the XNNPack, Metal, and Core ML delegates with C++ interfaces, and then built the framework from that static library. So Xcode does not know about the TFLite source code.
HyperFast`tflite::delegates::CreateNewTensorWithDifferentType:
0x1043ff408 <+0>: stp x24, x23, [sp, #-0x40]!
0x1043ff40c <+4>: stp x22, x21, [sp, #0x10]
0x1043ff410 <+8>: stp x20, x19, [sp, #0x20]
0x1043ff414 <+12>: stp x29, x30, [sp, #0x30]
0x1043ff418 <+16>: add x29, sp, #0x30 ; =0x30
0x1043ff41c <+20>: mov x23, x4
0x1043ff420 <+24>: mov x20, x3
0x1043ff424 <+28>: mov x22, x2
0x1043ff428 <+32>: mov x21, x1
0x1043ff42c <+36>: mov x19, x0
0x1043ff430 <+40>: ldr x24, [x0, #0x10]
0x1043ff434 <+44>: ldr x8, [x0, #0x30]
0x1043ff438 <+48>: mov w1, #0x1
0x1043ff43c <+52>: mov x2, x4
0x1043ff440 <+56>: blr x8
0x1043ff444 <+60>: cbnz w0, 0x1043ff4dc ; <+212>
0x1043ff448 <+64>: ldr x8, [x19, #0x10]
0x1043ff44c <+68>: ldrsw x9, [x23]
0x1043ff450 <+72>: mov w10, #0x70
0x1043ff454 <+76>: madd x8, x9, x10, x8
0x1043ff458 <+80>: str x8, [x20]
0x1043ff45c <+84>: str w22, [x8]
0x1043ff460 <+88>: mov w9, #0x2
0x1043ff464 <+92>: str w9, [x8, #0x20]
0x1043ff468 <+96>: smaddl x8, w21, w10, x24
-> 0x1043ff46c <+100>: ldr x21, [x8, #0x10]
0x1043ff470 <+104>: ldr w0, [x21]
0x1043ff474 <+108>: bl 0x1044100fc ; TfLiteIntArrayCreate
0x1043ff478 <+112>: mov x2, x0
0x1043ff47c <+116>: ldr w8, [x21]
0x1043ff480 <+120>: cmp w8, #0x1 ; =0x1
0x1043ff484 <+124>: b.lt 0x1043ff4b0 ; <+168>
0x1043ff488 <+128>: mov x8, #0x0
0x1043ff48c <+132>: add x9, x2, #0x4 ; =0x4
0x1043ff490 <+136>: add x10, x21, #0x4 ; =0x4
0x1043ff494 <+140>: lsl x11, x8, #2
0x1043ff498 <+144>: ldr w12, [x10, x11]
0x1043ff49c <+148>: str w12, [x9, x11]
0x1043ff4a0 <+152>: add x8, x8, #0x1 ; =0x1
0x1043ff4a4 <+156>: ldrsw x11, [x21]
0x1043ff4a8 <+160>: cmp x8, x11
0x1043ff4ac <+164>: b.lt 0x1043ff494 ; <+140>
0x1043ff4b0 <+168>: ldr x8, [x19, #0x20]
0x1043ff4b4 <+172>: ldr x1, [x20]
0x1043ff4b8 <+176>: mov x0, x19
0x1043ff4bc <+180>: blr x8
0x1043ff4c0 <+184>: cbz w0, 0x1043ff4dc ; <+212>
0x1043ff4c4 <+188>: ldr x8, [x19, #0x28]
0x1043ff4c8 <+192>: adr x1, #0xa58e9 ; "Could not resize new delegate tensor"
0x1043ff4cc <+196>: nop
0x1043ff4d0 <+200>: mov x0, x19
0x1043ff4d4 <+204>: blr x8
0x1043ff4d8 <+208>: mov w0, #0x1
0x1043ff4dc <+212>: ldp x29, x30, [sp, #0x30]
0x1043ff4e0 <+216>: ldp x20, x19, [sp, #0x20]
0x1043ff4e4 <+220>: ldp x22, x21, [sp, #0x10]
0x1043ff4e8 <+224>: ldp x24, x23, [sp], #0x40
0x1043ff4ec <+228>: ret
HyperFast`tflite::impl::Interpreter::ModifyGraphWithDelegate:
0x1043f1a6c <+0>: stp x24, x23, [sp, #-0x40]!
0x1043f1a70 <+4>: stp x22, x21, [sp, #0x10]
0x1043f1a74 <+8>: stp x20, x19, [sp, #0x20]
0x1043f1a78 <+12>: stp x29, x30, [sp, #0x30]
0x1043f1a7c <+16>: add x29, sp, #0x30 ; =0x30
0x1043f1a80 <+20>: mov x20, x1
0x1043f1a84 <+24>: mov x19, x0
0x1043f1a88 <+28>: ldp x8, x9, [x0, #0x18]
0x1043f1a8c <+32>: cmp x8, x9
0x1043f1a90 <+36>: b.hs 0x1043f1ab4 ; <+72>
0x1043f1a94 <+40>: ldr x9, [x20]
0x1043f1a98 <+44>: str xzr, [x20]
0x1043f1a9c <+48>: str x9, [x8]
0x1043f1aa0 <+52>: ldr x9, [x20, #0x8]
0x1043f1aa4 <+56>: str x9, [x8, #0x8]
0x1043f1aa8 <+60>: add x8, x8, #0x10 ; =0x10
0x1043f1aac <+64>: str x8, [x19, #0x18]
0x1043f1ab0 <+68>: b 0x1043f1bb4 ; <+328>
0x1043f1ab4 <+72>: add x0, x19, #0x10 ; =0x10
0x1043f1ab8 <+76>: ldr x10, [x0]
0x1043f1abc <+80>: sub x8, x8, x10
0x1043f1ac0 <+84>: asr x21, x8, #4
0x1043f1ac4 <+88>: add x8, x21, #0x1 ; =0x1
0x1043f1ac8 <+92>: lsr x11, x8, #60
0x1043f1acc <+96>: cbnz x11, 0x1043f1c34 ; <+456>
0x1043f1ad0 <+100>: sub x9, x9, x10
0x1043f1ad4 <+104>: asr x10, x9, #3
0x1043f1ad8 <+108>: cmp x10, x8
0x1043f1adc <+112>: csel x8, x8, x10, lo
0x1043f1ae0 <+116>: mov x10, #0x7ffffffffffffff
0x1043f1ae4 <+120>: cmp x10, x9, asr #4
0x1043f1ae8 <+124>: mov x9, #0xfffffffffffffff
0x1043f1aec <+128>: csel x22, x8, x9, hi
0x1043f1af0 <+132>: cbz x22, 0x1043f1b08 ; <+156>
0x1043f1af4 <+136>: lsr x8, x22, #60
0x1043f1af8 <+140>: cbnz x8, 0x1043f1c38 ; <+460>
0x1043f1afc <+144>: lsl x0, x22, #4
0x1043f1b00 <+148>: bl 0x10444f960 ; symbol stub for: operator new(unsigned long)
0x1043f1b04 <+152>: b 0x1043f1b0c ; <+160>
0x1043f1b08 <+156>: mov x0, #0x0
0x1043f1b0c <+160>: add x10, x0, x21, lsl #4
0x1043f1b10 <+164>: add x9, x0, x22, lsl #4
0x1043f1b14 <+168>: ldr q0, [x20]
0x1043f1b18 <+172>: str xzr, [x20]
0x1043f1b1c <+176>: mov x11, x10
0x1043f1b20 <+180>: str q0, [x11], #0x10
0x1043f1b24 <+184>: ldp x8, x12, [x19, #0x10]
0x1043f1b28 <+188>: cmp x12, x8
0x1043f1b2c <+192>: b.eq 0x1043f1b68 ; <+252>
0x1043f1b30 <+196>: ldr x13, [x12, #-0x10]!
0x1043f1b34 <+200>: str xzr, [x12]
0x1043f1b38 <+204>: stur x13, [x10, #-0x10]
0x1043f1b3c <+208>: ldr x13, [x12, #0x8]
0x1043f1b40 <+212>: stur x13, [x10, #-0x8]
0x1043f1b44 <+216>: sub x10, x10, #0x10 ; =0x10
0x1043f1b48 <+220>: cmp x8, x12
0x1043f1b4c <+224>: b.ne 0x1043f1b30 ; <+196>
0x1043f1b50 <+228>: ldp x20, x8, [x19, #0x10]
0x1043f1b54 <+232>: stp x10, x11, [x19, #0x10]
0x1043f1b58 <+236>: str x9, [x19, #0x20]
0x1043f1b5c <+240>: cmp x8, x20
0x1043f1b60 <+244>: b.ne 0x1043f1b7c ; <+272>
0x1043f1b64 <+248>: b 0x1043f1ba8 ; <+316>
0x1043f1b68 <+252>: mov x20, x8
0x1043f1b6c <+256>: stp x10, x11, [x19, #0x10]
0x1043f1b70 <+260>: str x9, [x19, #0x20]
0x1043f1b74 <+264>: cmp x8, x20
0x1043f1b78 <+268>: b.eq 0x1043f1ba8 ; <+316>
0x1043f1b7c <+272>: mov x21, x8
0x1043f1b80 <+276>: b 0x1043f1b90 ; <+292>
0x1043f1b84 <+280>: mov x8, x21
0x1043f1b88 <+284>: cmp x20, x21
0x1043f1b8c <+288>: b.eq 0x1043f1ba8 ; <+316>
0x1043f1b90 <+292>: ldr x0, [x21, #-0x10]!
0x1043f1b94 <+296>: str xzr, [x21]
0x1043f1b98 <+300>: cbz x0, 0x1043f1b84 ; <+280>
0x1043f1b9c <+304>: ldur x8, [x8, #-0x8]
0x1043f1ba0 <+308>: blr x8
0x1043f1ba4 <+312>: b 0x1043f1b84 ; <+280>
0x1043f1ba8 <+316>: cbz x20, 0x1043f1bb4 ; <+328>
0x1043f1bac <+320>: mov x0, x20
0x1043f1bb0 <+324>: bl 0x10444f948 ; symbol stub for: operator delete(void*)
0x1043f1bb4 <+328>: ldp x21, x8, [x19, #0x68]
0x1043f1bb8 <+332>: cmp x21, x8
0x1043f1bbc <+336>: b.eq 0x1043f1c1c ; <+432>
0x1043f1bc0 <+340>: ldr x9, [x19, #0x18]
0x1043f1bc4 <+344>: ldur x20, [x9, #-0x10]
0x1043f1bc8 <+348>: sub x22, x8, #0x8 ; =0x8
0x1043f1bcc <+352>: mov x23, x21
0x1043f1bd0 <+356>: ldr x0, [x23], #0x8
0x1043f1bd4 <+360>: mov x1, x20
0x1043f1bd8 <+364>: bl 0x1043ef2d0 ; tflite::impl::Subgraph::ModifyGraphWithDelegate(TfLiteDelegate*)
-> 0x1043f1bdc <+368>: cmp x22, x21
0x1043f1be0 <+372>: b.eq 0x1043f1bec ; <+384>
0x1043f1be4 <+376>: mov x21, x23
0x1043f1be8 <+380>: cbz w0, 0x1043f1bd0 ; <+356>
0x1043f1bec <+384>: cmp w0, #0x2 ; =0x2
0x1043f1bf0 <+388>: b.ne 0x1043f1c20 ; <+436>
0x1043f1bf4 <+392>: ldp x20, x19, [x19, #0x68]
0x1043f1bf8 <+396>: cmp x20, x19
0x1043f1bfc <+400>: b.eq 0x1043f1c14 ; <+424>
0x1043f1c00 <+404>: ldr x0, [x20], #0x8
0x1043f1c04 <+408>: bl 0x1043f01c8 ; tflite::impl::Subgraph::RemoveAllDelegates()
0x1043f1c08 <+412>: cbnz w0, 0x1043f1c20 ; <+436>
0x1043f1c0c <+416>: cmp x19, x20
0x1043f1c10 <+420>: b.ne 0x1043f1c00 ; <+404>
0x1043f1c14 <+424>: mov w0, #0x2
0x1043f1c18 <+428>: b 0x1043f1c20 ; <+436>
0x1043f1c1c <+432>: mov w0, #0x0
0x1043f1c20 <+436>: ldp x29, x30, [sp, #0x30]
0x1043f1c24 <+440>: ldp x20, x19, [sp, #0x20]
0x1043f1c28 <+444>: ldp x22, x21, [sp, #0x10]
0x1043f1c2c <+448>: ldp x24, x23, [sp], #0x40
0x1043f1c30 <+452>: ret
0x1043f1c34 <+456>: bl 0x10444f498 ; symbol stub for: std::__1::__vector_base_common<true>::__throw_length_error() const
0x1043f1c38 <+460>: bl 0x1043f2a0c ; std::__1::__throw_length_error(char const*)
About this issue
- State: closed
- Created 4 years ago
- Comments: 20 (7 by maintainers)
My bad! Thanks for catching this. I will work on a fix. However, the fix is not likely to go into r2.3, as it is not a critical security fix and quantization support in r2.3 is somewhat experimental.
For the new bug, the error is coming from the batch size check here: https://github.com/tensorflow/tensorflow/blob/6bdae6145a521693aba42eff7f3c8b070429c05b/tensorflow/lite/delegates/gpu/common/model.cc#L503
Can you check that the input/output tensors of the convolution have a [1xNxMxK] shape?
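In case it helps to verify this, a minimal sketch that prints the input/output shapes of every CONV_2D node (the helper name is illustrative; it assumes the standard C++ `Interpreter` introspection API is reachable from the framework):

#include <cstdio>

#include "tensorflow/lite/builtin_ops.h"
#include "tensorflow/lite/interpreter.h"

/* Sketch: print input/output shapes of each CONV_2D node so the batch
   dimension can be checked against the expected [1 x N x M x K] layout. */
void DumpConvShapes(const tflite::Interpreter& interpreter) {
  auto print_tensor = [&](int tensor_index, const char* kind) {
    const TfLiteTensor* t = interpreter.tensor(tensor_index);
    if (t == nullptr || t->dims == nullptr) return;
    std::printf("  %s %d (%s): [", kind, tensor_index, t->name ? t->name : "?");
    for (int d = 0; d < t->dims->size; ++d) {
      std::printf("%s%d", d ? " x " : "", t->dims->data[d]);
    }
    std::printf("]\n");
  };
  for (int i = 0; i < static_cast<int>(interpreter.nodes_size()); ++i) {
    const auto* node_and_reg = interpreter.node_and_registration(i);
    if (node_and_reg == nullptr ||
        node_and_reg->second.builtin_code != kTfLiteBuiltinConv2d) {
      continue;
    }
    std::printf("node %d (CONV_2D)\n", i);
    const TfLiteNode& node = node_and_reg->first;
    for (int j = 0; j < node.inputs->size; ++j) print_tensor(node.inputs->data[j], "input");
    for (int j = 0; j < node.outputs->size; ++j) print_tensor(node.outputs->data[j], "output");
  }
}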