tensorflow: Metal delegate Crash with C++ interface
Hello, I am trying to compare the performance of TFLite delegates on iOS devices. This issue is related to the comments on 60c4c3e.
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Not really
- Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy): iPhone 6, iPhone SE
- TensorFlow installed from (source or binary): binary
- TensorFlow version (use command below): r2.3
- Python version: 3.8.3
- Bazel version (if compiling from source): 3.1.0
- GCC/Compiler version (if compiling from source): Apple Clang version 12.0.0
Describe the current behavior
I build an iOS static framework that includes the code below and link it with the camera demo application (iOS) to run an image classification task.
/* Same as the TensorFlow Lite internal usage */
using TfLiteDelegatePtr = std::unique_ptr<TfLiteDelegate, void (*)(TfLiteDelegate*)>;

/* For the Metal delegate */
TFLGpuDelegateOptions gpu_opts = {};
gpu_opts.allow_precision_loss = true;
gpu_opts.enable_quantization = true;
gpu_delegate = TfLiteDelegatePtr(TFLGpuDelegateCreate(&gpu_opts), TFLGpuDelegateDelete);

/* The error occurs in ModifyGraphWithDelegate() */
if (interpreter_->ModifyGraphWithDelegate(std::move(gpu_delegate)) != kTfLiteOk) {
  LOGE(TAG, "%s. failed to delegate to GPU, fallback to CPU", __func__);
  return;
}
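For reference, `interpreter_` in the snippet above is a `std::unique_ptr<tflite::Interpreter>` built in the usual way. A minimal sketch, assuming a standard FlatBuffer model file (the helper name and model-path handling are illustrative, not the framework's actual code):

#include <memory>
#include <string>

#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model.h"

/* Illustrative helper: builds the interpreter that the delegate code above modifies. */
std::unique_ptr<tflite::Interpreter> BuildInterpreter(const std::string& model_path) {
  auto model = tflite::FlatBufferModel::BuildFromFile(model_path.c_str());
  if (model == nullptr) return nullptr;

  tflite::ops::builtin::BuiltinOpResolver resolver;
  std::unique_ptr<tflite::Interpreter> interpreter;
  if (tflite::InterpreterBuilder(*model, resolver)(&interpreter) != kTfLiteOk) {
    return nullptr;
  }
  return interpreter;
}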
Then I run the TFLite interpreter with a full-integer quantized TFLite model.
- In Xcode, `interpreter_->ModifyGraphWithDelegate()` raises `EXC_BAD_ACCESS (code=1, address=0x11261ca70)` (the address varies between runs).
- The XNNPack delegate does not break on the same quantized model (a sketch of how it is attached follows this list).
- The GPU (Metal) delegate does not break on a float16 model.
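For the XNNPack comparison, the delegate is attached through the same `TfLiteDelegatePtr` path. A minimal sketch, assuming default options (the exact options used in the benchmark may differ):

#include "tensorflow/lite/delegates/xnnpack/xnnpack_delegate.h"

/* Attach the XNNPack delegate the same way as the Metal delegate above
   (default options assumed; the real benchmark may tune num_threads). */
TfLiteXNNPackDelegateOptions xnn_opts = TfLiteXNNPackDelegateOptionsDefault();
auto xnn_delegate = TfLiteDelegatePtr(TfLiteXNNPackDelegateCreate(&xnn_opts),
                                      TfLiteXNNPackDelegateDelete);
if (interpreter_->ModifyGraphWithDelegate(std::move(xnn_delegate)) != kTfLiteOk) {
  LOGE(TAG, "%s. failed to delegate to XNNPack, fallback to CPU", __func__);
}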
Describe the expected behavior
The `interpreter_->ModifyGraphWithDelegate()` call (with the Metal delegate on a full-integer quantized model) should not crash; at worst it should return an error code (`!= kTfLiteOk`).
Xcode logs
P.S. HyperFast is the name of the iOS static framework.
To build the framework, I defined a custom Bazel rule that builds a static library (.a) containing the TFLite runtime plus the XNNPack, Metal, and Core ML delegates with C++ interfaces, and then built the framework from that static library. So Xcode does not know about the TFLite source code.
HyperFast`tflite::delegates::CreateNewTensorWithDifferentType:
0x1043ff408 <+0>: stp x24, x23, [sp, #-0x40]!
0x1043ff40c <+4>: stp x22, x21, [sp, #0x10]
0x1043ff410 <+8>: stp x20, x19, [sp, #0x20]
0x1043ff414 <+12>: stp x29, x30, [sp, #0x30]
0x1043ff418 <+16>: add x29, sp, #0x30 ; =0x30
0x1043ff41c <+20>: mov x23, x4
0x1043ff420 <+24>: mov x20, x3
0x1043ff424 <+28>: mov x22, x2
0x1043ff428 <+32>: mov x21, x1
0x1043ff42c <+36>: mov x19, x0
0x1043ff430 <+40>: ldr x24, [x0, #0x10]
0x1043ff434 <+44>: ldr x8, [x0, #0x30]
0x1043ff438 <+48>: mov w1, #0x1
0x1043ff43c <+52>: mov x2, x4
0x1043ff440 <+56>: blr x8
0x1043ff444 <+60>: cbnz w0, 0x1043ff4dc ; <+212>
0x1043ff448 <+64>: ldr x8, [x19, #0x10]
0x1043ff44c <+68>: ldrsw x9, [x23]
0x1043ff450 <+72>: mov w10, #0x70
0x1043ff454 <+76>: madd x8, x9, x10, x8
0x1043ff458 <+80>: str x8, [x20]
0x1043ff45c <+84>: str w22, [x8]
0x1043ff460 <+88>: mov w9, #0x2
0x1043ff464 <+92>: str w9, [x8, #0x20]
0x1043ff468 <+96>: smaddl x8, w21, w10, x24
-> 0x1043ff46c <+100>: ldr x21, [x8, #0x10]
0x1043ff470 <+104>: ldr w0, [x21]
0x1043ff474 <+108>: bl 0x1044100fc ; TfLiteIntArrayCreate
0x1043ff478 <+112>: mov x2, x0
0x1043ff47c <+116>: ldr w8, [x21]
0x1043ff480 <+120>: cmp w8, #0x1 ; =0x1
0x1043ff484 <+124>: b.lt 0x1043ff4b0 ; <+168>
0x1043ff488 <+128>: mov x8, #0x0
0x1043ff48c <+132>: add x9, x2, #0x4 ; =0x4
0x1043ff490 <+136>: add x10, x21, #0x4 ; =0x4
0x1043ff494 <+140>: lsl x11, x8, #2
0x1043ff498 <+144>: ldr w12, [x10, x11]
0x1043ff49c <+148>: str w12, [x9, x11]
0x1043ff4a0 <+152>: add x8, x8, #0x1 ; =0x1
0x1043ff4a4 <+156>: ldrsw x11, [x21]
0x1043ff4a8 <+160>: cmp x8, x11
0x1043ff4ac <+164>: b.lt 0x1043ff494 ; <+140>
0x1043ff4b0 <+168>: ldr x8, [x19, #0x20]
0x1043ff4b4 <+172>: ldr x1, [x20]
0x1043ff4b8 <+176>: mov x0, x19
0x1043ff4bc <+180>: blr x8
0x1043ff4c0 <+184>: cbz w0, 0x1043ff4dc ; <+212>
0x1043ff4c4 <+188>: ldr x8, [x19, #0x28]
0x1043ff4c8 <+192>: adr x1, #0xa58e9 ; "Could not resize new delegate tensor"
0x1043ff4cc <+196>: nop
0x1043ff4d0 <+200>: mov x0, x19
0x1043ff4d4 <+204>: blr x8
0x1043ff4d8 <+208>: mov w0, #0x1
0x1043ff4dc <+212>: ldp x29, x30, [sp, #0x30]
0x1043ff4e0 <+216>: ldp x20, x19, [sp, #0x20]
0x1043ff4e4 <+220>: ldp x22, x21, [sp, #0x10]
0x1043ff4e8 <+224>: ldp x24, x23, [sp], #0x40
0x1043ff4ec <+228>: ret
HyperFast`tflite::impl::Interpreter::ModifyGraphWithDelegate:
0x1043f1a6c <+0>: stp x24, x23, [sp, #-0x40]!
0x1043f1a70 <+4>: stp x22, x21, [sp, #0x10]
0x1043f1a74 <+8>: stp x20, x19, [sp, #0x20]
0x1043f1a78 <+12>: stp x29, x30, [sp, #0x30]
0x1043f1a7c <+16>: add x29, sp, #0x30 ; =0x30
0x1043f1a80 <+20>: mov x20, x1
0x1043f1a84 <+24>: mov x19, x0
0x1043f1a88 <+28>: ldp x8, x9, [x0, #0x18]
0x1043f1a8c <+32>: cmp x8, x9
0x1043f1a90 <+36>: b.hs 0x1043f1ab4 ; <+72>
0x1043f1a94 <+40>: ldr x9, [x20]
0x1043f1a98 <+44>: str xzr, [x20]
0x1043f1a9c <+48>: str x9, [x8]
0x1043f1aa0 <+52>: ldr x9, [x20, #0x8]
0x1043f1aa4 <+56>: str x9, [x8, #0x8]
0x1043f1aa8 <+60>: add x8, x8, #0x10 ; =0x10
0x1043f1aac <+64>: str x8, [x19, #0x18]
0x1043f1ab0 <+68>: b 0x1043f1bb4 ; <+328>
0x1043f1ab4 <+72>: add x0, x19, #0x10 ; =0x10
0x1043f1ab8 <+76>: ldr x10, [x0]
0x1043f1abc <+80>: sub x8, x8, x10
0x1043f1ac0 <+84>: asr x21, x8, #4
0x1043f1ac4 <+88>: add x8, x21, #0x1 ; =0x1
0x1043f1ac8 <+92>: lsr x11, x8, #60
0x1043f1acc <+96>: cbnz x11, 0x1043f1c34 ; <+456>
0x1043f1ad0 <+100>: sub x9, x9, x10
0x1043f1ad4 <+104>: asr x10, x9, #3
0x1043f1ad8 <+108>: cmp x10, x8
0x1043f1adc <+112>: csel x8, x8, x10, lo
0x1043f1ae0 <+116>: mov x10, #0x7ffffffffffffff
0x1043f1ae4 <+120>: cmp x10, x9, asr #4
0x1043f1ae8 <+124>: mov x9, #0xfffffffffffffff
0x1043f1aec <+128>: csel x22, x8, x9, hi
0x1043f1af0 <+132>: cbz x22, 0x1043f1b08 ; <+156>
0x1043f1af4 <+136>: lsr x8, x22, #60
0x1043f1af8 <+140>: cbnz x8, 0x1043f1c38 ; <+460>
0x1043f1afc <+144>: lsl x0, x22, #4
0x1043f1b00 <+148>: bl 0x10444f960 ; symbol stub for: operator new(unsigned long)
0x1043f1b04 <+152>: b 0x1043f1b0c ; <+160>
0x1043f1b08 <+156>: mov x0, #0x0
0x1043f1b0c <+160>: add x10, x0, x21, lsl #4
0x1043f1b10 <+164>: add x9, x0, x22, lsl #4
0x1043f1b14 <+168>: ldr q0, [x20]
0x1043f1b18 <+172>: str xzr, [x20]
0x1043f1b1c <+176>: mov x11, x10
0x1043f1b20 <+180>: str q0, [x11], #0x10
0x1043f1b24 <+184>: ldp x8, x12, [x19, #0x10]
0x1043f1b28 <+188>: cmp x12, x8
0x1043f1b2c <+192>: b.eq 0x1043f1b68 ; <+252>
0x1043f1b30 <+196>: ldr x13, [x12, #-0x10]!
0x1043f1b34 <+200>: str xzr, [x12]
0x1043f1b38 <+204>: stur x13, [x10, #-0x10]
0x1043f1b3c <+208>: ldr x13, [x12, #0x8]
0x1043f1b40 <+212>: stur x13, [x10, #-0x8]
0x1043f1b44 <+216>: sub x10, x10, #0x10 ; =0x10
0x1043f1b48 <+220>: cmp x8, x12
0x1043f1b4c <+224>: b.ne 0x1043f1b30 ; <+196>
0x1043f1b50 <+228>: ldp x20, x8, [x19, #0x10]
0x1043f1b54 <+232>: stp x10, x11, [x19, #0x10]
0x1043f1b58 <+236>: str x9, [x19, #0x20]
0x1043f1b5c <+240>: cmp x8, x20
0x1043f1b60 <+244>: b.ne 0x1043f1b7c ; <+272>
0x1043f1b64 <+248>: b 0x1043f1ba8 ; <+316>
0x1043f1b68 <+252>: mov x20, x8
0x1043f1b6c <+256>: stp x10, x11, [x19, #0x10]
0x1043f1b70 <+260>: str x9, [x19, #0x20]
0x1043f1b74 <+264>: cmp x8, x20
0x1043f1b78 <+268>: b.eq 0x1043f1ba8 ; <+316>
0x1043f1b7c <+272>: mov x21, x8
0x1043f1b80 <+276>: b 0x1043f1b90 ; <+292>
0x1043f1b84 <+280>: mov x8, x21
0x1043f1b88 <+284>: cmp x20, x21
0x1043f1b8c <+288>: b.eq 0x1043f1ba8 ; <+316>
0x1043f1b90 <+292>: ldr x0, [x21, #-0x10]!
0x1043f1b94 <+296>: str xzr, [x21]
0x1043f1b98 <+300>: cbz x0, 0x1043f1b84 ; <+280>
0x1043f1b9c <+304>: ldur x8, [x8, #-0x8]
0x1043f1ba0 <+308>: blr x8
0x1043f1ba4 <+312>: b 0x1043f1b84 ; <+280>
0x1043f1ba8 <+316>: cbz x20, 0x1043f1bb4 ; <+328>
0x1043f1bac <+320>: mov x0, x20
0x1043f1bb0 <+324>: bl 0x10444f948 ; symbol stub for: operator delete(void*)
0x1043f1bb4 <+328>: ldp x21, x8, [x19, #0x68]
0x1043f1bb8 <+332>: cmp x21, x8
0x1043f1bbc <+336>: b.eq 0x1043f1c1c ; <+432>
0x1043f1bc0 <+340>: ldr x9, [x19, #0x18]
0x1043f1bc4 <+344>: ldur x20, [x9, #-0x10]
0x1043f1bc8 <+348>: sub x22, x8, #0x8 ; =0x8
0x1043f1bcc <+352>: mov x23, x21
0x1043f1bd0 <+356>: ldr x0, [x23], #0x8
0x1043f1bd4 <+360>: mov x1, x20
0x1043f1bd8 <+364>: bl 0x1043ef2d0 ; tflite::impl::Subgraph::ModifyGraphWithDelegate(TfLiteDelegate*)
-> 0x1043f1bdc <+368>: cmp x22, x21
0x1043f1be0 <+372>: b.eq 0x1043f1bec ; <+384>
0x1043f1be4 <+376>: mov x21, x23
0x1043f1be8 <+380>: cbz w0, 0x1043f1bd0 ; <+356>
0x1043f1bec <+384>: cmp w0, #0x2 ; =0x2
0x1043f1bf0 <+388>: b.ne 0x1043f1c20 ; <+436>
0x1043f1bf4 <+392>: ldp x20, x19, [x19, #0x68]
0x1043f1bf8 <+396>: cmp x20, x19
0x1043f1bfc <+400>: b.eq 0x1043f1c14 ; <+424>
0x1043f1c00 <+404>: ldr x0, [x20], #0x8
0x1043f1c04 <+408>: bl 0x1043f01c8 ; tflite::impl::Subgraph::RemoveAllDelegates()
0x1043f1c08 <+412>: cbnz w0, 0x1043f1c20 ; <+436>
0x1043f1c0c <+416>: cmp x19, x20
0x1043f1c10 <+420>: b.ne 0x1043f1c00 ; <+404>
0x1043f1c14 <+424>: mov w0, #0x2
0x1043f1c18 <+428>: b 0x1043f1c20 ; <+436>
0x1043f1c1c <+432>: mov w0, #0x0
0x1043f1c20 <+436>: ldp x29, x30, [sp, #0x30]
0x1043f1c24 <+440>: ldp x20, x19, [sp, #0x20]
0x1043f1c28 <+444>: ldp x22, x21, [sp, #0x10]
0x1043f1c2c <+448>: ldp x24, x23, [sp], #0x40
0x1043f1c30 <+452>: ret
0x1043f1c34 <+456>: bl 0x10444f498 ; symbol stub for: std::__1::__vector_base_common<true>::__throw_length_error() const
0x1043f1c38 <+460>: bl 0x1043f2a0c ; std::__1::__throw_length_error(char const*)
About this issue
- State: closed
- Created 4 years ago
- Comments: 20 (7 by maintainers)
My bad! Thanks for catching this. I will work on a fix. However, the fix is not likely to go into r2.3, as it is not a critical security fix and quantization support in r2.3 is somewhat experimental.
For the new bug, the error is coming from the batch size check here: https://github.com/tensorflow/tensorflow/blob/6bdae6145a521693aba42eff7f3c8b070429c05b/tensorflow/lite/delegates/gpu/common/model.cc#L503
Can you check that the input/output tensors of the convolution have a [1xNxMxK] shape?
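In case it helps to verify this, a minimal sketch that prints the input/output shapes of every CONV_2D node (the helper name is illustrative; it assumes the standard C++ `Interpreter` introspection API is reachable from the framework):

#include <cstdio>

#include "tensorflow/lite/builtin_ops.h"
#include "tensorflow/lite/interpreter.h"

/* Sketch: print input/output shapes of each CONV_2D node so the batch
   dimension can be checked against the expected [1 x N x M x K] layout. */
void DumpConvShapes(const tflite::Interpreter& interpreter) {
  auto print_tensor = [&](int tensor_index, const char* kind) {
    const TfLiteTensor* t = interpreter.tensor(tensor_index);
    if (t == nullptr || t->dims == nullptr) return;
    std::printf("  %s %d (%s): [", kind, tensor_index, t->name ? t->name : "?");
    for (int d = 0; d < t->dims->size; ++d) {
      std::printf("%s%d", d ? " x " : "", t->dims->data[d]);
    }
    std::printf("]\n");
  };
  for (int i = 0; i < static_cast<int>(interpreter.nodes_size()); ++i) {
    const auto* node_and_reg = interpreter.node_and_registration(i);
    if (node_and_reg == nullptr ||
        node_and_reg->second.builtin_code != kTfLiteBuiltinConv2d) {
      continue;
    }
    std::printf("node %d (CONV_2D)\n", i);
    const TfLiteNode& node = node_and_reg->first;
    for (int j = 0; j < node.inputs->size; ++j) print_tensor(node.inputs->data[j], "input");
    for (int j = 0; j < node.outputs->size; ++j) print_tensor(node.outputs->data[j], "output");
  }
}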