openvino: [Bug] compile_model() Runtime crash in opencl

System information (version)
  • OpenVINO => openvino_2022.1.0.643
  • Operating System / Platform => Ubuntu 20.04 64 Bit
  • Compiler => precompiled apt package for Ubuntu 20.04
  • Problem classification: Runtime Crash
  • Framework: TensorFlow (if applicable)
Detailed description

We would like to switch from our old implementation to the new OpenVINO™ API 2.0. We have an issue with compile_model() for the GPU target. It seems to be a timing issue: it does not fail every time we load the model, but it crashes roughly 40% of the time.
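
One way to probe the timing hypothesis is to make sure that no two compile_model() calls overlap. The report does not say whether the application compiles from more than one thread, so that is an assumption here. The helper names `compile_mutex` and `run_serialized` below are hypothetical and not part of OpenVINO. This is a minimal diagnostic sketch, not a confirmed fix:

```cpp
#include <mutex>
#include <utility>

// Hypothetical helper, not an OpenVINO API: one process-wide mutex.
// It lives in a non-template function so that every call site shares it.
inline std::mutex& compile_mutex() {
    static std::mutex m;
    return m;
}

// Runs any callable while holding the shared mutex, so wrapped calls
// can never execute concurrently. Returns whatever the callable returns.
template <typename Fn>
auto run_serialized(Fn&& fn) -> decltype(std::forward<Fn>(fn)()) {
    std::lock_guard<std::mutex> lock(compile_mutex());
    return std::forward<Fn>(fn)();
}

// Intended use with the snippet from this report (requires OpenVINO):
//   compiledModel = run_serialized([&] { return core.compile_model(model, target); });
```

If the crash rate drops to zero with every compile_model() call wrapped this way, that would point toward concurrent kernel compilation; if it does not change, the cause is likely elsewhere.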

Steps to reproduce
ov::Core core;
std::shared_ptr<ov::Model> model;
std::string target = "GPU";
// ... (read model) ...
ov::set_batch(model, ov::Dimension(1, 16));  // dynamic batch dimension in the range 1..16
compiledModel = core.compile_model(model, target);  // <-- crash happens here
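
Because the failure is a SIGSEGV rather than an exception, it cannot be caught and counted inside a single process. To put a number on the intermittent crash, the sketch below runs each attempt in a forked child and counts how many children die from a signal. It assumes a POSIX system (the report is on Ubuntu). The function name `count_signal_crashes` is hypothetical, and the model path in the usage comment is a placeholder:

```cpp
#include <cassert>
#include <cstdlib>
#include <functional>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Runs attempt(i) in a freshly forked child process, `runs` times in a row,
// and returns how many children were terminated by a signal (e.g. SIGSEGV).
// Returns -1 if fork() or waitpid() fails.
int count_signal_crashes(int runs, const std::function<void(int)>& attempt) {
    assert(runs >= 0);
    int crashes = 0;
    for (int i = 0; i < runs; ++i) {
        const pid_t pid = fork();
        if (pid < 0) return -1;
        if (pid == 0) {
            // Child: never let an exception unwind back into the parent's loop.
            try {
                attempt(i);
            } catch (...) {
                std::_Exit(1);  // ordinary failure, not counted as a crash
            }
            std::_Exit(0);
        }
        int status = 0;
        if (waitpid(pid, &status, 0) < 0) return -1;
        if (WIFSIGNALED(status)) ++crashes;
    }
    return crashes;
}

// Intended use (requires OpenVINO; "model.xml" is a placeholder path):
//   int crashed = count_signal_crashes(20, [](int) {
//       ov::Core core;
//       auto model = core.read_model("model.xml");
//       ov::set_batch(model, ov::Dimension(1, 16));
//       core.compile_model(model, "GPU");
//   });
```

Creating the ov::Core inside the callable keeps all GPU runtime initialization in the child, which avoids forking a parent process that already holds driver state.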

Based on the attached image, the crash seems to originate in libopencl-clang.so.11.

Operating system: Linux
                  5.15.0-52-generic #58~20.04.1-Ubuntu SMP Thu Oct 13 13:09:46 UTC 2022 x86_64
CPU: amd64
     family 6 model 151 stepping 2
     20 CPUs

Crash reason:  SIGSEGV / SEGV_MAPERR
Crash address: 0x48
Process uptime: 35 seconds

Thread 176 (crashed)
 0  libopencl-clang.so.11!llvm::BasicBlock::dropAllReferences() + 0x0
     rax = 0x00007fd07c5f9720    rdx = 0x00007fd07d6bf440
     rcx = 0x00007fd3a39f2bf6    rbx = 0x0000000000000030
     rsi = 0x0000000000000081    rdi = 0x0000000000000018
     rbp = 0x00007fd07c5f9560    rsp = 0x00007fd05fffc788
      r8 = 0x00007fd07c8479d8     r9 = 0x0000000000001fff
     r10 = 0x0000000000000000    r11 = 0x0000000000000246
     r12 = 0x00007fd07c5f95a8    r13 = 0x00007fd07c9be890
     r14 = 0x00007fd07c5f9560    r15 = 0x00007fd05fffc830
     rip = 0x00007fd2dded6770
    Found by: given as instruction pointer in context
 1  libopencl-clang.so.11!llvm::Function::dropAllReferences() + 0x2d
     rbx = 0x0000000000000030    rbp = 0x00007fd07c5f9560
     rsp = 0x00007fd05fffc790    r12 = 0x00007fd07c5f95a8
     r13 = 0x00007fd07c9be890    r14 = 0x00007fd07c5f9560
     r15 = 0x00007fd05fffc830    rip = 0x00007fd2ddf4667e
    Found by: call frame info
 2  libopencl-clang.so.11!llvm::Function::~Function() + 0xf
     rbx = 0x00007fd07c5f9560    rbp = 0x00007fd07c745150
     rsp = 0x00007fd05fffc7b0    r12 = 0x00007fd07c8479b0
     r13 = 0x00007fd07c9be890    r14 = 0x00007fd07c5f9560
     r15 = 0x00007fd05fffc830    rip = 0x00007fd2ddf4ded0
    Found by: call frame info
 3  libopencl-clang.so.11!llvm::Function::eraseFromParent() + 0x46
     rbx = 0x00007fd07c5f9560    rbp = 0x00007fd07c745150
     rsp = 0x00007fd05fffc7e0    r12 = 0x00007fd07c8479b0
     r13 = 0x00007fd07c9be890    r14 = 0x00007fd07d9f3e50
     r15 = 0x00007fd05fffc830    rip = 0x00007fd2ddf4dfd7
    Found by: call frame info
 4  libopencl-clang.so.11!SPIRV::OCLToSPIRVBase::runOCLToSPIRV(llvm::Module&) + 0x206
     rbx = 0x00007fd07c8479d8    rbp = 0x00007fd07c745150
     rsp = 0x00007fd05fffc800    r12 = 0x00007fd07c8479b0
     r13 = 0x00007fd07c9be890    r14 = 0x00007fd07d9f3e50
     r15 = 0x00007fd05fffc830    rip = 0x00007fd2dbf14547
    Found by: call frame info
 5  libopencl-clang.so.11!llvm::legacy::PassManagerImpl::run(llvm::Module&) + 0x406
     rbx = 0x00007fd05fffc8d0    rbp = 0x00007fd05fffc950
     rsp = 0x00007fd05fffc8a0    r12 = 0x00007fd07cb06f00
     r13 = 0x00007fd07c8937f0    r14 = 0x00007fd07c847a08
     r15 = 0x00007fd07c893810    rip = 0x00007fd2ddf99737
    Found by: call frame info
 6  libopencl-clang.so.11!llvm::writeSpirv(llvm::Module*, SPIRV::TranslatorOpts const&, std::ostream&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) + 0xd1
     rbx = 0x00007fd07d698d30    rbp = 0x0000000000000001
     rsp = 0x00007fd05fffc960    r12 = 0x00007fd05fffc970
     r13 = 0x00007fd07cb06f00    r14 = 0x00007fd05fffcbe0
     r15 = 0x00007fd05fffcbc0    rip = 0x00007fd2dbe5b682
    Found by: call frame info
 7  libopencl-clang.so.11!Compile + 0x24b2
     rbx = 0x00007fd05fffccd8    rbp = 0x00007fd05fffcf90
     rsp = 0x00007fd05fffc9d0    r12 = 0x00007fd07cb06f00
     r13 = 0x00007fd05fffcb60    r14 = 0x00007fd05fffcbb0
     r15 = 0x00007fd07cddc7e0    rip = 0x00007fd2dbdf76c3
    Found by: call frame info
 8  libigdfcl.so.1 + 0x46fad
     rbx = 0x00007fd05fffd380    rbp = 0x00007fd05fffd290
     rsp = 0x00007fd05fffcfa0    r12 = 0x00007fd05fffd070
     r13 = 0x00007fd05fffd0b0    r14 = 0x00007fd05fffd010
     r15 = 0x00007fd05fffd0d0    rip = 0x00007fd394405fae
    Found by: call frame info
 9  libigdfcl.so.1 + 0x48805
     rbp = 0x00007fd05fffd5f0    rsp = 0x00007fd05fffd2a0
     rip = 0x00007fd394407806
    Found by: previous frame's frame pointer
10  libigdfcl.so.1 + 0x56dcf
     rbp = 0x00007fd05fffd760    rsp = 0x00007fd05fffd600
     rip = 0x00007fd394415dd0
    Found by: previous frame's frame pointer
11  libigdrcl.so + 0x5cd8c0
     rbp = 0x00007fd05fffd8c0    rsp = 0x00007fd05fffd770
     rip = 0x00007fd2e3ae88c1
    Found by: previous frame's frame pointer
12  libigdrcl.so + 0x10ca67
     rbp = 0x00007fd05fffda30    rsp = 0x00007fd05fffd8d0
     rip = 0x00007fd2e3627a68
    Found by: previous frame's frame pointer
[Attached image: image (6)]

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 15 (2 by maintainers)

Most upvoted comments

@zabomate First of all, I’d suggest trying another OCL runtime version, as the crash happens somewhere in the driver libs. If switching the OCL runtime version doesn’t help, please provide some info on your device (clinfo output) and the OCL runtime version(s) that you’ve tried, in addition to the things mentioned by @Iffa-Meah.