compute-runtime: `clBuildProgram` segfaults on Intel UHD Graphics 630

Our libraries, which use OpenCL through intel-compute-runtime, segfault on both Ubuntu 18.04 and Arch Linux on an Intel Core i9-9900K. The same code does not crash on an NVIDIA 1660 or a Radeon RX 550. Building just the OpenCL snippet into a minimal program that calls `clBuildProgram` also runs without crashing on the Intel GPU.

Below is a minimal Dockerfile that reproduces the crash.

FROM ubuntu:bionic

RUN apt-get update
RUN apt-get install -y wget clinfo p7zip-full g++

WORKDIR /root

RUN wget https://www.zivid.com/hubfs/softwarefiles/releases/1.5.0+63f281e2-26/u18/zivid-telicam-driver_2.0.0.1-1_amd64.deb
RUN wget http://www.zivid.com/hubfs/softwarefiles/releases/1.5.0+63f281e2-26/u18/zivid_1.5.0+63f281e2-26_amd64.deb

RUN wget https://github.com/intel/compute-runtime/releases/download/19.36.14103/intel-gmmlib_19.2.3_amd64.deb
RUN wget https://github.com/intel/compute-runtime/releases/download/19.36.14103/intel-igc-core_1.0.11-2500_amd64.deb
RUN wget https://github.com/intel/compute-runtime/releases/download/19.36.14103/intel-igc-opencl_1.0.11-2500_amd64.deb
RUN wget https://github.com/intel/compute-runtime/releases/download/19.36.14103/intel-opencl_19.36.14103_amd64.deb
RUN wget https://github.com/intel/compute-runtime/releases/download/19.36.14103/intel-ocloc_19.36.14103_amd64.deb

RUN dpkg --install *.deb; exit 0
RUN apt-get install -fy

RUN wget https://www.zivid.com/hubfs/softwarefiles/releases/1.5.0+63f281e2-26/Zivid-1.5.0+63f281e2-26-Samples.zip
RUN 7z x Zivid-1.5.0+63f281e2-26-Samples.zip

RUN wget https://zivid.com/software/ZividSampleData.zip
RUN 7z x ZividSampleData.zip

WORKDIR /root/Samples/CPP/SampleCaptureFromFile
RUN g++ SampleCaptureFromFile.cpp -lZividCore -o SampleCaptureFromFile

ENV ZIVID_DATA /root

Steps to reproduce

  • Build the image with `docker build -t neocrash .`
  • Run it with `docker run -it --rm --device /dev/dri:/dev/dri neocrash:latest`
  • Run `./SampleCaptureFromFile` in the container shell

Stack trace from the container

#0 0x0000000000000000 in ?? ()
#1 0x00007fb2e49bd827 in __pthread_once_slow (once_control=0x7fb2dbfc1e60, init_routine=0x7fb2e5a93830 <__once_proxy>) at pthread_once.c:116
#2 0x00007fb2d9c99725 in ?? () from /usr/local/lib/libopencl-clang.so.8
#3 0x00007fb2d9c876ae in ?? () from /usr/local/lib/libopencl-clang.so.8
#4 0x00007fb2d8635aef in ?? () from /usr/local/lib/libopencl-clang.so.8
#5 0x00007fb2e8c0f733 in call_init (env=0x7ffd677a17c8, argv=0x7ffd677a17b8, argc=1, l=<optimized out>) at dl-init.c:72
#6 _dl_init (main_map=main_map@entry=0x55baec6ffe30, argc=1, argv=0x7ffd677a17b8, env=0x7ffd677a17c8) at dl-init.c:119
#7 0x00007fb2e8c141ff in dl_open_worker (a=a@entry=0x7ffd6779ffb0) at dl-open.c:522
#8 0x00007fb2e55352df in __GI__dl_catch_exception (exception=0x7ffd6779ff90, operate=0x7fb2e8c13dc0 <dl_open_worker>, args=0x7ffd6779ffb0) at dl-error-skeleton.c:196
#9 0x00007fb2e8c137ca in _dl_open (file=0x7ffd677a02e0 "libigdfcl.so.1", mode=-2147483639, caller_dlopen=0x7fb2e2b70bc9, nsid=<optimized out>, argc=1, argv=<optimized out>, env=0x7ffd677a17c8) at dl-open.c:605
#10 0x00007fb2e4204f96 in dlopen_doit (a=a@entry=0x7ffd677a01e0) at dlopen.c:66
#11 0x00007fb2e55352df in __GI__dl_catch_exception (exception=exception@entry=0x7ffd677a0180, operate=0x7fb2e4204f40 <dlopen_doit>, args=0x7ffd677a01e0) at dl-error-skeleton.c:196
#12 0x00007fb2e553536f in __GI__dl_catch_error (objname=0x55baec5d94a0, errstring=0x55baec5d94a8, mallocedp=0x55baec5d9498, operate=<optimized out>, args=<optimized out>) at dl-error-skeleton.c:215
#13 0x00007fb2e4205735 in _dlerror_run (operate=operate@entry=0x7fb2e4204f40 <dlopen_doit>, args=args@entry=0x7ffd677a01e0) at dlerror.c:162
#14 0x00007fb2e4205051 in __dlopen (file=<optimized out>, mode=<optimized out>) at dlopen.c:87
#15 0x00007fb2e2b70bc9 in ?? () from /usr/local/lib/intel-opencl/libigdrcl.so
#16 0x00007fb2e2b70c1d in ?? () from /usr/local/lib/intel-opencl/libigdrcl.so
#17 0x00007fb2e2b196e2 in ?? () from /usr/local/lib/intel-opencl/libigdrcl.so
#18 0x00007fb2e2b2fbed in ?? () from /usr/local/lib/intel-opencl/libigdrcl.so
#19 0x00007fb2e2b75146 in ?? () from /usr/local/lib/intel-opencl/libigdrcl.so
#20 0x00007fb2e2ae46fa in ?? () from /usr/local/lib/intel-opencl/libigdrcl.so
#21 0x00007fb2e6419f84 in clBuildProgram () from /usr/lib/libZividCore.so

Stack trace from the crash reproduced on Arch, with these package versions:

  • intel-compute-runtime 19.36.14103-1
  • intel-gmmlib 19.2.4-1
  • intel-graphics-compiler 1:1.0.11-1
  • intel-opencl-clang 8.0.1-2
#0  0x0000000000000000 in ?? ()
#1  0x00007ffff7d3cdbf in __pthread_once_slow () from /usr/lib/libpthread.so.0
#2  0x00007fff790e5719 in llvm::ManagedStaticBase::RegisterManagedStatic(void* (*)(), void (*)(void*)) const () from /usr/lib/libLLVM-8.so
#3  0x00007fff7908d973 in llvm::cl::OptionCategory::registerCategory() () from /usr/lib/libLLVM-8.so
#4  0x00007fff78f84930 in llvm::X86ATTInstPrinter::printCustomAliasOperand(llvm::MCInst const*, unsigned int, unsigned int, llvm::raw_ostream&) () from /usr/lib/libLLVM-8.so
#5  0x00007ffff7fe279a in call_init.part () from /lib64/ld-linux-x86-64.so.2
#6  0x00007ffff7fe28a1 in _dl_init () from /lib64/ld-linux-x86-64.so.2
#7  0x00007ffff7fe6683 in dl_open_worker () from /lib64/ld-linux-x86-64.so.2
#8  0x00007ffff1eec3d9 in _dl_catch_exception () from /usr/lib/libc.so.6
#9  0x00007ffff7fe5f5e in _dl_open () from /lib64/ld-linux-x86-64.so.2
#10 0x00007ffff7d4e34c in ?? () from /usr/lib/libdl.so.2
#11 0x00007ffff1eec3d9 in _dl_catch_exception () from /usr/lib/libc.so.6
#12 0x00007ffff1eec473 in _dl_catch_error () from /usr/lib/libc.so.6
#13 0x00007ffff7d4eab9 in ?? () from /usr/lib/libdl.so.2
#14 0x00007ffff7d4e3da in dlopen () from /usr/lib/libdl.so.2
#15 0x00007fffe481dc37 in ?? () from /usr/lib/intel-opencl/libigdrcl.so
#16 0x00007fffe481dc8c in ?? () from /usr/lib/intel-opencl/libigdrcl.so
#17 0x00007fffe47c252a in ?? () from /usr/lib/intel-opencl/libigdrcl.so
#18 0x00007fffe47c17e5 in ?? () from /usr/lib/intel-opencl/libigdrcl.so
#19 0x00007fffe47d41e2 in ?? () from /usr/lib/intel-opencl/libigdrcl.so
#20 0x00007fffe4822269 in ?? () from /usr/lib/intel-opencl/libigdrcl.so
#21 0x00007fffe4781bd0 in ?? () from /usr/lib/intel-opencl/libigdrcl.so
#22 0x00007ffff494a9b8 in clBuildProgram () from xxx/libZividCore.so

About this issue

  • State: closed
  • Created 5 years ago

Most upvoted comments

Yes, we use RTLD_DEEPBIND because NEO loads IGC with dlopen, and IGC can be statically linked against the LLVM libraries. Closing the issue, as it was fixed on the libZividCore side.

It was added to fix issue #122. If this flag causes problems for you, you can add -DSANITIZER_BUILD to CMAKE_CXX_FLAGS when building.

I can confirm that the LD_PRELOAD workaround fixes the issue.

With the patch below:

diff --git a/runtime/built_ins/built_ins.cpp b/runtime/built_ins/built_ins.cpp
index b04ef2db..840a2b0d 100644
--- a/runtime/built_ins/built_ins.cpp
+++ b/runtime/built_ins/built_ins.cpp
@@ -91,7 +91,9 @@ const SipKernel &BuiltIns::getSipKernel(SipKernelType type, Device &device) {
     UNRECOVERABLE_IF(kernelId >= static_cast<uint32_t>(SipKernelType::COUNT));
     auto &sipBuiltIn = this->sipKernels[kernelId];
 
-    auto initializer = [&] {
+     if(sipBuiltIn.first) return *sipBuiltIn.first;
+
+//    auto initializer = [&] {
         cl_int retVal = CL_SUCCESS;
 
         std::vector<char> sipBinary;
@@ -116,9 +118,9 @@ const SipKernel &BuiltIns::getSipKernel(SipKernelType type, Device &device) {
         DEBUG_BREAK_IF(retVal != CL_SUCCESS);
 
         sipBuiltIn.first.reset(new SipKernel(type, program));
-    };
-    std::call_once(sipBuiltIn.second, initializer);
-    UNRECOVERABLE_IF(sipBuiltIn.first == nullptr);
+//    };
+//    std::call_once(sipBuiltIn.second, initializer);
+//    UNRECOVERABLE_IF(sipBuiltIn.first == nullptr);
     return *sipBuiltIn.first;
 }

I was able to reproduce a similar segfault in an Ubuntu 18.04 Docker container:

Thread 1 "SampleCaptureFr" received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x00007ffff3b93827 in __pthread_once_slow (once_control=0x7fffe3f97600, init_routine=0x7ffff4c69830 <__once_proxy>) at pthread_once.c:116
#2  0x00007fffe0cc37a6 in llvm::ManagedStaticBase::RegisterManagedStatic(void* (*)(), void (*)(void*)) const () from /usr/lib/x86_64-linux-gnu/libLLVM-8.so.1
#3  0x00007fffe0c7d18d in llvm::cl::OptionCategory::registerCategory() () from /usr/lib/x86_64-linux-gnu/libLLVM-8.so.1
#4  0x00007fffe0b810c2 in ?? () from /usr/lib/x86_64-linux-gnu/libLLVM-8.so.1
#5  0x00007ffff7de5733 in call_init (env=0x7fffffffe628, argv=0x7fffffffe618, argc=1, l=<optimized out>) at dl-init.c:72
#6  _dl_init (main_map=main_map@entry=0x55555584e290, argc=1, argv=0x7fffffffe618, env=0x7fffffffe628) at dl-init.c:119
#7  0x00007ffff7dea1ff in dl_open_worker (a=a@entry=0x7fffffffd3a0) at dl-open.c:522
#8  0x00007ffff470b2df in __GI__dl_catch_exception (exception=0x7fffffffd380, operate=0x7ffff7de9dc0 <dl_open_worker>, args=0x7fffffffd3a0) at dl-error-skeleton.c:196
#9  0x00007ffff7de97ca in _dl_open (file=0x7fffffffd6d0 "libigdfcl.so.1", mode=-2147483639, caller_dlopen=0x7ffff1be8718 <NEO::Linux::OsLibrary::OsLibrary(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+122>, nsid=<optimized out>, argc=1, argv=<optimized out>, env=0x7fffffffe628) at dl-open.c:605
#10 0x00007ffff33daf96 in dlopen_doit (a=a@entry=0x7fffffffd5d0) at dlopen.c:66
#11 0x00007ffff470b2df in __GI__dl_catch_exception (exception=exception@entry=0x7fffffffd570, operate=0x7ffff33daf40 <dlopen_doit>, args=0x7fffffffd5d0) at dl-error-skeleton.c:196
#12 0x00007ffff470b36f in __GI__dl_catch_error (objname=0x5555557884a0, errstring=0x5555557884a8, mallocedp=0x555555788498, operate=<optimized out>, args=<optimized out>) at dl-error-skeleton.c:215
#13 0x00007ffff33db735 in _dlerror_run (operate=operate@entry=0x7ffff33daf40 <dlopen_doit>, args=args@entry=0x7fffffffd5d0) at dlerror.c:162
#14 0x00007ffff33db051 in __dlopen (file=<optimized out>, mode=<optimized out>) at dlopen.c:87
#15 0x00007ffff1be8718 in NEO::Linux::OsLibrary::OsLibrary (this=0x555555834390, name="libigdfcl.so.1") at /root/compute-runtime/runtime/os_interface/linux/os_library.cpp:39
#16 0x00007ffff1be8608 in NEO::OsLibrary::load (name="libigdfcl.so.1") at /root/compute-runtime/runtime/os_interface/linux/os_library.cpp:18
#17 0x00007ffff1b2c386 in NEO::loadCompiler<IGC::FclOclDeviceCtx> (libName=0x7ffff1dcb754 "libigdfcl.so.1", outLib=std::unique_ptr<NEO::OsLibrary> = {...}, outLibMain=std::unique_ptr<CIF::CIFMain> = {...}) at /root/compute-runtime/runtime/compiler_interface/compiler_interface.inl:93
#18 0x00007ffff1b2478e in NEO::CompilerInterface::initialize (this=0x55555584d1d0) at /root/compute-runtime/runtime/compiler_interface/compiler_interface.cpp:364
#19 0x00007ffff1b5fe23 in NEO::CompilerInterface::createInstance () at /root/compute-runtime/runtime/compiler_interface/compiler_interface.h:47
#20 0x00007ffff1b5fa43 in NEO::ExecutionEnvironment::getCompilerInterface (this=0x55555582cbb0) at /root/compute-runtime/runtime/execution_environment/execution_environment.cpp:109
#21 0x00007ffff1ae0d50 in NEO::BuiltIns::getSipKernel (this=0x55555584cec0, type=NEO::SipKernelType::Csr, device=...) at /root/compute-runtime/runtime/built_ins/built_ins.cpp:100
#22 0x00007ffff1acde85 in NEO::initSipKernel (type=NEO::SipKernelType::Csr, device=...) at /root/compute-runtime/runtime/helpers/built_ins_helper.cpp:15
#23 0x00007ffff1bebca7 in NEO::Platform::initialize (this=0x55555578afb0) at /root/compute-runtime/runtime/platform/platform.cpp:181
#24 0x00007ffff1a2d22c in clGetPlatformIDs (numEntries=1, platforms=0x55555582c560, numPlatforms=0x0) at /root/compute-runtime/runtime/api/api.cpp:80
#25 0x00007ffff1a2d44e in clIcdGetPlatformIDsKHR (numEntries=1, platforms=0x55555582c560, numPlatforms=0x0) at /root/compute-runtime/runtime/api/api.cpp:110
#26 0x00007ffff55f26cd in khrIcdVendorAdd () from /usr/lib/libZividCore.so
#27 0x00007ffff55f2c6f in khrIcdOsVendorsEnumerate () from /usr/lib/libZividCore.so
#28 0x00007ffff3b93827 in __pthread_once_slow (once_control=0x7ffff7db5a30 <initialized>, init_routine=0x7ffff55f29c7 <khrIcdOsVendorsEnumerate>) at pthread_once.c:116
#29 0x00007ffff55f2cf1 in khrIcdOsVendorsEnumerateOnce () from /usr/lib/libZividCore.so
#30 0x00007ffff55f2567 in khrIcdInitialize () from /usr/lib/libZividCore.so
#31 0x00007ffff55ef4f7 in clGetPlatformIDs () from /usr/lib/libZividCore.so
#32 0x00007ffff55ec037 in cl::Platform::get(std::vector<cl::Platform, std::allocator<cl::Platform> >*) () from /usr/lib/libZividCore.so
#33 0x00007ffff55e7980 in OCL::Context::Context(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::$_1::operator()() const () from /usr/lib/libZividCore.so
#34 0x00007ffff55e77bf in OCL::Context::Context(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /usr/lib/libZividCore.so
#35 0x00007ffff5537042 in Zivid::ComputeDeviceImplOCL::ComputeDeviceImplOCL(Zivid::Configuration::ComputeDevice const&) () from /usr/lib/libZividCore.so
#36 0x00007ffff5539913 in Zivid::makeComputeDevice(std::optional<Zivid::Configuration::ComputeDevice> const&) () from /usr/lib/libZividCore.so
#37 0x00007ffff506a0b8 in Zivid::ApplicationImpl::ApplicationImpl(Zivid::Configuration const&) () from /usr/lib/libZividCore.so
#38 0x00007ffff506ae0d in Zivid::ApplicationImpl::ApplicationImpl() () from /usr/lib/libZividCore.so
#39 0x00007ffff506890c in Zivid::Application::Application(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /usr/lib/libZividCore.so
#40 0x0000555555555b5e in Zivid::Application::Application() ()
#41 0x00005555555553c6 in main ()

When I set LD_PRELOAD to preload libstdc++.so.6, the application ran correctly:

LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libstdc++.so.6 ./SampleCaptureFromFile 
Initializing camera emulation using file: /root/MiscObjects.zdf
Capture a frame
Saving frame to file: result.zdf

After removing the patch mentioned above and going back to the original intel-opencl package (19.36.14103), with LD_PRELOAD set as before, the application also ran correctly.