pocl: [Regression] Valgrind "invalid read" on relatively simple kernel
Running this script
- on pocl 68273059 compiled against Debian’s LLVM/Clang 1:9~+rc2-1~exp2
- on pocl 6f31aa89 compiled against Debian’s LLVM/Clang 1:8.0.1-3+b1
on a Haswell machine using
PYOPENCL_TEST=port valgrind --suppressions=python-simple.supp python pocl-trigger-bug.py
produces this output, which includes these warnings:
==6481==
==6481== Invalid read of size 32
==6481== at 0x6C91127A: _pocl_kernel_grudge_assign_0_workgroup (in /home/andreask_work/.cache/pocl/kcache/IB/DGOCMKFDFAGEGJKLHGHENBDDMIOFGBAMKFILN/grudge_assign_0/128-1-1-goffs0-smallgrid/grudge_assign_0.so)
==6481== by 0x4A37FA2: start_thread (pthread_create.c:486)
==6481== by 0x49684CE: clone (clone.S:95)
==6481== Address 0x45bcd6f20 is 122,528 bytes inside a block of size 122,624 in arena "client"
==6481==
==6481== Invalid read of size 32
==6481== at 0x6C911280: _pocl_kernel_grudge_assign_0_workgroup (in /home/andreask_work/.cache/pocl/kcache/IB/DGOCMKFDFAGEGJKLHGHENBDDMIOFGBAMKFILN/grudge_assign_0/128-1-1-goffs0-smallgrid/grudge_assign_0.so)
==6481== by 0x4A37FA2: start_thread (pthread_create.c:486)
==6481== by 0x49684CE: clone (clone.S:95)
==6481== Address 0x45bcd6f40 is 122,560 bytes inside a block of size 122,624 in arena "client"
==6481==
==6481== Invalid read of size 32
==6481== at 0x6C911286: _pocl_kernel_grudge_assign_0_workgroup (in /home/andreask_work/.cache/pocl/kcache/IB/DGOCMKFDFAGEGJKLHGHENBDDMIOFGBAMKFILN/grudge_assign_0/128-1-1-goffs0-smallgrid/grudge_assign_0.so)
==6481== by 0x4A37FA2: start_thread (pthread_create.c:486)
==6481== by 0x49684CE: clone (clone.S:95)
==6481== Address 0x45bcd6f60 is 122,592 bytes inside a block of size 122,624 in arena "client"
==6481==
where the suppression file python-simple.supp
can be found in this gist.
This was observed in the wild in a PDE solver where it caused occasional crashes. The issue code included here is a simplified version of that kernel.
Valgrind also highlights a suspicious uninitialized access during compilation which may be related:
==6481== Conditional jump or move depends on uninitialised value(s)
==6481== at 0x731D9CF7: ??? (in /usr/lib/x86_64-linux-gnu/libLLVM-9.so.1)
==6481== by 0x7312A367: llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (in /usr/lib/x86_64-linux-gnu/libLLVM-9.so.1)
==6481== by 0x72FA48A5: llvm::FPPassManager::runOnFunction(llvm::Function&) (in /usr/lib/x86_64-linux-gnu/libLLVM-9.so.1)
==6481== by 0x72FA4B52: llvm::FPPassManager::runOnModule(llvm::Module&) (in /usr/lib/x86_64-linux-gnu/libLLVM-9.so.1)
==6481== by 0x72FA4FFF: llvm::legacy::PassManagerImpl::run(llvm::Module&) (in /usr/lib/x86_64-linux-gnu/libLLVM-9.so.1)
==6481== by 0x70DE9CB6: pocl_llvm_codegen (in /opt/pocl-master/lib/libpocl.so.2.4.0)
==6481== by 0x70D89658: llvm_codegen (in /opt/pocl-master/lib/libpocl.so.2.4.0)
==6481== by 0x70D8B34C: pocl_check_kernel_disk_cache (in /opt/pocl-master/lib/libpocl.so.2.4.0)
==6481== by 0x70D8B926: pocl_check_kernel_dlhandle_cache (in /opt/pocl-master/lib/libpocl.so.2.4.0)
==6481== by 0x70D5E135: program_compile_dynamic_wg_binaries (in /opt/pocl-master/lib/libpocl.so.2.4.0)
==6481== by 0x70D71721: POclGetProgramInfo (in /opt/pocl-master/lib/libpocl.so.2.4.0)
==6481== by 0x6DAD93C6: pyopencl::program::get_info(unsigned int) const (wrap_cl.hpp:3838)
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Comments: 40 (40 by maintainers)
Commits related to this issue
- Add patch for https://github.com/pocl/pocl/issues/757 — committed to inducer/pocl-feedstock by inducer 4 years ago
- WI loops: ensure loads in parallel loops are safe to execute unconditionally This change checks that loads in parallel loops are safe to execute unconditionally before adding parallel loop metadata t... — committed to mattwala/pocl by mattwala 4 years ago
- [LV] Parallel annotated loop does not imply all loads can be hoisted. As noted in https://bugs.llvm.org/show_bug.cgi?id=46666, the current behavior of assuming if-conversion safety if a loop is annot... — committed to llvm/llvm-project by fodinabor 3 years ago
- [LV] Parallel annotated loop does not imply all loads can be hoisted. As noted in https://bugs.llvm.org/show_bug.cgi?id=46666, the current behavior of assuming if-conversion safety if a loop is annot... — committed to tstellar/llvm-project by fodinabor 3 years ago
- [LV] Parallel annotated loop does not imply all loads can be hoisted. As noted in https://bugs.llvm.org/show_bug.cgi?id=46666, the current behavior of assuming if-conversion safety if a loop is annot... — committed to tstellar/llvm-project by fodinabor 3 years ago
- [LV] Parallel annotated loop does not imply all loads can be hoisted. As noted in https://bugs.llvm.org/show_bug.cgi?id=46666, the current behavior of assuming if-conversion safety if a loop is annot... — committed to arichardson/llvm-project by fodinabor 3 years ago
- [LV] Parallel annotated loop does not imply all loads can be hoisted. As noted in https://bugs.llvm.org/show_bug.cgi?id=46666, the current behavior of assuming if-conversion safety if a loop is annot... — committed to s194604/llvm-playground by fodinabor 3 years ago
- [LV] Parallel annotated loop does not imply all loads can be hoisted. As noted in https://bugs.llvm.org/show_bug.cgi?id=46666, the current behavior of assuming if-conversion safety if a loop is annot... — committed to Huawei-CPLLab/classic-flang-llvm-project by fodinabor 3 years ago
- [LV] Parallel annotated loop does not imply all loads can be hoisted. As noted in https://bugs.llvm.org/show_bug.cgi?id=46666, the current behavior of assuming if-conversion safety if a loop is annot... — committed to flang-compiler/classic-flang-llvm-project by fodinabor 3 years ago
- [LV] Parallel annotated loop does not imply all loads can be hoisted. As noted in https://bugs.llvm.org/show_bug.cgi?id=46666, the current behavior of assuming if-conversion safety if a loop is annot... — committed to ajohnson-uoregon/clang-rewrite-only by fodinabor 3 years ago
#850 should address this.
https://bugs.llvm.org/show_bug.cgi?id=46666
We could have a build-time option to disable the workaround as long as we’re running against LLVM with https://github.com/llvm/llvm-project/commit/411d31ad72456ba88c0b0bee0faba2b774add65f reverted.
The problem is that POCL is applying
llvm.loop.parallel_accesses
metadata too aggressively. In order to handle the conditional in the loop, the loop vectorizer ought to generate masked loads and stores. However, addingllvm.loop.parallel_accesses
metadata like POCL does suppresses the generation of the masked loads, so it generates an unmasked load which leads to an out of bounds read.More specifically, for a parallel loop, conditionally executed loads are assumed safe to execute speculatively (and hence with unmasked loads). This is confirmed by this comment in
LoopVectorizationLegality.cpp
, and this was mentioned in the language reference before it was removed. The removal of that information from the language reference seems like a bug.The solution on POCL’s end is not to add
llvm.access.group
metadata for memory reads. (Thellvm.access.group
metadata is required for the loop to be regarded as parallel.) A crude version of this can be done by the following patch:You could obviously be a bit more selective if you can prove that speculative execution of loads is safe.
I took Isuru’s C++ reproducer and modified it slightly to get rid of a few extraneous uninitialized reads. See here. If I run this with valgrind and POCL 1.5, I get the “Invalid read of size 32” error. If I run with the above patch applied (and make sure to delete the cache), then I don’t get the invalid reads. Examining
parallel.bc
before and after the patch confirms that the masked loads are being added only after.This might be how: https://github.com/pocl/pocl/blob/master/lib/CL/pocl_llvm_wg.cc#L279