pocl: [Regression] Valgrind "invalid read" on relatively simple kernel
Running this script
- on pocl 68273059 compiled against Debian’s LLVM/Clang 1:9~+rc2-1~exp2
- on pocl 6f31aa89 compiled against Debian’s LLVM/Clang 1:8.0.1-3+b1
on a Haswell machine using
PYOPENCL_TEST=port valgrind --suppressions=python-simple.supp python pocl-trigger-bug.py
produces this output, which includes these warnings:
==6481==
==6481== Invalid read of size 32
==6481== at 0x6C91127A: _pocl_kernel_grudge_assign_0_workgroup (in /home/andreask_work/.cache/pocl/kcache/IB/DGOCMKFDFAGEGJKLHGHENBDDMIOFGBAMKFILN/grudge_assign_0/128-1-1-goffs0-smallgrid/grudge_assign_0.so)
==6481== by 0x4A37FA2: start_thread (pthread_create.c:486)
==6481== by 0x49684CE: clone (clone.S:95)
==6481== Address 0x45bcd6f20 is 122,528 bytes inside a block of size 122,624 in arena "client"
==6481==
==6481== Invalid read of size 32
==6481== at 0x6C911280: _pocl_kernel_grudge_assign_0_workgroup (in /home/andreask_work/.cache/pocl/kcache/IB/DGOCMKFDFAGEGJKLHGHENBDDMIOFGBAMKFILN/grudge_assign_0/128-1-1-goffs0-smallgrid/grudge_assign_0.so)
==6481== by 0x4A37FA2: start_thread (pthread_create.c:486)
==6481== by 0x49684CE: clone (clone.S:95)
==6481== Address 0x45bcd6f40 is 122,560 bytes inside a block of size 122,624 in arena "client"
==6481==
==6481== Invalid read of size 32
==6481== at 0x6C911286: _pocl_kernel_grudge_assign_0_workgroup (in /home/andreask_work/.cache/pocl/kcache/IB/DGOCMKFDFAGEGJKLHGHENBDDMIOFGBAMKFILN/grudge_assign_0/128-1-1-goffs0-smallgrid/grudge_assign_0.so)
==6481== by 0x4A37FA2: start_thread (pthread_create.c:486)
==6481== by 0x49684CE: clone (clone.S:95)
==6481== Address 0x45bcd6f60 is 122,592 bytes inside a block of size 122,624 in arena "client"
==6481==
where the suppression file python-simple.supp can be found in this gist.
This was observed in the wild in a PDE solver where it caused occasional crashes. The issue code included here is a simplified version of that kernel.
Valgrind also highlights a suspicious uninitialized access during compilation which may be related:
==6481== Conditional jump or move depends on uninitialised value(s)
==6481== at 0x731D9CF7: ??? (in /usr/lib/x86_64-linux-gnu/libLLVM-9.so.1)
==6481== by 0x7312A367: llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (in /usr/lib/x86_64-linux-gnu/libLLVM-9.so.1)
==6481== by 0x72FA48A5: llvm::FPPassManager::runOnFunction(llvm::Function&) (in /usr/lib/x86_64-linux-gnu/libLLVM-9.so.1)
==6481== by 0x72FA4B52: llvm::FPPassManager::runOnModule(llvm::Module&) (in /usr/lib/x86_64-linux-gnu/libLLVM-9.so.1)
==6481== by 0x72FA4FFF: llvm::legacy::PassManagerImpl::run(llvm::Module&) (in /usr/lib/x86_64-linux-gnu/libLLVM-9.so.1)
==6481== by 0x70DE9CB6: pocl_llvm_codegen (in /opt/pocl-master/lib/libpocl.so.2.4.0)
==6481== by 0x70D89658: llvm_codegen (in /opt/pocl-master/lib/libpocl.so.2.4.0)
==6481== by 0x70D8B34C: pocl_check_kernel_disk_cache (in /opt/pocl-master/lib/libpocl.so.2.4.0)
==6481== by 0x70D8B926: pocl_check_kernel_dlhandle_cache (in /opt/pocl-master/lib/libpocl.so.2.4.0)
==6481== by 0x70D5E135: program_compile_dynamic_wg_binaries (in /opt/pocl-master/lib/libpocl.so.2.4.0)
==6481== by 0x70D71721: POclGetProgramInfo (in /opt/pocl-master/lib/libpocl.so.2.4.0)
==6481== by 0x6DAD93C6: pyopencl::program::get_info(unsigned int) const (wrap_cl.hpp:3838)
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Comments: 40 (40 by maintainers)
Commits related to this issue
- Add patch for https://github.com/pocl/pocl/issues/757 — committed to inducer/pocl-feedstock by inducer 4 years ago
- WI loops: ensure loads in parallel loops are safe to execute unconditionally This change checks that loads in parallel loops are safe to execute unconditionally before adding parallel loop metadata t... — committed to mattwala/pocl by mattwala 4 years ago
- [LV] Parallel annotated loop does not imply all loads can be hoisted. As noted in https://bugs.llvm.org/show_bug.cgi?id=46666, the current behavior of assuming if-conversion safety if a loop is annot... — committed to llvm/llvm-project by fodinabor 3 years ago
- [LV] Parallel annotated loop does not imply all loads can be hoisted. As noted in https://bugs.llvm.org/show_bug.cgi?id=46666, the current behavior of assuming if-conversion safety if a loop is annot... — committed to tstellar/llvm-project by fodinabor 3 years ago
- [LV] Parallel annotated loop does not imply all loads can be hoisted. As noted in https://bugs.llvm.org/show_bug.cgi?id=46666, the current behavior of assuming if-conversion safety if a loop is annot... — committed to tstellar/llvm-project by fodinabor 3 years ago
- [LV] Parallel annotated loop does not imply all loads can be hoisted. As noted in https://bugs.llvm.org/show_bug.cgi?id=46666, the current behavior of assuming if-conversion safety if a loop is annot... — committed to arichardson/llvm-project by fodinabor 3 years ago
- [LV] Parallel annotated loop does not imply all loads can be hoisted. As noted in https://bugs.llvm.org/show_bug.cgi?id=46666, the current behavior of assuming if-conversion safety if a loop is annot... — committed to s194604/llvm-playground by fodinabor 3 years ago
- [LV] Parallel annotated loop does not imply all loads can be hoisted. As noted in https://bugs.llvm.org/show_bug.cgi?id=46666, the current behavior of assuming if-conversion safety if a loop is annot... — committed to Huawei-CPLLab/classic-flang-llvm-project by fodinabor 3 years ago
- [LV] Parallel annotated loop does not imply all loads can be hoisted. As noted in https://bugs.llvm.org/show_bug.cgi?id=46666, the current behavior of assuming if-conversion safety if a loop is annot... — committed to flang-compiler/classic-flang-llvm-project by fodinabor 3 years ago
- [LV] Parallel annotated loop does not imply all loads can be hoisted. As noted in https://bugs.llvm.org/show_bug.cgi?id=46666, the current behavior of assuming if-conversion safety if a loop is annot... — committed to ajohnson-uoregon/clang-rewrite-only by fodinabor 3 years ago
#850 should address this.
https://bugs.llvm.org/show_bug.cgi?id=46666
We could have a build-time option to disable the workaround as long as we’re running against LLVM with https://github.com/llvm/llvm-project/commit/411d31ad72456ba88c0b0bee0faba2b774add65f reverted.
The problem is that POCL is applying
llvm.loop.parallel_accessesmetadata too aggressively. In order to handle the conditional in the loop, the loop vectorizer ought to generate masked loads and stores. However, addingllvm.loop.parallel_accessesmetadata like POCL does suppresses the generation of the masked loads, so it generates an unmasked load which leads to an out of bounds read.More specifically, for a parallel loop, conditionally executed loads are assumed safe to execute speculatively (and hence with unmasked loads). This is confirmed by this comment in
LoopVectorizationLegality.cpp, and this was mentioned in the language reference before it was removed. The removal of that information from the language reference seems like a bug.The solution on POCL’s end is not to add
llvm.access.groupmetadata for memory reads. (Thellvm.access.groupmetadata is required for the loop to be regarded as parallel.) A crude version of this can be done by the following patch:You could obviously be a bit more selective if you can prove that speculative execution of loads is safe.
I took Isuru’s C++ reproducer and modified it slightly to get rid of a few extraneous uninitialized reads. See here. If I run this with valgrind and POCL 1.5, I get the “Invalid read of size 32” error. If I run with the above patch applied (and make sure to delete the cache), then I don’t get the invalid reads. Examining
parallel.bcbefore and after the patch confirms that the masked loads are being added only after.This might be how: https://github.com/pocl/pocl/blob/master/lib/CL/pocl_llvm_wg.cc#L279