pocl: [Regression] Valgrind "invalid read" on relatively simple kernel

Running this script

  • on pocl 68273059 compiled against Debian’s LLVM/Clang 1:9~+rc2-1~exp2
  • on pocl 6f31aa89 compiled against Debian’s LLVM/Clang 1:8.0.1-3+b1

on a Haswell machine using

PYOPENCL_TEST=port valgrind --suppressions=python-simple.supp python pocl-trigger-bug.py

produces this output, which includes these warnings:

==6481== 
==6481== Invalid read of size 32
==6481==    at 0x6C91127A: _pocl_kernel_grudge_assign_0_workgroup (in /home/andreask_work/.cache/pocl/kcache/IB/DGOCMKFDFAGEGJKLHGHENBDDMIOFGBAMKFILN/grudge_assign_0/128-1-1-goffs0-smallgrid/grudge_assign_0.so)
==6481==    by 0x4A37FA2: start_thread (pthread_create.c:486)
==6481==    by 0x49684CE: clone (clone.S:95)
==6481==  Address 0x45bcd6f20 is 122,528 bytes inside a block of size 122,624 in arena "client"
==6481== 
==6481== Invalid read of size 32
==6481==    at 0x6C911280: _pocl_kernel_grudge_assign_0_workgroup (in /home/andreask_work/.cache/pocl/kcache/IB/DGOCMKFDFAGEGJKLHGHENBDDMIOFGBAMKFILN/grudge_assign_0/128-1-1-goffs0-smallgrid/grudge_assign_0.so)
==6481==    by 0x4A37FA2: start_thread (pthread_create.c:486)
==6481==    by 0x49684CE: clone (clone.S:95)
==6481==  Address 0x45bcd6f40 is 122,560 bytes inside a block of size 122,624 in arena "client"
==6481== 
==6481== Invalid read of size 32
==6481==    at 0x6C911286: _pocl_kernel_grudge_assign_0_workgroup (in /home/andreask_work/.cache/pocl/kcache/IB/DGOCMKFDFAGEGJKLHGHENBDDMIOFGBAMKFILN/grudge_assign_0/128-1-1-goffs0-smallgrid/grudge_assign_0.so)
==6481==    by 0x4A37FA2: start_thread (pthread_create.c:486)
==6481==    by 0x49684CE: clone (clone.S:95)
==6481==  Address 0x45bcd6f60 is 122,592 bytes inside a block of size 122,624 in arena "client"
==6481== 

The suppression file python-simple.supp can be found in this gist.

This was observed in the wild in a PDE solver, where it caused occasional crashes. The code included with this issue is a simplified version of that kernel.

Valgrind also highlights a suspicious use of an uninitialised value during compilation, which may be related:

==6481== Conditional jump or move depends on uninitialised value(s)
==6481==    at 0x731D9CF7: ??? (in /usr/lib/x86_64-linux-gnu/libLLVM-9.so.1)
==6481==    by 0x7312A367: llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (in /usr/lib/x86_64-linux-gnu/libLLVM-9.so.1)
==6481==    by 0x72FA48A5: llvm::FPPassManager::runOnFunction(llvm::Function&) (in /usr/lib/x86_64-linux-gnu/libLLVM-9.so.1)
==6481==    by 0x72FA4B52: llvm::FPPassManager::runOnModule(llvm::Module&) (in /usr/lib/x86_64-linux-gnu/libLLVM-9.so.1)
==6481==    by 0x72FA4FFF: llvm::legacy::PassManagerImpl::run(llvm::Module&) (in /usr/lib/x86_64-linux-gnu/libLLVM-9.so.1)
==6481==    by 0x70DE9CB6: pocl_llvm_codegen (in /opt/pocl-master/lib/libpocl.so.2.4.0)
==6481==    by 0x70D89658: llvm_codegen (in /opt/pocl-master/lib/libpocl.so.2.4.0)
==6481==    by 0x70D8B34C: pocl_check_kernel_disk_cache (in /opt/pocl-master/lib/libpocl.so.2.4.0)
==6481==    by 0x70D8B926: pocl_check_kernel_dlhandle_cache (in /opt/pocl-master/lib/libpocl.so.2.4.0)
==6481==    by 0x70D5E135: program_compile_dynamic_wg_binaries (in /opt/pocl-master/lib/libpocl.so.2.4.0)
==6481==    by 0x70D71721: POclGetProgramInfo (in /opt/pocl-master/lib/libpocl.so.2.4.0)
==6481==    by 0x6DAD93C6: pyopencl::program::get_info(unsigned int) const (wrap_cl.hpp:3838)

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 40 (40 by maintainers)

Most upvoted comments

#850 should address this.

Can you at least file an LLVM bug for this and link it to this issue so we can keep track more easily?

https://bugs.llvm.org/show_bug.cgi?id=46666

We could have a build-time option to disable the workaround for builds that run against an LLVM with https://github.com/llvm/llvm-project/commit/411d31ad72456ba88c0b0bee0faba2b774add65f reverted.

The problem is that POCL applies llvm.loop.parallel_accesses metadata too aggressively. To handle the conditional in the loop, the loop vectorizer ought to generate masked loads and stores. However, adding llvm.loop.parallel_accesses metadata the way POCL does suppresses the generation of the masked loads, so the vectorizer emits an unmasked load, which leads to an out-of-bounds read.
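
To illustrate the mechanism, here is a minimal sketch (hypothetical names and structure; not the actual generated grudge_assign_0 code) of the kind of per-work-group loop pocl builds around a kernel body:

// Hypothetical sketch of the per-work-group loop pocl generates
// (names and structure simplified; not the actual grudge_assign_0 code).
void workgroup_loop(const float *src, float *dst,
                    int base, int wg_size, int n) {
  // pocl attaches llvm.loop.parallel_accesses metadata to this loop.
  for (int lid = 0; lid < wg_size; ++lid) {
    if (base + lid < n) {
      // When vectorized, this load should become a masked load honoring
      // the guard above. With the parallel-access metadata, the vectorizer
      // instead emits an unmasked vector load that can read past the end
      // of src -- consistent with the 32-byte (256-bit AVX) invalid reads
      // valgrind reports on Haswell.
      dst[base + lid] = src[base + lid];
    }
  }
}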

More specifically, for a parallel loop, conditionally executed loads are assumed to be safe to execute speculatively (and hence to execute as unmasked loads). This is confirmed by this comment in LoopVectorizationLegality.cpp, and it was stated in the language reference before that passage was removed. The removal of that information from the language reference seems like a bug.

The solution on POCL’s end is not to add llvm.access.group metadata for memory reads. (The llvm.access.group metadata is required for the loop to be regarded as parallel.) A crude version of this can be done by the following patch:

diff --git a/lib/llvmopencl/ParallelRegion.cc b/lib/llvmopencl/ParallelRegion.cc
index 187e8b52..4a635377 100644
--- a/lib/llvmopencl/ParallelRegion.cc
+++ b/lib/llvmopencl/ParallelRegion.cc
@@ -474,7 +474,14 @@ void ParallelRegion::AddParallelLoopMetadata(llvm::MDNode *Identifier) {
     for (BasicBlock::iterator ii = bb->begin(), ee = bb->end();
          ii != ee; ii++) {
 
-      if (ii->mayReadOrWriteMemory()) {
+      if (ii->mayReadFromMemory()) {
+        // Parallel loop metadata on memory reads also implies that
+        // if-conversion (i.e., speculative execution within a loop
+        // iteration) is safe. This isn't guaranteed.
+        continue;
+      }
+
+      if (ii->mayWriteToMemory()) {
         MDNode *NewMD = MDNode::get(bb->getContext(), Identifier);
         MDNode *OldMD = ii->getMetadata(PARALLEL_MD_NAME);
         if (OldMD != nullptr) {

You could obviously be a bit more selective if you can prove that speculative execution of loads is safe.
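
As a rough sketch of what being more selective could look like (a hypothetical helper, not part of the patch above), one could consult LLVM's speculation-safety analysis before skipping a read:

// Hypothetical refinement, not part of the actual patch: annotate a read
// as a parallel access only when LLVM can prove speculation is safe.
#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/Instruction.h"

static bool safeToAnnotateRead(const llvm::Instruction &I) {
  if (!I.mayReadFromMemory())
    return true; // not a read; writes are handled as before
  // For loads, isSafeToSpeculativelyExecute() checks (among other things)
  // that the pointer is known to be dereferenceable, which is the condition
  // under which an if-converted, unmasked load cannot fault.
  return llvm::isSafeToSpeculativelyExecute(&I);
}

In practice this is conservative: loads through plain __global kernel pointers are usually not provably dereferenceable, so most reads would still lose the annotation unless pocl also emitted dereferenceability information.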

I took Isuru’s C++ reproducer and modified it slightly to get rid of a few extraneous uninitialized reads; see here. Running it with valgrind and POCL 1.5 produces the “Invalid read of size 32” error. With the above patch applied (and the kernel cache deleted), the invalid reads are gone. Examining parallel.bc before and after the patch confirms that masked loads are generated only with the patch applied.