triton: `realloc(): invalid pointer` when parsing MLIR if Triton is built with clang

This is an old issue that people ran into a few months ago on Slack. I’m now hitting it and wanted a place to record the debugging I’m doing.

Steps to reproduce:

$ clang --version
Debian clang version 14.0.6
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

$ ld.lld --version
Debian LLD 14.0.6 (compatible with GNU linkers)

$ rm -rf python/build
$ TRITON_BUILD_WITH_CLANG_LLD=true pip install -e python --no-build-isolation
$ cat <<'EOF' > /tmp/test.mlir
module {
  tt.func public @load_reduce_kernel_0d1d2de3c4c(%arg0: !tt.ptr<f16, 1> {tt.divisibility = 16 : i32}, %arg1: !tt.ptr<f16, 1> {tt.divisibility = 16 : i32}, %arg2: i32 {tt.divisibility = 16 : i32, tt.max_divisibility = 8 : i32}) attributes {noinline = false} {
    %0 = arith.extsi %arg2 : i32 to i64
    %1 = tt.make_range {end = 128 : i32, start = 0 : i32} : tensor<128xi32>
    %2 = arith.extsi %1 : tensor<128xi32> to tensor<128xi64>
    %3 = tt.expand_dims %2 {axis = 1 : i32} : (tensor<128xi64>) -> tensor<128x1xi64>
    %4 = tt.splat %0 : (i64) -> tensor<128x1xi64>
    %5 = arith.muli %3, %4 : tensor<128x1xi64>
    %6 = tt.splat %arg0 : (!tt.ptr<f16, 1>) -> tensor<128x1x!tt.ptr<f16, 1>>
    %7 = tt.addptr %6, %5 : tensor<128x1x!tt.ptr<f16, 1>>, tensor<128x1xi64>
    %8 = tt.broadcast %7 : (tensor<128x1x!tt.ptr<f16, 1>>) -> tensor<128x64x!tt.ptr<f16, 1>>
    %9 = tt.make_range {end = 64 : i32, start = 0 : i32} : tensor<64xi32>
    %10 = arith.extsi %9 : tensor<64xi32> to tensor<64xi64>
    %11 = tt.expand_dims %10 {axis = 0 : i32} : (tensor<64xi64>) -> tensor<1x64xi64>
    %12 = tt.broadcast %11 : (tensor<1x64xi64>) -> tensor<128x64xi64>
    %13 = tt.addptr %8, %12 : tensor<128x64x!tt.ptr<f16, 1>>, tensor<128x64xi64>
    %14 = tt.load %13 {cache = 1 : i32, evict = 1 : i32, isVolatile = false} : tensor<128x64xf16>
    %15 = arith.extf %14 : tensor<128x64xf16> to tensor<128x64xf32>
    %16 = "tt.reduce"(%15) <{axis = 1 : i32}> ({
    ^bb0(%arg3: f32, %arg4: f32):
      %20 = arith.maxf %arg3, %arg4 : f32
      tt.reduce.return %20 : f32
    }) : (tensor<128x64xf32>) -> tensor<128xf32>
    %17 = tt.splat %arg1 : (!tt.ptr<f16, 1>) -> tensor<128x!tt.ptr<f16, 1>>
    %18 = tt.addptr %17, %1 : tensor<128x!tt.ptr<f16, 1>>, tensor<128xi32>
    %19 = arith.truncf %16 : tensor<128xf32> to tensor<128xf16>
    tt.store %18, %19 {cache = 1 : i32, evict = 1 : i32} : tensor<128xf16>
    tt.return
  }
}
EOF

$ python/build/cmake.linux-x86_64-cpython-3.11/bin/triton-opt /tmp/test.mlir -split-input-file -canonicalize -triton-combine

This outputs the following, but only flakily: sometimes it exits successfully. I think I’m getting a stack trace now because I installed llvm-symbolizer. Previously I just got `realloc(): invalid pointer`.

realloc(): invalid pointer
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: /usr/local/google/home/jlebar/code/triton/python/build/cmake.linux-x86_64-cpython-3.11/bin/triton-opt /usr/local/google/home/jlebar/code/triton/test/Triton/reduce.mlir -split-input-file -canonicalize -triton-combine
1.      MLIR Parser: custom op parser 'builtin.module'
2.      MLIR Parser: custom op parser 'tt.func'
3.      MLIR Parser: custom op parser 'tt.reduce.return'
 #0 0x0000564e688b76bb llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/usr/local/google/home/jlebar/code/triton/python/build/cmake.linux-x86_64-cpython-3.11/bin/triton-opt+0x55056bb)
 #1 0x0000564e688b5514 SignalHandler(int) Signals.cpp:0:0
 #2 0x00007f6c8383a540 (/lib/x86_64-linux-gnu/libc.so.6+0x3c540)
 #3 0x00007f6c8388812c (/lib/x86_64-linux-gnu/libc.so.6+0x8a12c)
 #4 0x00007f6c8383a4a2 raise (/lib/x86_64-linux-gnu/libc.so.6+0x3c4a2)
 #5 0x00007f6c838244b2 abort (/lib/x86_64-linux-gnu/libc.so.6+0x264b2)
 #6 0x00007f6c838251ed (/lib/x86_64-linux-gnu/libc.so.6+0x271ed)
 #7 0x00007f6c83891aa5 (/lib/x86_64-linux-gnu/libc.so.6+0x93aa5)
 #8 0x00007f6c8389674c __libc_realloc (/lib/x86_64-linux-gnu/libc.so.6+0x9874c)
 #9 0x0000564e6886b7bf llvm::SmallVectorBase<unsigned int>::grow_pod(void*, unsigned long, unsigned long) (/usr/local/google/home/jlebar/code/triton/python/build/cmake.linux-x86_64-cpython-3.11/bin/triton-opt+0x54b97bf)
#10 0x0000564e650a0263 llvm::SmallVectorTemplateBase<mlir::Type, true>::push_back(mlir::Type) AffineOps.cpp:0:0
#11 0x0000564e650a02cf mlir::ParseResult llvm::function_ref<mlir::ParseResult ()>::callback_fn<mlir::AsmParser::parseTypeList(llvm::SmallVectorImpl<mlir::Type>&)::'lambda'()>(long) AffineOps.cpp:0:0
#12 0x0000564e686465ea mlir::detail::Parser::parseCommaSeparatedList(mlir::AsmParser::Delimiter, llvm::function_ref<mlir::ParseResult ()>, llvm::StringRef) (/usr/local/google/home/jlebar/code/triton/python/build/cmake.linux-x86_64-cpython-3.11/bin/triton-opt+0x52945ea)
#13 0x0000564e65eafb34 mlir::LogicalResult::failed() const /usr/local/google/home/jlebar/.triton/llvm/llvm+mlir-17.0.0-x86_64-linux-gnu-ubuntu-18.04-release/include/mlir/Support/LogicalResult.h:44:33
#14 0x0000564e65eafb34 mlir::ParseResult::operator bool() const /usr/local/google/home/jlebar/.triton/llvm/llvm+mlir-17.0.0-x86_64-linux-gnu-ubuntu-18.04-release/include/mlir/Support/LogicalResult.h:126:43
#15 0x0000564e65eafb34 mlir::triton::ReduceReturnOp::parse(mlir::OpAsmParser&, mlir::OperationState&) /usr/local/google/home/jlebar/code/triton/python/build/cmake.linux-x86_64-cpython-3.11/include/triton/Dialect/Triton/IR/Ops.cpp.inc:9362:7
#16 0x0000564e686543e1 (anonymous namespace)::OperationParser::parseOperation() Parser.cpp:0:0
#17 0x0000564e68657b85 (anonymous namespace)::OperationParser::parseBlock(mlir::Block*&) (.part.0) Parser.cpp:0:0
#18 0x0000564e686581a5 (anonymous namespace)::OperationParser::parseRegionBody(mlir::Region&, llvm::SMLoc, llvm::ArrayRef<mlir::OpAsmParser::Argument>, bool) Parser.cpp:0:0
#19 0x0000564e68658611 (anonymous namespace)::OperationParser::parseRegion(mlir::Region&, llvm::ArrayRef<mlir::OpAsmParser::Argument>, bool) Parser.cpp:0:0
#20 0x0000564e68653325 (anonymous namespace)::OperationParser::parseGenericOperationAfterOpName(mlir::OperationState&, std::optional<llvm::ArrayRef<mlir::OpAsmParser::UnresolvedOperand>>, std::optional<llvm::ArrayRef<mlir::Block*>>, std::optional<llvm::MutableArrayRef<std::unique_ptr<mlir::Region, std::default_delete<mlir::Region>>>>, std::optional<llvm::ArrayRef<mlir::NamedAttribute>>, std::optional<mlir::Attribute>, std::optional<mlir::FunctionType>) (.isra.0) Parser.cpp:0:0
#21 0x0000564e68653aa6 (anonymous namespace)::OperationParser::parseGenericOperation() Parser.cpp:0:0
#22 0x0000564e686546b0 (anonymous namespace)::OperationParser::parseOperation() Parser.cpp:0:0
#23 0x0000564e68657f2d (anonymous namespace)::OperationParser::parseRegionBody(mlir::Region&, llvm::SMLoc, llvm::ArrayRef<mlir::OpAsmParser::Argument>, bool) Parser.cpp:0:0
#24 0x0000564e68658611 (anonymous namespace)::OperationParser::parseRegion(mlir::Region&, llvm::ArrayRef<mlir::OpAsmParser::Argument>, bool) Parser.cpp:0:0
#25 0x0000564e686587c5 (anonymous namespace)::CustomOpAsmParser::parseOptionalRegion(mlir::Region&, llvm::ArrayRef<mlir::OpAsmParser::Argument>, bool) Parser.cpp:0:0
#26 0x0000564e687c4322 mlir::function_interface_impl::parseFunctionOp(mlir::OpAsmParser&, mlir::OperationState&, bool, mlir::StringAttr, llvm::function_ref<mlir::Type (mlir::Builder&, llvm::ArrayRef<mlir::Type>, llvm::ArrayRef<mlir::Type>, mlir::function_interface_impl::VariadicFlag, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>&)>, mlir::StringAttr, mlir::StringAttr) (/usr/local/google/home/jlebar/code/triton/python/build/cmake.linux-x86_64-cpython-3.11/bin/triton-opt+0x5412322)
#27 0x0000564e65eba190 mlir::triton::FuncOp::parse(mlir::OpAsmParser&, mlir::OperationState&) /usr/local/google/home/jlebar/code/triton/lib/Dialect/Triton/IR/Ops.cpp:824:10
#28 0x0000564e686543e1 (anonymous namespace)::OperationParser::parseOperation() Parser.cpp:0:0
#29 0x0000564e68657f2d (anonymous namespace)::OperationParser::parseRegionBody(mlir::Region&, llvm::SMLoc, llvm::ArrayRef<mlir::OpAsmParser::Argument>, bool) Parser.cpp:0:0
#30 0x0000564e68658611 (anonymous namespace)::OperationParser::parseRegion(mlir::Region&, llvm::ArrayRef<mlir::OpAsmParser::Argument>, bool) Parser.cpp:0:0
#31 0x0000564e686586a1 (anonymous namespace)::CustomOpAsmParser::parseRegion(mlir::Region&, llvm::ArrayRef<mlir::OpAsmParser::Argument>, bool) Parser.cpp:0:0
#32 0x0000564e68781bda mlir::ModuleOp::parse(mlir::OpAsmParser&, mlir::OperationState&) (/usr/local/google/home/jlebar/code/triton/python/build/cmake.linux-x86_64-cpython-3.11/bin/triton-opt+0x53cfbda)
#33 0x0000564e686543e1 (anonymous namespace)::OperationParser::parseOperation() Parser.cpp:0:0
#34 0x0000564e68655c4a mlir::parseAsmSourceFile(llvm::SourceMgr const&, mlir::Block*, mlir::ParserConfig const&, mlir::AsmParserState*, mlir::AsmParserCodeCompleteContext*) (/usr/local/google/home/jlebar/code/triton/python/build/cmake.linux-x86_64-cpython-3.11/bin/triton-opt+0x52a3c4a)
#35 0x0000564e68614934 mlir::parseSourceFile(std::shared_ptr<llvm::SourceMgr> const&, mlir::Block*, mlir::ParserConfig const&, mlir::LocationAttr*) (/usr/local/google/home/jlebar/code/triton/python/build/cmake.linux-x86_64-cpython-3.11/bin/triton-opt+0x5262934)
#36 0x0000564e665654f0 performActions(llvm::raw_ostream&, std::shared_ptr<llvm::SourceMgr> const&, mlir::MLIRContext*, mlir::MlirOptMainConfig const&) MlirOptMain.cpp:0:0
#37 0x0000564e665663b5 processBuffer(llvm::raw_ostream&, std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer>>, mlir::MlirOptMainConfig const&, mlir::DialectRegistry&, llvm::ThreadPool*) MlirOptMain.cpp:0:0
#38 0x0000564e665664a0 mlir::LogicalResult llvm::function_ref<mlir::LogicalResult (std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>::callback_fn<mlir::MlirOptMain(llvm::raw_ostream&, std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer>>, mlir::DialectRegistry&, mlir::MlirOptMainConfig const&)::'lambda'(std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>(long, std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&) MlirOptMain.cpp:0:0
#39 0x0000564e6881624e mlir::splitAndProcessBuffer(std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer>>, llvm::function_ref<mlir::LogicalResult (std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>, llvm::raw_ostream&, bool, bool)::'lambda'(llvm::StringRef)::operator()(llvm::StringRef) const ToolUtilities.cpp:0:0
#40 0x0000564e68816866 mlir::splitAndProcessBuffer(std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer>>, llvm::function_ref<mlir::LogicalResult (std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>, llvm::raw_ostream&, bool, bool) (/usr/local/google/home/jlebar/code/triton/python/build/cmake.linux-x86_64-cpython-3.11/bin/triton-opt+0x5464866)
#41 0x0000564e665643a3 mlir::MlirOptMain(llvm::raw_ostream&, std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer>>, mlir::DialectRegistry&, mlir::MlirOptMainConfig const&) (/usr/local/google/home/jlebar/code/triton/python/build/cmake.linux-x86_64-cpython-3.11/bin/triton-opt+0x31b23a3)
#42 0x0000564e665667d3 mlir::MlirOptMain(int, char**, llvm::StringRef, mlir::DialectRegistry&) (/usr/local/google/home/jlebar/code/triton/python/build/cmake.linux-x86_64-cpython-3.11/bin/triton-opt+0x31b47d3)
#43 0x0000564e64fbe0eb main /usr/local/google/home/jlebar/code/triton/bin/triton-opt.cpp:9:33
#44 0x00007f6c838256ca (/lib/x86_64-linux-gnu/libc.so.6+0x276ca)
#45 0x00007f6c83825785 __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x27785)
#46 0x0000564e64fbdfe1 _start (/usr/local/google/home/jlebar/code/triton/python/build/cmake.linux-x86_64-cpython-3.11/bin/triton-opt+0x1c0bfe1)

I got the test MLIR by running one of the Triton Python tests with a patch that prints the path of the temp file the Triton MLIR is written to.

$ git diff
diff --git a/python/src/triton.cc b/python/src/triton.cc
index 0068a23f8..49c6639a5 100644
--- a/python/src/triton.cc
+++ b/python/src/triton.cc
@@ -552,6 +552,7 @@ void init_triton_ir(py::module &&m) {
       [](const std::string &inputFilename, mlir::MLIRContext &context) {
         // initialize registry
         // note: we initialize llvm for undef
+        std::cerr << "parse_mlir_module: " << inputFilename << std::endl;
         mlir::DialectRegistry registry;
         registry.insert<
             mlir::triton::TritonDialect, mlir::triton::gpu::TritonGPUDialect,
@@ -564,6 +565,7 @@ void init_triton_ir(py::module &&m) {
         context.loadAllAvailableDialects();

         // parse module
+        std::cerr << "Calling mlir parseSourceFile." << std::endl;
         mlir::OwningOpRef<mlir::ModuleOp> module =
             mlir::parseSourceFile<mlir::ModuleOp>(inputFilename, &context);
         if (!module)
$ python -m pytest python/test/unit/hopper/test_mixed_io.py -k load_reduce --verbose -s
# prints the path of the temp file it wrote the Triton IR to.

About this issue

  • State: closed
  • Created 9 months ago
  • Comments: 25 (25 by maintainers)

Most upvoted comments

There is a known incompatibility between gcc and clang actually related to lambda…

Oh, well that’s almost definitely it then. Looks basically identical to https://discourse.llvm.org/t/gcc-abi-compatibility-lambda-captures/70850 ?