tvm: [Bug] PyTorch and TVM loading problem due to conflicting LLVM symbols

Apparently, the new PyTorch release conflicts with symbols loaded by TVM, so the following trivial script crashes with invalid pointer / Aborted (core dumped) on exit:

import tvm
import torch

We can work around this by swapping the import order, but as pointed out in https://github.com/apache/tvm/issues/9349#issuecomment-950685224 this may not always be possible.

Another solution is to remove the use of RTLD_GLOBAL in https://github.com/apache/tvm/blob/dfe4cebbdadab3d4e6e6ba3951276a51a4ffeaf6/python/tvm/_ffi/base.py#L57
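For context, the difference comes down to the dlopen mode flag the Python FFI loader passes. A minimal sketch of the two modes (the library path in the comments is illustrative, not the actual loader code):

```python
import ctypes

# The two dlopen visibility modes. RTLD_LOCAL is what ctypes.CDLL uses
# when no mode is given; the flags are distinct values.
print(ctypes.RTLD_GLOBAL != ctypes.RTLD_LOCAL)  # True

# What TVM's _ffi/base.py effectively does today:
#     lib = ctypes.CDLL("libtvm.so", ctypes.RTLD_GLOBAL)
# RTLD_GLOBAL exports every symbol in libtvm (including any bundled LLVM
# symbols) to libraries loaded afterwards, such as PyTorch's own LLVM copy.
#
# The proposed fix is to drop the flag, keeping libtvm's symbols private:
#     lib = ctypes.CDLL("libtvm.so")  # defaults to RTLD_LOCAL
```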

See related issues in other repos that moved away from using RTLD_GLOBAL:

  • https://github.com/dmlc/dgl/issues/2255
  • https://github.com/pytorch/pytorch/pull/28536
  • https://github.com/pytorch/pytorch/issues/3059

Is there any particular reason we are using RTLD_GLOBAL? @tqchen @areusch

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Reactions: 4
  • Comments: 20 (17 by maintainers)


Most upvoted comments

OK, dug a bit into this. I think I know the possible cause: a conflict of LLVM symbols (due to different versions of LLVM being used), since PyTorch has also started shipping with LLVM. To avoid the problem, we need to do two things:

  • Turn on static linking of LLVM. This directly links the LLVM code into libtvm without relying on a dynamic library (which creates global symbols).
    • set(USE_LLVM "/path/to/llvm-config --link-static")
  • Turn on set(HIDE_PRIVATE_SYMBOLS ON). This effectively hides the LLVM-related symbols when we load globally from PyTorch.
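Concretely, for a from-source TVM build both settings go in config.cmake before running CMake (the llvm-config path is illustrative):

```cmake
# config.cmake: statically link LLVM into libtvm instead of depending on a
# shared libLLVM, which would export global symbols.
set(USE_LLVM "/path/to/llvm-config --link-static")

# Hide non-API symbols so the bundled LLVM symbols are not visible to
# libraries loaded later, such as PyTorch's own LLVM copy.
set(HIDE_PRIVATE_SYMBOLS ON)
```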

I did a quick experiment locally: with both options on, things work, and the conflict reappears with either option off.

I can confirm that HIDE_PRIVATE_SYMBOLS=ON also fixes it. I think this is a good enough workaround for now cc @lhutton1 .

It would be good to find out which symbol conflicts (perhaps by linking things together) and resolve it (rename the symbol on the TVM side if possible). Note that the same problem will reappear in the future if we make an attempt to link PyTorch in a deeper integration, so this would serve as a way to resolve that possible issue.
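One way to hunt for the clashing symbol is to diff the dynamic symbol tables of the two libraries. A rough sketch, assuming GNU binutils' nm is available and the library names are illustrative (`nm -D --defined-only` prints lines of the form `<addr> <type> <name>`):

```python
import subprocess

def defined_dynamic_symbols(nm_output: str) -> set[str]:
    """Collect defined dynamic symbol names from `nm -D --defined-only` text."""
    names = set()
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) == 3:  # "<addr> <type> <name>"
            names.add(parts[2])
    return names

def run_nm(path: str) -> str:
    # nm is part of GNU binutils; the library path is illustrative.
    return subprocess.run(
        ["nm", "-D", "--defined-only", path],
        capture_output=True, text=True, check=True,
    ).stdout

# Usage sketch: the intersection holds candidate conflicting symbols.
#   clash = defined_dynamic_symbols(run_nm("libtvm.so")) \
#         & defined_dynamic_symbols(run_nm("libtorch_cpu.so"))
#   print(sorted(s for s in clash if "llvm" in s.lower()))
```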

RTLD_GLOBAL provides some convenience: it gives plugin modules (loaded later) the symbols of libtvm_runtime without their explicitly linking against it. We might need to rethink the plugin mechanism (e.g. VTA) a bit if we decide to move away from it.