taichi: pip-installed Taichi crashes on Google Colab kernels
Opening an empty CPU-backed notebook at https://colab.research.google.com and running the following code leads to a crash:
!apt install clang-7
!apt install clang-format
!pip install taichi-nightly

import taichi as ti

x, y = ti.var(ti.f32), ti.var(ti.f32)

@ti.layout
def xy():
    ti.root.dense(ti.ij, 16).place(x, y)

@ti.kernel
def laplace():
    for i, j in x:
        if (i + j) % 3 == 0:
            y[i, j] = 4.0 * x[i, j] - x[i - 1, j] - x[i + 1, j] - x[i, j - 1] - x[i, j + 1]
        else:
            y[i, j] = 0.0

for i in range(10):
    x[i, i + 1] = 1.0

laplace()

for i in range(10):
    print(y[i, i + 1])
And the relevant runtime logs say:
Oct 30, 2019, 3:47:15 PM | WARNING | /usr/local/lib/python3.6/dist-packages/taichi/core/../lib/taichi_core.so:
Oct 30, 2019, 3:47:15 PM | WARNING | /usr/local/lib/python3.6/dist-packages/taichi/core/../lib/taichi_core.so:
Oct 30, 2019, 3:47:15 PM | WARNING | /usr/local/lib/python3.6/dist-packages/taichi/core/../lib/taichi_core.so: taichi::Tlang::Kernel::operator()()
Oct 30, 2019, 3:47:15 PM | WARNING | /usr/local/lib/python3.6/dist-packages/taichi/core/../lib/taichi_core.so: taichi::Tlang::Kernel::compile()
Oct 30, 2019, 3:47:15 PM | WARNING | /usr/local/lib/python3.6/dist-packages/taichi/core/../lib/taichi_core.so: taichi::Tlang::Program::compile(taichi::Tlang::Kernel&)
Oct 30, 2019, 3:47:15 PM | WARNING | /usr/local/lib/python3.6/dist-packages/taichi/core/../lib/taichi_core.so: taichi::Tlang::KernelCodeGen::compile(taichi::Tlang::Program&, taichi::Tlang::Kernel&)
Oct 30, 2019, 3:47:15 PM | WARNING | /usr/local/lib/python3.6/dist-packages/taichi/core/../lib/taichi_core.so: taichi::Tlang::CPUCodeGen::lower_cpp()
Oct 30, 2019, 3:47:15 PM | WARNING | /usr/local/lib/python3.6/dist-packages/taichi/core/../lib/taichi_core.so: taichi::Tlang::irpass::lower(taichi::Tlang::IRNode*)
Oct 30, 2019, 3:47:15 PM | WARNING | /usr/local/lib/python3.6/dist-packages/taichi/core/../lib/taichi_core.so: taichi::Tlang::LowerAST::visit(taichi::Tlang::Block*)
Oct 30, 2019, 3:47:15 PM | WARNING | /usr/local/lib/python3.6/dist-packages/taichi/core/../lib/taichi_core.so:
Oct 30, 2019, 3:47:15 PM | WARNING | /usr/local/lib/python3.6/dist-packages/taichi/core/../lib/taichi_core.so:
Oct 30, 2019, 3:47:15 PM | WARNING | /lib/x86_64-linux-gnu/libc.so.6: abort
Oct 30, 2019, 3:47:15 PM | WARNING | /lib/x86_64-linux-gnu/libc.so.6: gsignal
Oct 30, 2019, 3:47:15 PM | WARNING | /lib/x86_64-linux-gnu/libc.so.6:
Oct 30, 2019, 3:47:15 PM | WARNING | /usr/local/lib/python3.6/dist-packages/taichi/core/../lib/taichi_core.so: taichi::signal_handler(int)
Oct 30, 2019, 3:47:15 PM | WARNING | ***************************
Oct 30, 2019, 3:47:15 PM | WARNING | * Taichi Core Stack Trace *
Oct 30, 2019, 3:47:15 PM | WARNING | ***************************
Oct 30, 2019, 3:47:15 PM | WARNING | [E 10/30/19 14:47:15.371] Received signal 6 (Aborted)
Oct 30, 2019, 3:47:15 PM | WARNING | [I 10/30/19 14:47:15.340] [base.cpp:generate_binary@125] Compilation time: 2889.9 ms
Oct 30, 2019, 3:47:12 PM | WARNING | [T 10/30/19 14:47:12.056] [logging.cpp:Logger@67] Taichi core started. Thread ID = 122
Can you please provide some insight into the possible root of the problem, if you have any thoughts off the top of your head?
About this issue
- State: closed
- Created 5 years ago
- Reactions: 2
- Comments: 69 (65 by maintainers)
FINALLY!!! I identified the problem! Colab kernels have a `libtcmalloc` library installed and the env variable `LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4` set. Somehow it causes `libstdc++` to use `libunwind` instead of `libgcc_s` for stack unwinding on exceptions. For some reason this causes an abort during unwinding of complex calls. Running `LD_PRELOAD= python t.py`, where `t.py` is some Taichi program, works, even on GPU kernels. I'm looking for a way to make it work inside Colab cells as well.

@ppwwyyxx Thanks for pointing this out. I agree that closing stale issues using bots is not a good idea, and will prevent further misuse like this.
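For reference, the `LD_PRELOAD= python t.py` workaround described above can be reproduced from a notebook cell by launching a subprocess with a cleaned environment. This is a minimal sketch, assuming the reproduction program has been saved as `t.py` (the filename is illustrative); it does not fix the notebook kernel itself, which was already started with the preload:

# Sketch: launch a Taichi script from a Colab cell with LD_PRELOAD
# cleared, so libtcmalloc.so.4 is not injected into the child process.
# Note: this only affects the subprocess, not the already-running
# notebook kernel.
import os
import subprocess

env = dict(os.environ)
env["LD_PRELOAD"] = ""  # drop /usr/lib/x86_64-linux-gnu/libtcmalloc.so.4

# t.py is assumed to contain the reproduction program from this report.
subprocess.run(["python", "t.py"], env=env, check=True)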
@znah After some searching, it turns out that we are now blocked at https://github.com/taichi-dev/taichi/issues/1059 - if we can remove all C++ exceptions (which I believe is necessary), then the system will not involve `libunwind` and we can run Taichi on Colab. It may take some time for people (@sjwsl and @lin-hitonami) to fully remove `throw IRModified` etc. - if you'd like to help that would be awesome!

@strongoier I stand corrected; it appears the 'minimal' Taichi code I was using was incorrect (though the lack of error messages makes things a bit hard to decipher). Apologies for pinging you all; it seems to work well now 😃 Excited to try Taichi out.
FYI and off-topic: this opinion from the PyTorch author (https://twitter.com/soumithchintala/status/1451213207750721538) may lead the maintainers to reconsider whether it's a good idea to "auto-close stale issues". I personally agree with his opinion. What's more valid (and also used in projects I maintain) is to auto-close invalid issues (e.g., those missing necessary information).
Sorry about that. The bitcode loading issue should be fixed in v0.5.6. The buildbots are currently working on compiling/releasing the new version.
I made a workaround; it's pretty ugly, but it makes Taichi run in Colab notebook cells! https://colab.research.google.com/github/znah/notebooks/blob/master/taichi_colab.ipynb https://twitter.com/zzznah/status/1232321076014788608
I'd like to reopen this issue. The problem is still there, and I think supporting the Colab environment would greatly increase Taichi user adoption.
Interesting observation from the Colab team: Taichi works when using `tcmalloc_minimal` instead of `tcmalloc`. Relevant bits of documentation: also this.
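One rough way to try that observation from a cell is to preload the minimal variant when launching a script. A minimal sketch, assuming gperftools' `libtcmalloc_minimal.so.4` sits at the usual Ubuntu path (an unverified assumption about the Colab image):

# Sketch: swap the full tcmalloc for tcmalloc_minimal, which omits the
# heap-profiling machinery, when launching a Taichi script. The .so
# path below is assumed, not verified against the Colab image.
import os
import subprocess

env = dict(os.environ)
env["LD_PRELOAD"] = "/usr/lib/x86_64-linux-gnu/libtcmalloc_minimal.so.4"
subprocess.run(["python", "t.py"], env=env, check=True)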
I’m continuing the investigation.
The real way to rectify this issue is to fix a bug somewhere in either clang, (non-GNU) libunwind, or tcmalloc. I don't feel capable of doing this myself. I'll discuss potential solutions with the Colab team.
It's even trickier. I suspect some ABI incompatibility between `clang` and `libunwind` that manifests itself only when unwinding complex virtual calls, so probably only a few programs are affected.