taichi: [CUDA] detected as supported and crashes on card without unified memory
Describe the bug
CUDA is detected as SUPPORTED on a machine without CUDA.
This is because `is_cuda_api_avaliable` returned true even though I don't have CUDA.
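For context, the gap is between "the CUDA driver library can be loaded" and "a usable CUDA device actually exists". Below is a minimal, illustrative sketch of the stricter check (not Taichi's actual detection code), probing the driver API directly via ctypes; the helper name and library lookup are my own:

```python
import ctypes

def cuda_device_usable():
    # Illustrative only: a library that loads is not the same as a usable device.
    libcuda = None
    for name in ("libcuda.so", "libcuda.so.1"):
        try:
            libcuda = ctypes.CDLL(name)
            break
        except OSError:
            continue
    if libcuda is None:
        return False  # driver library not present at all
    # CUDA_SUCCESS == 0 for the driver API calls below.
    if libcuda.cuInit(0) != 0:
        return False  # library loads, but the driver cannot initialize
    count = ctypes.c_int(0)
    if libcuda.cuDeviceGetCount(ctypes.byref(count)) != 0:
        return False
    return count.value > 0

if __name__ == "__main__":
    print("usable CUDA device:", cuda_device_usable())
```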
Log/Screenshots
(yuanming-hu/glfw) [bate@archit taichi]$ python examples/mpm128.py
[Taichi] mode=development
[Taichi] preparing sandbox at /tmp/taichi-7haz507t
[Taichi] sandbox prepared
[Taichi] <dev mode>, supported archs: [cpu, cuda, opengl], commit 4e2e5605, python 3.8.2
[Hint] Use WSAD/arrow keys to control gravity. Use left/right mouse bottons to attract/repel. Press R to reset.
[W 04/13/20 09:29:21.266] [cuda_driver.h:call_with_warning@60] CUDA Error CUDA_ERROR_INVALID_DEVICE: invalid device ordinal while calling mem_advise (cuMemAdvise)
[E 04/13/20 09:29:21.860] Received signal 7 (Bus error)
***********************************
* Taichi Compiler Stack Traceback *
***********************************
/tmp/taichi-7haz507t/taichi_core.so: taichi::Logger::error(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)
/tmp/taichi-7haz507t/taichi_core.so: taichi::signal_handler(int)
/usr/lib/libc.so.6(+0x3bd70) [0x7f359062bd70]
/tmp/taichi-7haz507t/taichi_core.so: taichi::lang::MemoryPool::daemon()
/usr/lib/libstdc++.so.6(+0xcfb24) [0x7f357ff41b24]
/usr/lib/libpthread.so.0(+0x946f) [0x7f359021746f]
/usr/lib/libc.so.6: clone
GNU gdb (GDB) 9.1
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
Attaching to process 8383
[New LWP 8388]
[New LWP 8389]
[New LWP 8390]
[New LWP 8391]
[New LWP 8396]
[New LWP 8397]
[New LWP 8398]
[New LWP 8399]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
0x00007f3580fc910c in llvm::Twine::toVector(llvm::SmallVectorImpl<char>&) const ()
from /tmp/taichi-7haz507t/taichi_core.so
(gdb)
To Reproduce
Just run examples/mpm128.py.
If you have local commits (e.g. compile fixes before you reproduce the bug), please make sure you first make a PR to fix the build errors and then report the bug.
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 15 (3 by maintainers)
Yes, it did.

The `with_cuda` still returns `true` however, according to my `TI_INFO("CUDA_DETECTED");`.

I guess solution 1 is probably easier. Or we can just ask people not to use too many threads when GPU memory is scarce. (Sorry about my delayed reply - workday starts on my end so I have meetings in the morning…)

Solution 1: `device_memory_fraction = 1 / (threads + 1)` in test. Solution 2: spinlock until there is enough memory in test.

Does setting envvar `TI_USE_UNIFIED_MEMORY=0` fix your problem?
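For anyone hitting this before a fix lands, here is a minimal sketch of the two mitigations mentioned above. It assumes `TI_USE_UNIFIED_MEMORY` is read when Taichi initializes and that this Taichi version's `ti.init()` accepts a `device_memory_fraction` argument; adjust for your version.

```python
# Sketch of the mitigations discussed above, not an official recipe.
# Assumptions: TI_USE_UNIFIED_MEMORY is read at Taichi initialization time,
# and this Taichi version's ti.init() accepts device_memory_fraction.
import os

# Disable the unified-memory code path (the cuMemAdvise call in the log above).
os.environ["TI_USE_UNIFIED_MEMORY"] = "0"

import taichi as ti

threads = 4  # hypothetical number of concurrent test workers
ti.init(arch=ti.cuda, device_memory_fraction=1 / (threads + 1))
```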