cuml: [BUG] CUDA error using GLOBAL_QUANTILE for split_algo (experimental RF backend)
Describe the bug Using the example from the RAPIDS documentation, fitting a RandomForestClassifier on a synthetic dataset raises a CUDA error whenever n_rows is set above 4864.
It works fine with split_algo = 0 (HIST), but that is 3 times slower.
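For anyone who needs a fit to complete in the meantime, the HIST setting mentioned above can be made explicit. The following is only a sketch: `rf_params` and `build_model` are hypothetical helper names, the other arguments are copied from the reproducer, and the cuML import is deferred so the file still loads on a machine without a GPU.

```python
# Sketch: make the split algorithm explicit when building the classifier.
# split_algo=0 (HIST) is the setting reported as working in this issue;
# split_algo=1 (GLOBAL_QUANTILE) is the setting reported as failing.
# `rf_params` and `build_model` are hypothetical names used for illustration.

HIST = 0
GLOBAL_QUANTILE = 1


def rf_params(split_algo=HIST):
    """Return the reproducer's keyword arguments plus an explicit split_algo."""
    if split_algo not in (HIST, GLOBAL_QUANTILE):
        raise ValueError("split_algo must be 0 (HIST) or 1 (GLOBAL_QUANTILE)")
    return {
        "max_features": 35,
        "n_bins": 8,
        "n_estimators": 400,
        "split_algo": split_algo,
    }


def build_model(split_algo=HIST):
    """Create the cuML classifier; cuML is imported lazily (needs a GPU)."""
    from cuml.ensemble import RandomForestClassifier as cuRFC

    return cuRFC(**rf_params(split_algo))
```

On an affected machine, `build_model().fit(X, y)` would then use HIST, at the cost of the slowdown described above.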
Simplest working example
import numpy as np
from cuml.ensemble import RandomForestClassifier as cuRFC

n_rows = 4864  # FAILS ABOVE 4864 -> this looks very much like a bug
# Synthetic data: 100 float32 features, alternating 0/1 labels
X = np.random.normal(size=(n_rows, 100)).astype(np.float32)
y = np.asarray([0, 1] * (n_rows // 2), dtype=np.int32)

cuml_model = cuRFC(max_features=35,
                   n_bins=8,
                   n_estimators=400)
# %time is an IPython magic; run this in a notebook or IPython session
%time cuml_model.fit(X, y)
cuml_predict = cuml_model.predict(X)
RuntimeError Traceback (most recent call last)
<timed exec> in <module>
~/anaconda3/envs/rapids-0.19/lib/python3.8/site-packages/cuml/internals/api_decorators.py in inner_with_setters(*args, **kwargs)
407 target_val=target_val)
408
--> 409 return func(*args, **kwargs)
410
411 @ wraps(func)
cuml/ensemble/randomforestclassifier.pyx in cuml.ensemble.randomforestclassifier.RandomForestClassifier.fit()
RuntimeError: CUDA error encountered at: file=../src/decisiontree/quantile/quantile.cuh line=236: call='cub::DeviceRadixSort::SortKeys((void *)d_temp_storage->data(), temp_storage_bytes, &data[col_offset], single_column_sorted->data(), n_rows, 0, 8 * sizeof(T), stream)', Reason=cudaErrorInvalidValue:invalid argument
Obtained 64 stack frames
#0 in /home/oleg/anaconda3/envs/rapids-0.19/lib/python3.8/site-packages/cuml/common/../../../../libcuml++.so(_ZN4raft9exception18collect_call_stackEv+0x46) [0x7f45884d5076]
#1 in /home/oleg/anaconda3/envs/rapids-0.19/lib/python3.8/site-packages/cuml/common/../../../../libcuml++.so(_ZN4raft10cuda_errorC1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x69) [0x7f45884d57d9]
#2 in /home/oleg/anaconda3/envs/rapids-0.19/lib/python3.8/site-packages/cuml/common/../../../../libcuml++.so(_ZN2ML12DecisionTree16computeQuantilesIfEEvPT_iPKS2_iiSt10shared_ptrIN4raft2mr6device9allocatorEEP11CUstream_st+0x778) [0x7f458889f0d8]
Steps/Code to reproduce bug Run the example above with n_rows set above 4864.
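To confirm the exact row count at which the fit starts failing on a given machine, a bisection over n_rows is one option. This is a sketch under stated assumptions: it assumes the failure is monotonic in n_rows (every size above the threshold fails), and `first_failing` and `cuml_fit_fails` are hypothetical helper names. Labels are built with a modulo so that odd n_rows also yield matching array lengths.

```python
def first_failing(fails, lo, hi):
    """Return the smallest n in (lo, hi] for which fails(n) is True.

    Assumes fails(lo) is False, fails(hi) is True, and that failure is
    monotonic in n (an assumption, not something the report guarantees).
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(mid):
            hi = mid  # failure at mid: threshold is at or below mid
        else:
            lo = mid  # success at mid: threshold is above mid
    return hi


def cuml_fit_fails(n_rows):
    """Fit the reproducer at n_rows; return True if cuML raises RuntimeError."""
    import numpy as np
    from cuml.ensemble import RandomForestClassifier as cuRFC

    X = np.random.normal(size=(n_rows, 100)).astype(np.float32)
    y = (np.arange(n_rows) % 2).astype(np.int32)  # alternating 0/1 labels
    try:
        cuRFC(max_features=35, n_bins=8, n_estimators=400).fit(X, y)
    except RuntimeError:
        return True
    return False


# Example call on a GPU machine (bounds chosen arbitrarily around 4864):
# threshold = first_failing(cuml_fit_fails, 1000, 10000)
```

Because a CUDA error can leave the process in an unreliable state, it may be safer to run each probe in a fresh Python process rather than in one long session.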
Expected behavior The model fits using the new backend (split_algo = 1).
Environment details (please complete the following information):
- Environment location: Desktop
- Linux Distro/Architecture: Ubuntu 20.04 amd64
- GPU Model/Driver: GTX 1050 2 GB and GTX 1070 Ti (MSI Aero and MSI Gaming versions), driver 450/460.27/465
- CUDA: 11/11.2/11.3
- Method of cuDF & cuML install: conda
If method of install is [conda], run `conda list` and include results here:
# Name Version Build Channel
_ipyw_jlab_nb_ext_conf 0.1.0 py38_0
_libgcc_mutex 0.1 main
alabaster 0.7.12 pyhd3eb1b0_0
anaconda 2021.05 py38_0
anaconda-client 1.7.2 py38_0
anaconda-navigator 2.0.3 py38_0
anaconda-project 0.9.1 pyhd3eb1b0_1
anyio 2.2.0 py38h06a4308_1
appdirs 1.4.4 py_0
argh 0.26.2 py38_0
argon2-cffi 20.1.0 py38h27cfd23_1
asn1crypto 1.4.0 py_0
astroid 2.5 py38h06a4308_1
astropy 4.2.1 py38h27cfd23_1
async_generator 1.10 pyhd3eb1b0_0
atomicwrites 1.4.0 py_0
attrs 20.3.0 pyhd3eb1b0_0
autopep8 1.5.6 pyhd3eb1b0_0
babel 2.9.0 pyhd3eb1b0_0
backcall 0.2.0 pyhd3eb1b0_0
backports 1.0 pyhd3eb1b0_2
backports.functools_lru_cache 1.6.4 pyhd3eb1b0_0
backports.shutil_get_terminal_size 1.0.0 pyhd3eb1b0_3
backports.tempfile 1.0 pyhd3eb1b0_1
backports.weakref 1.0.post1 py_1
beautifulsoup4 4.9.3 pyha847dfd_0
bitarray 2.1.0 py38h27cfd23_1
bkcharts 0.2 py38_0
black 19.10b0 py_0
blas 1.0 mkl
bleach 3.3.0 pyhd3eb1b0_0
blosc 1.21.0 h8c45485_0
bokeh 2.3.2 py38h06a4308_0
boto 2.49.0 py38_0
bottleneck 1.3.2 py38heb32a55_1
brotlipy 0.7.0 py38h27cfd23_1003
bzip2 1.0.8 h7b6447c_0
c-ares 1.17.1 h27cfd23_0
ca-certificates 2021.4.13 h06a4308_1
cairo 1.16.0 hf32fb01_1
certifi 2020.12.5 py38h06a4308_0
cffi 1.14.5 py38h261ae71_0
chardet 4.0.0 py38h06a4308_1003
click 7.1.2 pyhd3eb1b0_0
cloudpickle 1.6.0 py_0
clyent 1.2.2 py38_1
colorama 0.4.4 pyhd3eb1b0_0
conda 4.10.1 py38h06a4308_1
conda-build 3.21.4 py38h06a4308_0
conda-content-trust 0.1.1 pyhd3eb1b0_0
conda-env 2.6.0 1
conda-package-handling 1.7.3 py38h27cfd23_1
conda-repo-cli 1.0.4 pyhd3eb1b0_0
conda-token 0.3.0 pyhd3eb1b0_0
conda-verify 3.4.2 py_1
contextlib2 0.6.0.post1 py_0
cryptography 3.4.7 py38hd23ed53_0
curl 7.71.1 hbc83047_1
cycler 0.10.0 py38_0
cython 0.29.23 py38h2531618_0
cytoolz 0.11.0 py38h7b6447c_0
dask 2021.4.0 pyhd3eb1b0_0
dask-core 2021.4.0 pyhd3eb1b0_0
dbus 1.13.18 hb2f20db_0
decorator 5.0.6 pyhd3eb1b0_0
defusedxml 0.7.1 pyhd3eb1b0_0
diff-match-patch 20200713 py_0
distributed 2021.4.1 py38h06a4308_0
docutils 0.17.1 py38h06a4308_1
entrypoints 0.3 py38_0
et_xmlfile 1.0.1 py_1001
expat 2.3.0 h2531618_2
fastcache 1.1.0 py38h7b6447c_0
filelock 3.0.12 pyhd3eb1b0_1
flake8 3.9.0 pyhd3eb1b0_0
flask 1.1.2 pyhd3eb1b0_0
fontconfig 2.13.1 h6c09931_0
freetype 2.10.4 h5ab3b9f_0
fribidi 1.0.10 h7b6447c_0
fsspec 0.9.0 pyhd3eb1b0_0
future 0.18.2 py38_1
get_terminal_size 1.0.0 haa9412d_0
gevent 21.1.2 py38h27cfd23_1
glib 2.68.1 h36276a3_0
glob2 0.7 pyhd3eb1b0_0
gmp 6.2.1 h2531618_2
gmpy2 2.0.8 py38hd5f6e3b_3
graphite2 1.3.14 h23475e2_0
greenlet 1.0.0 py38h2531618_2
gst-plugins-base 1.14.0 h8213a91_2
gstreamer 1.14.0 h28cd5cc_2
h5py 2.10.0 py38h7918eee_0
harfbuzz 2.8.0 h6f93f22_0
hdf5 1.10.4 hb1b8bf9_0
heapdict 1.0.1 py_0
html5lib 1.1 py_0
icu 58.2 he6710b0_3
idna 2.10 pyhd3eb1b0_0
imageio 2.9.0 pyhd3eb1b0_0
imagesize 1.2.0 pyhd3eb1b0_0
importlib-metadata 3.10.0 py38h06a4308_0
importlib_metadata 3.10.0 hd3eb1b0_0
iniconfig 1.1.1 pyhd3eb1b0_0
intel-openmp 2021.2.0 h06a4308_610
intervaltree 3.1.0 py_0
ipykernel 5.3.4 py38h5ca1d4c_0
ipython 7.22.0 py38hb070fc8_0
ipython_genutils 0.2.0 pyhd3eb1b0_1
ipywidgets 7.6.3 pyhd3eb1b0_1
isort 5.8.0 pyhd3eb1b0_0
itsdangerous 1.1.0 pyhd3eb1b0_0
jbig 2.1 hdba287a_0
jdcal 1.4.1 py_0
jedi 0.17.2 py38h06a4308_1
jeepney 0.6.0 pyhd3eb1b0_0
jinja2 2.11.3 pyhd3eb1b0_0
joblib 1.0.1 pyhd3eb1b0_0
jpeg 9b h024ee3a_2
json5 0.9.5 py_0
jsonschema 3.2.0 py_2
jupyter 1.0.0 py38_7
jupyter-packaging 0.7.12 pyhd3eb1b0_0
jupyter_client 6.1.12 pyhd3eb1b0_0
jupyter_console 6.4.0 pyhd3eb1b0_0
jupyter_core 4.7.1 py38h06a4308_0
jupyter_server 1.4.1 py38h06a4308_0
jupyterlab 3.0.14 pyhd3eb1b0_1
jupyterlab_pygments 0.1.2 py_0
jupyterlab_server 2.4.0 pyhd3eb1b0_0
jupyterlab_widgets 1.0.0 pyhd3eb1b0_1
keyring 22.3.0 py38h06a4308_0
kiwisolver 1.3.1 py38h2531618_0
krb5 1.18.2 h173b8e3_0
lazy-object-proxy 1.6.0 py38h27cfd23_0
lcms2 2.12 h3be6417_0
ld_impl_linux-64 2.33.1 h53a641e_7
libarchive 3.4.2 h62408e4_0
libcurl 7.71.1 h20c2e04_1
libedit 3.1.20210216 h27cfd23_1
libev 4.33 h7b6447c_0
libffi 3.3 he6710b0_2
libgcc-ng 9.1.0 hdf63c60_0
libgfortran-ng 7.3.0 hdf63c60_0
liblief 0.10.1 he6710b0_0
libllvm10 10.0.1 hbcb73fb_5
libpng 1.6.37 hbc83047_0
libsodium 1.0.18 h7b6447c_0
libspatialindex 1.9.3 h2531618_0
libssh2 1.9.0 h1ba5d50_1
libstdcxx-ng 9.1.0 hdf63c60_0
libtiff 4.2.0 h85742a9_0
libtool 2.4.6 h7b6447c_1005
libuuid 1.0.3 h1bed415_2
libuv 1.40.0 h7b6447c_0
libwebp-base 1.2.0 h27cfd23_0
libxcb 1.14 h7b6447c_0
libxml2 2.9.10 hb55368b_3
libxslt 1.1.34 hc22bd24_0
llvmlite 0.36.0 py38h612dafd_4
locket 0.2.1 py38h06a4308_1
lxml 4.6.3 py38h9120a33_0
lz4-c 1.9.3 h2531618_0
lzo 2.10 h7b6447c_2
markupsafe 1.1.1 py38h7b6447c_0
matplotlib 3.3.4 py38h06a4308_0
matplotlib-base 3.3.4 py38h62a2d02_0
mccabe 0.6.1 py38_1
mistune 0.8.4 py38h7b6447c_1000
mkl 2021.2.0 h06a4308_296
mkl-service 2.3.0 py38h27cfd23_1
mkl_fft 1.3.0 py38h42c9631_2
mkl_random 1.2.1 py38ha9443f7_2
mock 4.0.3 pyhd3eb1b0_0
more-itertools 8.7.0 pyhd3eb1b0_0
mpc 1.1.0 h10f8cd9_1
mpfr 4.0.2 hb69a4c5_1
mpmath 1.2.1 py38h06a4308_0
msgpack-python 1.0.2 py38hff7bd54_1
multipledispatch 0.6.0 py38_0
mypy_extensions 0.4.3 py38_0
navigator-updater 0.2.1 py38_0
nbclassic 0.2.6 pyhd3eb1b0_0
nbclient 0.5.3 pyhd3eb1b0_0
nbconvert 6.0.7 py38_0
nbformat 5.1.3 pyhd3eb1b0_0
ncurses 6.2 he6710b0_1
nest-asyncio 1.5.1 pyhd3eb1b0_0
networkx 2.5 py_0
nltk 3.6.1 pyhd3eb1b0_0
nose 1.3.7 pyhd3eb1b0_1006
notebook 6.3.0 py38h06a4308_0
numba 0.53.1 py38ha9443f7_0
numexpr 2.7.3 py38h22e1b3c_1
numpy 1.20.1 py38h93e21f0_0
numpy-base 1.20.1 py38h7d8b39e_0
numpydoc 1.1.0 pyhd3eb1b0_1
olefile 0.46 py_0
openpyxl 3.0.7 pyhd3eb1b0_0
openssl 1.1.1k h27cfd23_0
packaging 20.9 pyhd3eb1b0_0
pandas 1.2.4 py38h2531618_0
pandoc 2.12 h06a4308_0
pandocfilters 1.4.3 py38h06a4308_1
pango 1.45.3 hd140c19_0
parso 0.7.0 py_0
partd 1.2.0 pyhd3eb1b0_0
patchelf 0.12 h2531618_1
path 15.1.2 py38h06a4308_0
path.py 12.5.0 0
pathlib2 2.3.5 py38h06a4308_2
pathspec 0.7.0 py_0
patsy 0.5.1 py38_0
pcre 8.44 he6710b0_0
pep8 1.7.1 py38_0
pexpect 4.8.0 pyhd3eb1b0_3
pickleshare 0.7.5 pyhd3eb1b0_1003
pillow 8.2.0 py38he98fc37_0
pip 21.0.1 py38h06a4308_0
pixman 0.40.0 h7b6447c_0
pkginfo 1.7.0 py38h06a4308_0
pluggy 0.13.1 py38h06a4308_0
ply 3.11 py38_0
prometheus_client 0.10.1 pyhd3eb1b0_0
prompt-toolkit 3.0.17 pyh06a4308_0
prompt_toolkit 3.0.17 hd3eb1b0_0
psutil 5.8.0 py38h27cfd23_1
ptyprocess 0.7.0 pyhd3eb1b0_2
py 1.10.0 pyhd3eb1b0_0
py-lief 0.10.1 py38h403a769_0
pycodestyle 2.6.0 pyhd3eb1b0_0
pycosat 0.6.3 py38h7b6447c_1
pycparser 2.20 py_2
pycurl 7.43.0.6 py38h1ba5d50_0
pydocstyle 6.0.0 pyhd3eb1b0_0
pyerfa 1.7.3 py38h27cfd23_0
pyflakes 2.2.0 pyhd3eb1b0_0
pygments 2.8.1 pyhd3eb1b0_0
pylint 2.7.4 py38h06a4308_1
pyls-black 0.4.6 hd3eb1b0_0
pyls-spyder 0.3.2 pyhd3eb1b0_0
pyodbc 4.0.30 py38he6710b0_0
pyopenssl 20.0.1 pyhd3eb1b0_1
pyparsing 2.4.7 pyhd3eb1b0_0
pyqt 5.9.2 py38h05f1152_4
pyrsistent 0.17.3 py38h7b6447c_0
pysocks 1.7.1 py38h06a4308_0
pytables 3.6.1 py38h9fd0a39_0
pytest 6.2.3 py38h06a4308_2
python 3.8.8 hdb3f193_5
python-dateutil 2.8.1 pyhd3eb1b0_0
python-jsonrpc-server 0.4.0 py_0
python-language-server 0.36.2 pyhd3eb1b0_0
python-libarchive-c 2.9 pyhd3eb1b0_1
pytz 2021.1 pyhd3eb1b0_0
pywavelets 1.1.1 py38h7b6447c_2
pyxdg 0.27 pyhd3eb1b0_0
pyyaml 5.4.1 py38h27cfd23_1
pyzmq 20.0.0 py38h2531618_1
qdarkstyle 2.8.1 py_0
qt 5.9.7 h5867ecd_1
qtawesome 1.0.2 pyhd3eb1b0_0
qtconsole 5.0.3 pyhd3eb1b0_0
qtpy 1.9.0 py_0
readline 8.1 h27cfd23_0
regex 2021.4.4 py38h27cfd23_0
requests 2.25.1 pyhd3eb1b0_0
ripgrep 12.1.1 0
rope 0.18.0 py_0
rtree 0.9.7 py38h06a4308_1
ruamel_yaml 0.15.100 py38h27cfd23_0
scikit-image 0.18.1 py38ha9443f7_0
scikit-learn 0.24.1 py38ha9443f7_0
scipy 1.6.2 py38had2a1c9_1
seaborn 0.11.1 pyhd3eb1b0_0
secretstorage 3.3.1 py38h06a4308_0
send2trash 1.5.0 pyhd3eb1b0_1
setuptools 52.0.0 py38h06a4308_0
simplegeneric 0.8.1 py38_2
singledispatch 3.6.1 pyhd3eb1b0_1001
sip 4.19.13 py38he6710b0_0
six 1.15.0 py38h06a4308_0
sniffio 1.2.0 py38h06a4308_1
snowballstemmer 2.1.0 pyhd3eb1b0_0
sortedcollections 2.1.0 pyhd3eb1b0_0
sortedcontainers 2.3.0 pyhd3eb1b0_0
soupsieve 2.2.1 pyhd3eb1b0_0
sphinx 4.0.1 pyhd3eb1b0_0
sphinxcontrib 1.0 py38_1
sphinxcontrib-applehelp 1.0.2 pyhd3eb1b0_0
sphinxcontrib-devhelp 1.0.2 pyhd3eb1b0_0
sphinxcontrib-htmlhelp 1.0.3 pyhd3eb1b0_0
sphinxcontrib-jsmath 1.0.1 pyhd3eb1b0_0
sphinxcontrib-qthelp 1.0.3 pyhd3eb1b0_0
sphinxcontrib-serializinghtml 1.1.4 pyhd3eb1b0_0
sphinxcontrib-websupport 1.2.4 py_0
spyder 4.2.5 py38h06a4308_0
spyder-kernels 1.10.2 py38h06a4308_0
sqlalchemy 1.4.15 py38h27cfd23_0
sqlite 3.35.4 hdfb4753_0
statsmodels 0.12.2 py38h27cfd23_0
sympy 1.8 py38h06a4308_0
tbb 2020.3 hfd86e86_0
tblib 1.7.0 py_0
terminado 0.9.4 py38h06a4308_0
testpath 0.4.4 pyhd3eb1b0_0
textdistance 4.2.1 pyhd3eb1b0_0
threadpoolctl 2.1.0 pyh5ca1d4c_0
three-merge 0.1.1 pyhd3eb1b0_0
tifffile 2020.10.1 py38hdd07704_2
tk 8.6.10 hbc83047_0
toml 0.10.2 pyhd3eb1b0_0
toolz 0.11.1 pyhd3eb1b0_0
tornado 6.1 py38h27cfd23_0
tqdm 4.59.0 pyhd3eb1b0_1
traitlets 5.0.5 pyhd3eb1b0_0
typed-ast 1.4.2 py38h27cfd23_1
typing_extensions 3.7.4.3 pyha847dfd_0
ujson 4.0.2 py38h2531618_0
unicodecsv 0.14.1 py38_0
unixodbc 2.3.9 h7b6447c_0
urllib3 1.26.4 pyhd3eb1b0_0
watchdog 1.0.2 py38h06a4308_1
wcwidth 0.2.5 py_0
webencodings 0.5.1 py38_1
werkzeug 1.0.1 pyhd3eb1b0_0
wheel 0.36.2 pyhd3eb1b0_0
widgetsnbextension 3.5.1 py38_0
wrapt 1.12.1 py38h7b6447c_1
wurlitzer 2.1.0 py38h06a4308_0
xlrd 2.0.1 pyhd3eb1b0_0
xlsxwriter 1.3.8 pyhd3eb1b0_0
xlwt 1.3.0 py38_0
xmltodict 0.12.0 py_0
xz 5.2.5 h7b6447c_0
yaml 0.2.5 h7b6447c_0
yapf 0.31.0 pyhd3eb1b0_0
zeromq 4.3.4 h2531618_0
zict 2.0.0 pyhd3eb1b0_0
zipp 3.4.1 pyhd3eb1b0_0
zlib 1.2.11 h7b6447c_3
zope 1.0 py38_1
zope.event 4.5.0 py38_0
zope.interface 5.3.0 py38h27cfd23_0
zstd 1.4.5 h9ceee32_0
Installation procedure:
- Fresh Ubuntu 20.04 install
- Blacklist nouveau drivers
- sudo sh cuda_11.2.0_460.27.04_linux.run
- bash Anaconda3-2021.05-Linux-x86_64.sh
- conda create -n rapids-0.19 -c rapidsai -c nvidia -c conda-forge rapids-blazing=0.19 python=3.8 cudatoolkit=11.2
About this issue
- State: open
- Created 3 years ago
- Comments: 16 (7 by maintainers)
Managed to compile cuML 20.06 on a fresh Ubuntu install and tested the new backend (split_algo=1) on 2 GTX 1050 -> works like a charm!! (compile ran after deleting env “rapids” but failed later on during compilation).
Thanks a lot @vinaydes and @hcho3 for following up on this, really appreciated.
Hi @Oleg-dM, I had access to a `sm_61` device, so I tried to debug the issue. Here are my observations:
- […] `sm_61` devices. Both your GPUs are `sm_61`, that's why you see this issue.
- […] `./build.sh`. You can find more here: https://github.com/rapidsai/cuml/blob/branch-21.08/BUILD.md.
- […] `sm_60`, which should work for `sm_61`, so it should not really matter. However, when I built from source for `sm_60` (just like the pre-built binary), the issue started appearing again. More investigation is needed to refine the root cause further.
- […] `CUDA_CHECK(cudaDeviceSynchronize())` just before line quantile.cuh#L75l. The error was still with the `cub::DeviceRadixSort::SortKeys` function. Inside the function, the kernel `cub::DeviceRadixSortDownsweepKernel` seems to throw the error at launch.

In short: @Oleg-dM or anyone else affected by this issue can use the workaround described above while we continue to debug the issue further.
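To check whether a given machine falls into the `sm_61` group mentioned in this comment, the device's compute capability can be queried. The sketch below assumes Numba is installed (it appears in the environment listing above) and uses hypothetical helper names; the set of affected architectures contains only `sm_61` because that is the only one named in this thread.

```python
# Architectures named as affected in this thread (only sm_61 is mentioned).
AFFECTED = {(6, 1)}


def sm_name(cc):
    """Format a (major, minor) compute capability as an 'sm_XY' string."""
    major, minor = cc
    return f"sm_{major}{minor}"


def is_affected(cc):
    """True if the compute capability is one reported as affected here."""
    return tuple(cc) in AFFECTED


def current_compute_capability():
    """Query the active GPU through Numba; requires a CUDA-capable device."""
    from numba import cuda

    return tuple(cuda.get_current_device().compute_capability)


# Example call on a GPU machine:
# cc = current_compute_capability()
# print(sm_name(cc), "affected" if is_affected(cc) else "not listed as affected")
```

A result of "not listed as affected" only means the architecture was not named in this thread, not that it has been verified to work.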