cuml: [BUG] CUDA error using GLOBAL_QUANTILE for split_algo (experimental RF backend)

Describe the bug

Running the RAPIDS documentation example that fits a RandomForestClassifier on a synthetic dataset raises a CUDA error whenever n_rows is set above 4864.

Fitting works fine with split_algo = 0 (HIST), but that is 3 times slower…
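
The HIST fallback mentioned above could be selected roughly as follows. This is a minimal sketch: the `split_algo` parameter and its values (0 for HIST, 1 for GLOBAL_QUANTILE) are taken from this report, the helper names `rf_kwargs` and `make_model` are hypothetical, and other cuML releases may not accept `split_algo` at all.

```python
def rf_kwargs(use_hist_fallback: bool) -> dict:
    """Build the RandomForestClassifier keyword arguments used in this report."""
    kwargs = {"max_features": 35, "n_bins": 8, "n_estimators": 400}
    if use_hist_fallback:
        # Per this report: 0 selects HIST, 1 selects GLOBAL_QUANTILE.
        kwargs["split_algo"] = 0
    return kwargs


def make_model(use_hist_fallback: bool = True):
    """Create the classifier; cuML is imported lazily because it needs a GPU."""
    from cuml.ensemble import RandomForestClassifier as cuRFC
    return cuRFC(**rf_kwargs(use_hist_fallback))
```

Calling `make_model(True).fit(X, y)` would then take the slower HIST path, while `make_model(False)` leaves `split_algo` at the library default.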

Simplest working example

import numpy as np
from cuml.ensemble import RandomForestClassifier as cuRFC

n_rows = 4864 # FAILS ABOVE 4864 -> this looks very much like a bug

X = np.random.normal(size=(n_rows,100)).astype(np.float32)
y = np.asarray([0,1]*(n_rows//2), dtype=np.int32)

cuml_model = cuRFC(max_features=35,
                   n_bins=8,
                   n_estimators=400)

%time cuml_model.fit(X,y)

cuml_predict = cuml_model.predict(X)

RuntimeError                              Traceback (most recent call last)
<timed exec> in <module>

~/anaconda3/envs/rapids-0.19/lib/python3.8/site-packages/cuml/internals/api_decorators.py in inner_with_setters(*args, **kwargs)
    407                                 target_val=target_val)
    408 
--> 409                 return func(*args, **kwargs)
    410 
    411         @ wraps(func)

cuml/ensemble/randomforestclassifier.pyx in cuml.ensemble.randomforestclassifier.RandomForestClassifier.fit()

RuntimeError: CUDA error encountered at: file=../src/decisiontree/quantile/quantile.cuh line=236: call='cub::DeviceRadixSort::SortKeys( (void *)d_temp_storage->data(), temp_storage_bytes, &data[col_offset], single_column_sorted->data(), n_rows, 0, 8 * sizeof(T), stream)', Reason=cudaErrorInvalidValue:invalid argument
Obtained 64 stack frames
#0 in /home/oleg/anaconda3/envs/rapids-0.19/lib/python3.8/site-packages/cuml/common/../../../../libcuml++.so(_ZN4raft9exception18collect_call_stackEv+0x46) [0x7f45884d5076]
#1 in /home/oleg/anaconda3/envs/rapids-0.19/lib/python3.8/site-packages/cuml/common/../../../../libcuml++.so(_ZN4raft10cuda_errorC1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x69) [0x7f45884d57d9]
#2 in /home/oleg/anaconda3/envs/rapids-0.19/lib/python3.8/site-packages/cuml/common/../../../../libcuml++.so(_ZN2ML12DecisionTree16computeQuantilesIfEEvPT_iPKS2_iiSt10shared_ptrIN4raft2mr6device9allocatorEEP11CUstream_st+0x778) [0x7f458889f0d8]

Steps/Code to reproduce bug

Run the example above with n_rows > 4864.
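
One way to pin down the exact row count at which fitting starts to fail is a binary search over n_rows. The sketch below is illustrative only: `largest_passing` and `fit_ok_on_gpu` are hypothetical helper names, and the search assumes that every size above the threshold fails (which matches the report but has not been verified here).

```python
from typing import Callable


def largest_passing(fit_ok: Callable[[int], bool], lo: int, hi: int) -> int:
    """Return the largest n with fit_ok(n) True, given fit_ok(lo) and not fit_ok(hi).

    Assumes failures are monotonic in n: once a size fails, all larger sizes fail.
    """
    if not fit_ok(lo):
        raise ValueError("fit_ok(lo) must succeed")
    if fit_ok(hi):
        raise ValueError("fit_ok(hi) must fail")
    # Invariant: fit_ok(lo) is True and fit_ok(hi) is False.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fit_ok(mid):
            lo = mid
        else:
            hi = mid
    return lo


def fit_ok_on_gpu(n_rows: int) -> bool:
    """Hypothetical predicate: fit the model from this report on n_rows rows."""
    import numpy as np
    from cuml.ensemble import RandomForestClassifier as cuRFC

    X = np.random.normal(size=(n_rows, 100)).astype(np.float32)
    # Alternating 0/1 labels; unlike [0, 1] * (n_rows // 2) this also works for odd n_rows.
    y = (np.arange(n_rows) % 2).astype(np.int32)
    try:
        cuRFC(max_features=35, n_bins=8, n_estimators=400).fit(X, y)
        return True
    except RuntimeError:
        return False
```

On an affected machine, `largest_passing(fit_ok_on_gpu, 1000, 10000)` would be the call to try. Because a CUDA error can leave the process in a questionable state, running each trial in a fresh process may give more trustworthy results.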

Expected behavior

The model fits using the new backend (split_algo = 1).

Environment details (please complete the following information):

  • Environment location: Desktop
  • Linux Distro/Architecture: Ubuntu 20.04 amd64
  • GPU Model/Driver: GTX 1050 2 GB and GTX 1070 Ti (MSI Aero and MSI Gaming versions), drivers 450/460.27/465
  • CUDA: 11/11.2/11.3
  • Method of cuDF & cuML install: conda
If method of install is [conda], run `conda list` and include results here:
# Name                    Version                   Build  Channel
_ipyw_jlab_nb_ext_conf    0.1.0                    py38_0  
_libgcc_mutex             0.1                        main  
alabaster                 0.7.12             pyhd3eb1b0_0  
anaconda                  2021.05                  py38_0  
anaconda-client           1.7.2                    py38_0  
anaconda-navigator        2.0.3                    py38_0  
anaconda-project          0.9.1              pyhd3eb1b0_1  
anyio                     2.2.0            py38h06a4308_1  
appdirs                   1.4.4                      py_0  
argh                      0.26.2                   py38_0  
argon2-cffi               20.1.0           py38h27cfd23_1  
asn1crypto                1.4.0                      py_0  
astroid                   2.5              py38h06a4308_1  
astropy                   4.2.1            py38h27cfd23_1  
async_generator           1.10               pyhd3eb1b0_0  
atomicwrites              1.4.0                      py_0  
attrs                     20.3.0             pyhd3eb1b0_0  
autopep8                  1.5.6              pyhd3eb1b0_0  
babel                     2.9.0              pyhd3eb1b0_0  
backcall                  0.2.0              pyhd3eb1b0_0  
backports                 1.0                pyhd3eb1b0_2  
backports.functools_lru_cache 1.6.4              pyhd3eb1b0_0  
backports.shutil_get_terminal_size 1.0.0              pyhd3eb1b0_3  
backports.tempfile        1.0                pyhd3eb1b0_1  
backports.weakref         1.0.post1                  py_1  
beautifulsoup4            4.9.3              pyha847dfd_0  
bitarray                  2.1.0            py38h27cfd23_1  
bkcharts                  0.2                      py38_0  
black                     19.10b0                    py_0  
blas                      1.0                         mkl  
bleach                    3.3.0              pyhd3eb1b0_0  
blosc                     1.21.0               h8c45485_0  
bokeh                     2.3.2            py38h06a4308_0  
boto                      2.49.0                   py38_0  
bottleneck                1.3.2            py38heb32a55_1  
brotlipy                  0.7.0           py38h27cfd23_1003  
bzip2                     1.0.8                h7b6447c_0  
c-ares                    1.17.1               h27cfd23_0  
ca-certificates           2021.4.13            h06a4308_1  
cairo                     1.16.0               hf32fb01_1  
certifi                   2020.12.5        py38h06a4308_0  
cffi                      1.14.5           py38h261ae71_0  
chardet                   4.0.0           py38h06a4308_1003  
click                     7.1.2              pyhd3eb1b0_0  
cloudpickle               1.6.0                      py_0  
clyent                    1.2.2                    py38_1  
colorama                  0.4.4              pyhd3eb1b0_0  
conda                     4.10.1           py38h06a4308_1  
conda-build               3.21.4           py38h06a4308_0  
conda-content-trust       0.1.1              pyhd3eb1b0_0  
conda-env                 2.6.0                         1  
conda-package-handling    1.7.3            py38h27cfd23_1  
conda-repo-cli            1.0.4              pyhd3eb1b0_0  
conda-token               0.3.0              pyhd3eb1b0_0  
conda-verify              3.4.2                      py_1  
contextlib2               0.6.0.post1                py_0  
cryptography              3.4.7            py38hd23ed53_0  
curl                      7.71.1               hbc83047_1  
cycler                    0.10.0                   py38_0  
cython                    0.29.23          py38h2531618_0  
cytoolz                   0.11.0           py38h7b6447c_0  
dask                      2021.4.0           pyhd3eb1b0_0  
dask-core                 2021.4.0           pyhd3eb1b0_0  
dbus                      1.13.18              hb2f20db_0  
decorator                 5.0.6              pyhd3eb1b0_0  
defusedxml                0.7.1              pyhd3eb1b0_0  
diff-match-patch          20200713                   py_0  
distributed               2021.4.1         py38h06a4308_0  
docutils                  0.17.1           py38h06a4308_1  
entrypoints               0.3                      py38_0  
et_xmlfile                1.0.1                   py_1001  
expat                     2.3.0                h2531618_2  
fastcache                 1.1.0            py38h7b6447c_0  
filelock                  3.0.12             pyhd3eb1b0_1  
flake8                    3.9.0              pyhd3eb1b0_0  
flask                     1.1.2              pyhd3eb1b0_0  
fontconfig                2.13.1               h6c09931_0  
freetype                  2.10.4               h5ab3b9f_0  
fribidi                   1.0.10               h7b6447c_0  
fsspec                    0.9.0              pyhd3eb1b0_0  
future                    0.18.2                   py38_1  
get_terminal_size         1.0.0                haa9412d_0  
gevent                    21.1.2           py38h27cfd23_1  
glib                      2.68.1               h36276a3_0  
glob2                     0.7                pyhd3eb1b0_0  
gmp                       6.2.1                h2531618_2  
gmpy2                     2.0.8            py38hd5f6e3b_3  
graphite2                 1.3.14               h23475e2_0  
greenlet                  1.0.0            py38h2531618_2  
gst-plugins-base          1.14.0               h8213a91_2  
gstreamer                 1.14.0               h28cd5cc_2  
h5py                      2.10.0           py38h7918eee_0  
harfbuzz                  2.8.0                h6f93f22_0  
hdf5                      1.10.4               hb1b8bf9_0  
heapdict                  1.0.1                      py_0  
html5lib                  1.1                        py_0  
icu                       58.2                 he6710b0_3  
idna                      2.10               pyhd3eb1b0_0  
imageio                   2.9.0              pyhd3eb1b0_0  
imagesize                 1.2.0              pyhd3eb1b0_0  
importlib-metadata        3.10.0           py38h06a4308_0  
importlib_metadata        3.10.0               hd3eb1b0_0  
iniconfig                 1.1.1              pyhd3eb1b0_0  
intel-openmp              2021.2.0           h06a4308_610  
intervaltree              3.1.0                      py_0  
ipykernel                 5.3.4            py38h5ca1d4c_0  
ipython                   7.22.0           py38hb070fc8_0  
ipython_genutils          0.2.0              pyhd3eb1b0_1  
ipywidgets                7.6.3              pyhd3eb1b0_1  
isort                     5.8.0              pyhd3eb1b0_0  
itsdangerous              1.1.0              pyhd3eb1b0_0  
jbig                      2.1                  hdba287a_0  
jdcal                     1.4.1                      py_0  
jedi                      0.17.2           py38h06a4308_1  
jeepney                   0.6.0              pyhd3eb1b0_0  
jinja2                    2.11.3             pyhd3eb1b0_0  
joblib                    1.0.1              pyhd3eb1b0_0  
jpeg                      9b                   h024ee3a_2  
json5                     0.9.5                      py_0  
jsonschema                3.2.0                      py_2  
jupyter                   1.0.0                    py38_7  
jupyter-packaging         0.7.12             pyhd3eb1b0_0  
jupyter_client            6.1.12             pyhd3eb1b0_0  
jupyter_console           6.4.0              pyhd3eb1b0_0  
jupyter_core              4.7.1            py38h06a4308_0  
jupyter_server            1.4.1            py38h06a4308_0  
jupyterlab                3.0.14             pyhd3eb1b0_1  
jupyterlab_pygments       0.1.2                      py_0  
jupyterlab_server         2.4.0              pyhd3eb1b0_0  
jupyterlab_widgets        1.0.0              pyhd3eb1b0_1  
keyring                   22.3.0           py38h06a4308_0  
kiwisolver                1.3.1            py38h2531618_0  
krb5                      1.18.2               h173b8e3_0  
lazy-object-proxy         1.6.0            py38h27cfd23_0  
lcms2                     2.12                 h3be6417_0  
ld_impl_linux-64          2.33.1               h53a641e_7  
libarchive                3.4.2                h62408e4_0  
libcurl                   7.71.1               h20c2e04_1  
libedit                   3.1.20210216         h27cfd23_1  
libev                     4.33                 h7b6447c_0  
libffi                    3.3                  he6710b0_2  
libgcc-ng                 9.1.0                hdf63c60_0  
libgfortran-ng            7.3.0                hdf63c60_0  
liblief                   0.10.1               he6710b0_0  
libllvm10                 10.0.1               hbcb73fb_5  
libpng                    1.6.37               hbc83047_0  
libsodium                 1.0.18               h7b6447c_0  
libspatialindex           1.9.3                h2531618_0  
libssh2                   1.9.0                h1ba5d50_1  
libstdcxx-ng              9.1.0                hdf63c60_0  
libtiff                   4.2.0                h85742a9_0  
libtool                   2.4.6             h7b6447c_1005  
libuuid                   1.0.3                h1bed415_2  
libuv                     1.40.0               h7b6447c_0  
libwebp-base              1.2.0                h27cfd23_0  
libxcb                    1.14                 h7b6447c_0  
libxml2                   2.9.10               hb55368b_3  
libxslt                   1.1.34               hc22bd24_0  
llvmlite                  0.36.0           py38h612dafd_4  
locket                    0.2.1            py38h06a4308_1  
lxml                      4.6.3            py38h9120a33_0  
lz4-c                     1.9.3                h2531618_0  
lzo                       2.10                 h7b6447c_2  
markupsafe                1.1.1            py38h7b6447c_0  
matplotlib                3.3.4            py38h06a4308_0  
matplotlib-base           3.3.4            py38h62a2d02_0  
mccabe                    0.6.1                    py38_1  
mistune                   0.8.4           py38h7b6447c_1000  
mkl                       2021.2.0           h06a4308_296  
mkl-service               2.3.0            py38h27cfd23_1  
mkl_fft                   1.3.0            py38h42c9631_2  
mkl_random                1.2.1            py38ha9443f7_2  
mock                      4.0.3              pyhd3eb1b0_0  
more-itertools            8.7.0              pyhd3eb1b0_0  
mpc                       1.1.0                h10f8cd9_1  
mpfr                      4.0.2                hb69a4c5_1  
mpmath                    1.2.1            py38h06a4308_0  
msgpack-python            1.0.2            py38hff7bd54_1  
multipledispatch          0.6.0                    py38_0  
mypy_extensions           0.4.3                    py38_0  
navigator-updater         0.2.1                    py38_0  
nbclassic                 0.2.6              pyhd3eb1b0_0  
nbclient                  0.5.3              pyhd3eb1b0_0  
nbconvert                 6.0.7                    py38_0  
nbformat                  5.1.3              pyhd3eb1b0_0  
ncurses                   6.2                  he6710b0_1  
nest-asyncio              1.5.1              pyhd3eb1b0_0  
networkx                  2.5                        py_0  
nltk                      3.6.1              pyhd3eb1b0_0  
nose                      1.3.7           pyhd3eb1b0_1006  
notebook                  6.3.0            py38h06a4308_0  
numba                     0.53.1           py38ha9443f7_0  
numexpr                   2.7.3            py38h22e1b3c_1  
numpy                     1.20.1           py38h93e21f0_0  
numpy-base                1.20.1           py38h7d8b39e_0  
numpydoc                  1.1.0              pyhd3eb1b0_1  
olefile                   0.46                       py_0  
openpyxl                  3.0.7              pyhd3eb1b0_0  
openssl                   1.1.1k               h27cfd23_0  
packaging                 20.9               pyhd3eb1b0_0  
pandas                    1.2.4            py38h2531618_0  
pandoc                    2.12                 h06a4308_0  
pandocfilters             1.4.3            py38h06a4308_1  
pango                     1.45.3               hd140c19_0  
parso                     0.7.0                      py_0  
partd                     1.2.0              pyhd3eb1b0_0  
patchelf                  0.12                 h2531618_1  
path                      15.1.2           py38h06a4308_0  
path.py                   12.5.0                        0  
pathlib2                  2.3.5            py38h06a4308_2  
pathspec                  0.7.0                      py_0  
patsy                     0.5.1                    py38_0  
pcre                      8.44                 he6710b0_0  
pep8                      1.7.1                    py38_0  
pexpect                   4.8.0              pyhd3eb1b0_3  
pickleshare               0.7.5           pyhd3eb1b0_1003  
pillow                    8.2.0            py38he98fc37_0  
pip                       21.0.1           py38h06a4308_0  
pixman                    0.40.0               h7b6447c_0  
pkginfo                   1.7.0            py38h06a4308_0  
pluggy                    0.13.1           py38h06a4308_0  
ply                       3.11                     py38_0  
prometheus_client         0.10.1             pyhd3eb1b0_0  
prompt-toolkit            3.0.17             pyh06a4308_0  
prompt_toolkit            3.0.17               hd3eb1b0_0  
psutil                    5.8.0            py38h27cfd23_1  
ptyprocess                0.7.0              pyhd3eb1b0_2  
py                        1.10.0             pyhd3eb1b0_0  
py-lief                   0.10.1           py38h403a769_0  
pycodestyle               2.6.0              pyhd3eb1b0_0  
pycosat                   0.6.3            py38h7b6447c_1  
pycparser                 2.20                       py_2  
pycurl                    7.43.0.6         py38h1ba5d50_0  
pydocstyle                6.0.0              pyhd3eb1b0_0  
pyerfa                    1.7.3            py38h27cfd23_0  
pyflakes                  2.2.0              pyhd3eb1b0_0  
pygments                  2.8.1              pyhd3eb1b0_0  
pylint                    2.7.4            py38h06a4308_1  
pyls-black                0.4.6                hd3eb1b0_0  
pyls-spyder               0.3.2              pyhd3eb1b0_0  
pyodbc                    4.0.30           py38he6710b0_0  
pyopenssl                 20.0.1             pyhd3eb1b0_1  
pyparsing                 2.4.7              pyhd3eb1b0_0  
pyqt                      5.9.2            py38h05f1152_4  
pyrsistent                0.17.3           py38h7b6447c_0  
pysocks                   1.7.1            py38h06a4308_0  
pytables                  3.6.1            py38h9fd0a39_0  
pytest                    6.2.3            py38h06a4308_2  
python                    3.8.8                hdb3f193_5  
python-dateutil           2.8.1              pyhd3eb1b0_0  
python-jsonrpc-server     0.4.0                      py_0  
python-language-server    0.36.2             pyhd3eb1b0_0  
python-libarchive-c       2.9                pyhd3eb1b0_1  
pytz                      2021.1             pyhd3eb1b0_0  
pywavelets                1.1.1            py38h7b6447c_2  
pyxdg                     0.27               pyhd3eb1b0_0  
pyyaml                    5.4.1            py38h27cfd23_1  
pyzmq                     20.0.0           py38h2531618_1  
qdarkstyle                2.8.1                      py_0  
qt                        5.9.7                h5867ecd_1  
qtawesome                 1.0.2              pyhd3eb1b0_0  
qtconsole                 5.0.3              pyhd3eb1b0_0  
qtpy                      1.9.0                      py_0  
readline                  8.1                  h27cfd23_0  
regex                     2021.4.4         py38h27cfd23_0  
requests                  2.25.1             pyhd3eb1b0_0  
ripgrep                   12.1.1                        0  
rope                      0.18.0                     py_0  
rtree                     0.9.7            py38h06a4308_1  
ruamel_yaml               0.15.100         py38h27cfd23_0  
scikit-image              0.18.1           py38ha9443f7_0  
scikit-learn              0.24.1           py38ha9443f7_0  
scipy                     1.6.2            py38had2a1c9_1  
seaborn                   0.11.1             pyhd3eb1b0_0  
secretstorage             3.3.1            py38h06a4308_0  
send2trash                1.5.0              pyhd3eb1b0_1  
setuptools                52.0.0           py38h06a4308_0  
simplegeneric             0.8.1                    py38_2  
singledispatch            3.6.1           pyhd3eb1b0_1001  
sip                       4.19.13          py38he6710b0_0  
six                       1.15.0           py38h06a4308_0  
sniffio                   1.2.0            py38h06a4308_1  
snowballstemmer           2.1.0              pyhd3eb1b0_0  
sortedcollections         2.1.0              pyhd3eb1b0_0  
sortedcontainers          2.3.0              pyhd3eb1b0_0  
soupsieve                 2.2.1              pyhd3eb1b0_0  
sphinx                    4.0.1              pyhd3eb1b0_0  
sphinxcontrib             1.0                      py38_1  
sphinxcontrib-applehelp   1.0.2              pyhd3eb1b0_0  
sphinxcontrib-devhelp     1.0.2              pyhd3eb1b0_0  
sphinxcontrib-htmlhelp    1.0.3              pyhd3eb1b0_0  
sphinxcontrib-jsmath      1.0.1              pyhd3eb1b0_0  
sphinxcontrib-qthelp      1.0.3              pyhd3eb1b0_0  
sphinxcontrib-serializinghtml 1.1.4              pyhd3eb1b0_0  
sphinxcontrib-websupport  1.2.4                      py_0  
spyder                    4.2.5            py38h06a4308_0  
spyder-kernels            1.10.2           py38h06a4308_0  
sqlalchemy                1.4.15           py38h27cfd23_0  
sqlite                    3.35.4               hdfb4753_0  
statsmodels               0.12.2           py38h27cfd23_0  
sympy                     1.8              py38h06a4308_0  
tbb                       2020.3               hfd86e86_0  
tblib                     1.7.0                      py_0  
terminado                 0.9.4            py38h06a4308_0  
testpath                  0.4.4              pyhd3eb1b0_0  
textdistance              4.2.1              pyhd3eb1b0_0  
threadpoolctl             2.1.0              pyh5ca1d4c_0  
three-merge               0.1.1              pyhd3eb1b0_0  
tifffile                  2020.10.1        py38hdd07704_2  
tk                        8.6.10               hbc83047_0  
toml                      0.10.2             pyhd3eb1b0_0  
toolz                     0.11.1             pyhd3eb1b0_0  
tornado                   6.1              py38h27cfd23_0  
tqdm                      4.59.0             pyhd3eb1b0_1  
traitlets                 5.0.5              pyhd3eb1b0_0  
typed-ast                 1.4.2            py38h27cfd23_1  
typing_extensions         3.7.4.3            pyha847dfd_0  
ujson                     4.0.2            py38h2531618_0  
unicodecsv                0.14.1                   py38_0  
unixodbc                  2.3.9                h7b6447c_0  
urllib3                   1.26.4             pyhd3eb1b0_0  
watchdog                  1.0.2            py38h06a4308_1  
wcwidth                   0.2.5                      py_0  
webencodings              0.5.1                    py38_1  
werkzeug                  1.0.1              pyhd3eb1b0_0  
wheel                     0.36.2             pyhd3eb1b0_0  
widgetsnbextension        3.5.1                    py38_0  
wrapt                     1.12.1           py38h7b6447c_1  
wurlitzer                 2.1.0            py38h06a4308_0  
xlrd                      2.0.1              pyhd3eb1b0_0  
xlsxwriter                1.3.8              pyhd3eb1b0_0  
xlwt                      1.3.0                    py38_0  
xmltodict                 0.12.0                     py_0  
xz                        5.2.5                h7b6447c_0  
yaml                      0.2.5                h7b6447c_0  
yapf                      0.31.0             pyhd3eb1b0_0  
zeromq                    4.3.4                h2531618_0  
zict                      2.0.0              pyhd3eb1b0_0  
zipp                      3.4.1              pyhd3eb1b0_0  
zlib                      1.2.11               h7b6447c_3  
zope                      1.0                      py38_1  
zope.event                4.5.0                    py38_0  
zope.interface            5.3.0            py38h27cfd23_0  
zstd                      1.4.5                h9ceee32_0 

Installation procedure:

  1. Fresh Ubuntu 20.04 install
  2. Blacklist nouveau drivers
  3. sudo sh cuda_11.2.0_460.27.04_linux.run
  4. bash Anaconda3-2021.05-Linux-x86_64.sh
  5. conda create -n rapids-0.19 -c rapidsai -c nvidia -c conda-forge
    rapids-blazing=0.19 python=3.8 cudatoolkit=11.2

About this issue

  • State: open
  • Created 3 years ago
  • Comments: 16 (7 by maintainers)

Most upvoted comments

I managed to compile cuML 20.06 on a fresh Ubuntu install and tested the new backend (split_algo=1) on 2 GTX 1050 -> works like a charm! (The compile ran after deleting the env “rapids”, but failed later on during compilation.)

Thanks a lot @vinaydes and @hcho3 for following up on this, really appreciated.

Hi @Oleg-dM, I had access to an sm_61 device, so I tried to debug the issue. Here are my observations:

  1. The issue is specific to sm_61 devices. Both of your GPUs are sm_61, which is why you see this issue.
  2. Workaround: the issue appears only when you install the pre-built libcuml from the conda channel. If you build from source, the issue goes away. Building from source is not very complicated either: all it takes is creating a conda environment and invoking ./build.sh. You can find more at https://github.com/rapidsai/cuml/blob/branch-21.08/BUILD.md.
  3. I am not yet sure what causes the difference between the pre-built library and one built from source. A key difference between the two is which CUDA PTX objects are present in libcuml. The pre-built library has PTX for sm_60, which should work on sm_61, so it should not really matter. However, when I built from source for sm_60 (just like the pre-built binary), the issue appeared again. More investigation is needed to narrow down the root cause.
  4. @dumerrill To eliminate stale errors, I added CUDA_CHECK(cudaDeviceSynchronize()) just before line quantile.cuh#L75l. The error still came from the cub::DeviceRadixSort::SortKeys function. Inside that function, the kernel cub::DeviceRadixSortDownsweepKernel seems to throw the error at launch.
  5. I could reproduce the error with the C++ benchmarking code from cuML, which speeds up debugging. However, unlike the Python example, which fails every time, the C++ one fails intermittently.

In short: @Oleg-dM or anyone else affected by this issue can use the workaround described above while we continue to debug the issue further.
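
To check whether a given machine falls into the sm_61 case described above, something like the following could be used. It is only a sketch: the helper names are hypothetical, the "affected" set contains just the compute capability named in this thread, and querying the device through Numba is one possible approach, not something the maintainers prescribed.

```python
from typing import Optional, Tuple

# sm_61 corresponds to compute capability (6, 1), per the observations above.
AFFECTED_CC = (6, 1)


def is_affected(cc: Tuple[int, int]) -> bool:
    """Return True if the compute capability matches the one reported as failing."""
    return tuple(cc) == AFFECTED_CC


def current_compute_capability() -> Optional[Tuple[int, int]]:
    """Query the active GPU via Numba; return None if Numba or a GPU is unavailable."""
    try:
        from numba import cuda
        return tuple(cuda.get_current_device().compute_capability)
    except Exception:
        return None


if __name__ == "__main__":
    cc = current_compute_capability()
    if cc is None:
        print("Could not query a CUDA device.")
    elif is_affected(cc):
        print(f"Compute capability {cc}: consider building cuML from source.")
    else:
        print(f"Compute capability {cc}: not the configuration described in this issue.")
```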