cuml: [BUG] terminate called after throwing an instance of 'raft::cuda_error'
Hi, I’m using cuml.HDBSCAN and encountered the following problem.
`terminate called after throwing an instance of 'raft::cuda_error'
what(): CUDA error encountered at: file=_deps/raft-src/cpp/include/raft/cudart_utils.h line=267: call='cudaMemcpyAsync(d_ptr1, d_ptr2, len * sizeof(Type), cudaMemcpyDeviceToDevice, stream)', Reason=cudaErrorInvalidValue:invalid argument
Obtained 32 stack frames
#0 in /opt/conda/envs/cuml-dev-11.0/lib/libcuml++.so(_ZN4raft9exception18collect_call_stackEv+0x46) [0x7f1bd4f95056]
#1 in /opt/conda/envs/cuml-dev-11.0/lib/libcuml++.so(_ZN4raft10cuda_errorC2ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0xc9) [0x7f1bd4f95e39]
#2 in /opt/conda/envs/cuml-dev-11.0/lib/libcuml++.so(_ZN4raft10copy_asyncIiEEvPT_PKS1_mN3rmm16cuda_stream_viewE+0x138) [0x7f1bd522f948]
#3 in /opt/conda/envs/cuml-dev-11.0/lib/libcuml++.so(_ZN4raft9hierarchy6detail16build_sorted_mstIifN2ML7HDBSCAN22FixConnectivitiesRedOpIifEEEEvRKNS_8handle_tEPKT0_PKT_SF_SC_mmPSD_SG_PSA_SG_mT1_NS_8distance12DistanceTypeEi+0x4c2) [0x7f1bd527e942]
#4 in /opt/conda/envs/cuml-dev-11.0/lib/libcuml++.so(_ZN2ML7HDBSCAN13build_linkageIifEEvRKN4raft8handle_tEPKT0_mmNS2_8distance12DistanceTypeERNS0_6Common13HDBSCANParamsERNSB_28robust_single_linkage_outputIT_S6_EE+0x372) [0x7f1bd5281512]
#5 in /opt/conda/envs/cuml-dev-11.0/lib/libcuml++.so(_ZN2ML7hdbscanERKN4raft8handle_tEPKfmmNS0_8distance12DistanceTypeERNS_7HDBSCAN6Common13HDBSCANParamsERNS9_14hdbscan_outputIifEE+0x7e) [0x7f1bd521759e]
#6 in /opt/conda/envs/cuml-dev-11.0/lib/python3.8/site-packages/cuml/cluster/hdbscan.cpython-38-x86_64-linux-gnu.so(+0x43ec2) [0x7f1de251cec2]
#7 in python(PyObject_Call+0x24d) [0x56056760d35d]
#8 in python(_PyEval_EvalFrameDefault+0x21bf) [0x5605676b64ef]
#9 in python(_PyEval_EvalCodeWithName+0x2c3) [0x560567696db3]
#10 in python(PyEval_EvalCodeEx+0x39) [0x560567697e19]
#11 in /opt/conda/envs/cuml-dev-11.0/lib/python3.8/site-packages/cuml/cluster/hdbscan.cpython-38-x86_64-linux-gnu.so(+0x2c298) [0x7f1de2505298]
#12 in /opt/conda/envs/cuml-dev-11.0/lib/python3.8/site-packages/cuml/cluster/hdbscan.cpython-38-x86_64-linux-gnu.so(+0x2c4f9) [0x7f1de25054f9]
#13 in /opt/conda/envs/cuml-dev-11.0/lib/python3.8/site-packages/cuml/cluster/hdbscan.cpython-38-x86_64-linux-gnu.so(+0x3c072) [0x7f1de2515072]
#14 in python(PyObject_Call+0x24d) [0x56056760d35d]
#15 in python(_PyEval_EvalFrameDefault+0x21bf) [0x5605676b64ef]
#16 in python(_PyEval_EvalCodeWithName+0x2c3) [0x560567696db3]
#17 in python(+0x1b08b7) [0x5605676988b7]
#18 in python(_PyEval_EvalFrameDefault+0x4e03) [0x5605676b9133]
#19 in python(_PyFunction_Vectorcall+0x1a6) [0x560567697fc6]
#20 in python(_PyEval_EvalFrameDefault+0x947) [0x5605676b4c77]
#21 in python(_PyEval_EvalCodeWithName+0x2c3) [0x560567696db3]
#22 in python(PyEval_EvalCodeEx+0x39) [0x560567697e19]
#23 in python(PyEval_EvalCode+0x1b) [0x56056773a24b]
#24 in python(+0x2522e3) [0x56056773a2e3]
#25 in python(+0x26e543) [0x560567756543]
#26 in python(+0x273562) [0x56056775b562]
#27 in python(PyRun_SimpleFileExFlags+0x1b2) [0x56056775b742]
#28 in python(Py_RunMain+0x36d) [0x56056775bcbd]
#29 in python(Py_BytesMain+0x39) [0x56056775be79]
#30 in /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7) [0x7f1fc17a4b97]
#31 in python(+0x1e6d69) [0x5605676ced69]
Aborted (core dumped)`
About this issue
- State: open
- Created 2 years ago
- Comments: 21 (10 by maintainers)
@MartinKlefas Have you tried increasing `min_samples`? Adding non-zero edges to the KNN should lead to convergence. If you can compute the maximum number of repeated inputs in your dataset and set `min_samples` to be greater than that, it should work.
@cjnolet @Brillone Hello, I encountered the same issue: some parameter combinations work and some throw the same error when run as a Python script. If I run it in a Jupyter notebook in VS Code, it instead gives an error like:
`FutureWarning: Supporting extra quotes around strings is deprecated in traitlets 5.0. You can use 'hmac-sha256' instead of '"hmac-sha256"' if you require traitlets >=5.`
I am using 4.7M samples with 50 dimensions.
For example, this doesn’t work:
`model = HDBSCAN(min_cluster_size=1000, min_samples=10)`
This works:
`model = HDBSCAN(min_cluster_size=1000, min_samples=5)`
Please let me know if there’s any update.
Thanks.
Hi, the same issue is happening to me with some settings for the HDBSCAN model (others work).
For example, it happens with the following parameters:
`model = HDBSCAN(min_cluster_size=15, min_samples=10)`
A setting that did work:
`model = HDBSCAN(min_cluster_size=5, min_samples=5)`
My dataset has 2.5M samples with 64 dimensions (I can’t provide the dataset).
Duplicates can to some extent be seen as sample weights, and removing them might move your analysis farther away from the underlying ground-truth data distribution from which your data is implicitly sampled. I’d probably leave them in.
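For anyone who wants to keep the duplicates and instead try the earlier suggestion of setting `min_samples` above the maximum number of repeated inputs, here is a minimal sketch of how that count could be computed on the host with NumPy. The helper name `max_duplicate_count` and the toy array are hypothetical and only for illustration. It assumes exact row-wise duplicates and that the data fits in host memory; `np.unique` sorts the rows, so it may be slow or memory-hungry on millions of samples.

```python
import numpy as np


def max_duplicate_count(X):
    """Return how many times the most frequent row of X occurs (hypothetical helper)."""
    X = np.ascontiguousarray(X)
    # axis=0 treats each row as one item; counts[i] is the multiplicity of unique row i.
    _, counts = np.unique(X, axis=0, return_counts=True)
    return int(counts.max())


# Toy data standing in for the real dataset: the row [0, 1] appears three times.
X = np.array(
    [[0.0, 1.0], [0.0, 1.0], [2.0, 3.0], [0.0, 1.0], [2.0, 3.0]],
    dtype=np.float32,
)

max_dupes = max_duplicate_count(X)
# "Greater than the maximum number of repeated inputs", per the suggestion above.
suggested_min_samples = max_dupes + 1
print(max_dupes, suggested_min_samples)

# With cuML installed, the value could then be passed on (left commented out so
# the sketch runs without a GPU; min_cluster_size is taken from the reports above):
# from cuml.cluster import HDBSCAN
# model = HDBSCAN(min_cluster_size=1000, min_samples=suggested_min_samples)
```

Whether this avoids the crash for a given dataset is not confirmed in this thread; it only automates the check that was suggested.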