cudf: [BUG] cuDF 10.0 RMM_FREE: __global__ function call is not configured
I conda installed rapids 0.10, but the kernel dies when I try to read a parquet file with either `cudf.read_parquet` or `dask_cudf.read_parquet`.
conda install -c rapidsai -c nvidia -c conda-forge rapids=0.10 rapids-xgboost dask python=3.7 cudatoolkit=10.0 ipykernel boto3 boto s3fs idna=2.7 PyYAML=3.13 urllib3=1.24.3
Jupyter log error:
terminate called after throwing an instance of 'thrust::system::system_error'
what(): rmm_allocator::deallocate(): RMM_FREE: __global__ function call is not configured
About this issue
- State: closed
- Created 5 years ago
- Reactions: 1
- Comments: 17 (14 by maintainers)
I believe this error is triggered by an exception that gets thrown in a destructor. If there's another thrust allocator that throws in its `deallocate` method, then it would have the same throws-within-destructor issue.

If you're willing to build cudf from source, one way to narrow down where the error is occurring is to instrument the `RMM_TRY` and `CUDA_TRY` macros to log `__FILE__` and `__LINE__` to stderr just before the throw. Hopefully that will show where the original error occurs before it gets obscured by the thrust system error, and that may shed light on what the real problem is.

Speaking of errors being thrown while cleaning up from an error: there are many places in the code that throw when a CUDA error occurs without clearing that error. As the stack unwinds and destructors are invoked, any destructor that also checks and throws on a CUDA error is going to trigger this type of issue. Is there a reason to leave the CUDA error pending if the exception being thrown already contains the detail of the CUDA error? A minimal sketch of both ideas is below. cc: @harrism @jrhemstad
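For illustration only, here is a minimal sketch of what such an instrumented error-check macro could look like. The macro name `CHECK_CUDA_DBG` and the logging format are hypothetical, not cudf's actual `CUDA_TRY`/`RMM_TRY` definitions; the point is simply to print `__FILE__`/`__LINE__` to stderr and to clear the pending error with `cudaGetLastError()` before throwing, so destructors that run during stack unwinding don't see a stale CUDA error and throw a second time.

```cpp
#include <cuda_runtime_api.h>
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical instrumented version of a CUDA_TRY-style macro.
// Logs the failing file/line to stderr, clears the sticky error state
// with cudaGetLastError(), then throws with the error detail included.
#define CHECK_CUDA_DBG(call)                                                  \
  do {                                                                        \
    cudaError_t const status = (call);                                        \
    if (status != cudaSuccess) {                                              \
      std::fprintf(stderr, "CUDA error '%s' at %s:%d\n",                      \
                   cudaGetErrorString(status), __FILE__, __LINE__);           \
      cudaGetLastError(); /* clear pending error so later checks don't rethrow */ \
      throw std::runtime_error(std::string("CUDA error: ") +                  \
                               cudaGetErrorString(status));                   \
    }                                                                         \
  } while (0)

int main() {
  void* ptr = nullptr;
  try {
    // Intentionally oversized allocation to demonstrate the macro's output.
    CHECK_CUDA_DBG(cudaMalloc(&ptr, ~std::size_t{0}));
  } catch (std::exception const& e) {
    std::fprintf(stderr, "caught: %s\n", e.what());
  }
  return 0;
}
```

With this kind of instrumentation in place of the real macros, the stderr line should point at the file and line where the first CUDA error is detected, rather than only the later `thrust::system::system_error` raised from `rmm_allocator::deallocate()`.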
RMM_TRYandCUDA_TRYmacros to log the__FILE__and__LINE__to stderr just before the throw call. Then hopefully it will show where the original error occurs that gets obscured by the thrust system error, and that may shed light onto what the real problem is.Speaking of errors being thrown while cleaning up from an error, there are many places in the code that throw when a CUDA error occurs without clearing the error. As the stack gets unrolled and destructors invoked, any destructor that also checks and throws on a CUDA error is going to trigger this type of issue. Is there a reason to leave the CUDA error pending if the exception being thrown contains the detail of the CUDA error? cc: @harrism @jrhemstad