dgl: CURAND_STATUS_INITIALIZATION_FAILED when running dgl.seed() with DGL 1.0+
🐛 Bug
Can't call dgl.seed() since updating.
Essentially, I updated dgl to 1.0, and now when I try to set the seed, I get a low-level error in the backend (see trace below).
To Reproduce
Steps to reproduce the behavior:
module load miniconda/3
conda create -y -p $SLURM_TMPDIR/env python=3.9
conda activate $SLURM_TMPDIR/env
conda install -y pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
conda install -y -c dglteam/label/cu117 dgl
python -c 'import dgl; dgl.seed(45)'
Error trace:
>>> import dgl
>>> dgl.seed(45)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/mila/r/rebecca.salganik/.conda/envs/nsv5/lib/python3.9/site-packages/dgl/random.py", line 19, in seed
_CAPI_SetSeed(val)
File "dgl/_ffi/_cython/./function.pxi", line 295, in dgl._ffi._cy3.core.FunctionBase.__call__
File "dgl/_ffi/_cython/./function.pxi", line 227, in dgl._ffi._cy3.core.FuncCall
File "dgl/_ffi/_cython/./function.pxi", line 217, in dgl._ffi._cy3.core.FuncCall3
dgl._ffi.base.DGLError: [16:45:00] /opt/dgl/src/random/random.cc:36: Check failed: e == CURAND_STATUS_SUCCESS: CURAND Error: CURAND_STATUS_INITIALIZATION_FAILED at /opt/dgl/src/random/random.cc:36
Stack trace:
[bt] (0) /home/mila/r/rebecca.salganik/.conda/envs/nsv5/lib/python3.9/site-packages/dgl/libdgl.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x4f) [0x7fb736c2fb7f]
[bt] (1) /home/mila/r/rebecca.salganik/.conda/envs/nsv5/lib/python3.9/site-packages/dgl/libdgl.so(+0x6ad466) [0x7fb736f36466]
[bt] (2) /home/mila/r/rebecca.salganik/.conda/envs/nsv5/lib/python3.9/site-packages/dgl/libdgl.so(DGLFuncCall+0x48) [0x7fb736f43b38]
[bt] (3) /home/mila/r/rebecca.salganik/.conda/envs/nsv5/lib/python3.9/site-packages/dgl/_ffi/_cy3/core.cpython-39-x86_64-linux-gnu.so(+0x1701b) [0x7fb73663201b]
[bt] (4) /home/mila/r/rebecca.salganik/.conda/envs/nsv5/lib/python3.9/site-packages/dgl/_ffi/_cy3/core.cpython-39-x86_64-linux-gnu.so(+0x174fb) [0x7fb7366324fb]
[bt] (5) python(_PyObject_MakeTpCall+0x2ec) [0x4f06ec]
[bt] (6) python(_PyEval_EvalFrameDefault+0x4c74) [0x4ec634]
[bt] (7) python() [0x4f80b3]
[bt] (8) python(_PyEval_EvalFrameDefault+0x4d34) [0x4ec6f4]
Expected behavior
Environment
- DGL Version (e.g., 1.0): 1.0
- Backend Library & Version (e.g., PyTorch 0.4.1, MXNet/Gluon 1.3): PyTorch 1.13.1
- OS (e.g., Linux): Linux
- How you installed DGL (conda, pip, source): conda
- Build command you used (if compiling from source):
- Python version: 3.9.16
- CUDA/cuDNN version (if applicable): 11.7
- GPU models and configuration (e.g. V100): V100
- Any other relevant information:
Additional context
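For anyone comparing setups, here is a minimal sketch that collects the fields listed under Environment and retries the failing call. The function name `collect_env_report` and the keys of the returned dictionary are made up for illustration; the only library calls used are `torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()`, `dgl.__version__`, and `dgl.seed()`. It is a diagnostic aid, not a fix, and it does not assume either package is installed.

```python
import importlib
import importlib.util
import platform


def collect_env_report(seed=45):
    """Collect environment details; packages that are not installed map to None."""
    report = {
        "python": platform.python_version(),
        "os": platform.system(),
        "torch": None,
        "torch_cuda": None,
        "cuda_available": None,
        "dgl": None,
        "dgl_error": None,
    }

    # PyTorch: record its version, the CUDA version it was built against,
    # and whether it can see a GPU at runtime.
    if importlib.util.find_spec("torch") is not None:
        try:
            torch = importlib.import_module("torch")
            report["torch"] = getattr(torch, "__version__", "unknown")
            report["torch_cuda"] = torch.version.cuda
            report["cuda_available"] = torch.cuda.is_available()
        except Exception as exc:
            report["torch"] = f"import failed: {type(exc).__name__}"

    # DGL: record its version, then retry the call from this report.
    if importlib.util.find_spec("dgl") is not None:
        try:
            dgl = importlib.import_module("dgl")
            report["dgl"] = getattr(dgl, "__version__", "unknown")
            dgl.seed(seed)
        except Exception as exc:
            # Keep only the first line; the full backend trace is long.
            lines = str(exc).splitlines()
            report["dgl_error"] = lines[0] if lines else type(exc).__name__

    return report


if __name__ == "__main__":
    for key, value in collect_env_report().items():
        print(f"{key}: {value}")
```

If `dgl_error` is set while `cuda_available` is True, the failure is probably on the DGL side rather than a missing driver, though this check alone cannot confirm that.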
About this issue
- State: closed
- Created a year ago
- Comments: 17
Reopening this issue since another user experienced the same error in #5380.
@WMX567 Please follow up here. Could you try installing with pip?