dgl: CURAND_STATUS_INITIALIZATION_FAILED when running dgl.seed() with DGL 1.0+

πŸ› Bug

Can't set dgl.seed() since update.

Essentially, I updated my version of DGL to 1.0, and now when I try to set the seed I get a low-level error in the backend (see trace below).

To Reproduce

Steps to reproduce the behavior:

module load miniconda/3
conda create -y -p $SLURM_TMPDIR/env python=3.9
conda activate $SLURM_TMPDIR/env
conda install -y pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
conda install -y -c dglteam/label/cu117 dgl
python -c 'import dgl; dgl.seed(45)'

Error trace:

>>> import dgl 
>>> dgl.seed(45) 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/mila/r/rebecca.salganik/.conda/envs/nsv5/lib/python3.9/site-packages/dgl/random.py", line 19, in seed
    _CAPI_SetSeed(val)
  File "dgl/_ffi/_cython/./function.pxi", line 295, in dgl._ffi._cy3.core.FunctionBase.__call__
  File "dgl/_ffi/_cython/./function.pxi", line 227, in dgl._ffi._cy3.core.FuncCall
  File "dgl/_ffi/_cython/./function.pxi", line 217, in dgl._ffi._cy3.core.FuncCall3
dgl._ffi.base.DGLError: [16:45:00] /opt/dgl/src/random/random.cc:36: Check failed: e == CURAND_STATUS_SUCCESS: CURAND Error: CURAND_STATUS_INITIALIZATION_FAILED at /opt/dgl/src/random/random.cc:36
Stack trace:
  [bt] (0) /home/mila/r/rebecca.salganik/.conda/envs/nsv5/lib/python3.9/site-packages/dgl/libdgl.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x4f) [0x7fb736c2fb7f]
  [bt] (1) /home/mila/r/rebecca.salganik/.conda/envs/nsv5/lib/python3.9/site-packages/dgl/libdgl.so(+0x6ad466) [0x7fb736f36466]
  [bt] (2) /home/mila/r/rebecca.salganik/.conda/envs/nsv5/lib/python3.9/site-packages/dgl/libdgl.so(DGLFuncCall+0x48) [0x7fb736f43b38]
  [bt] (3) /home/mila/r/rebecca.salganik/.conda/envs/nsv5/lib/python3.9/site-packages/dgl/_ffi/_cy3/core.cpython-39-x86_64-linux-gnu.so(+0x1701b) [0x7fb73663201b]
  [bt] (4) /home/mila/r/rebecca.salganik/.conda/envs/nsv5/lib/python3.9/site-packages/dgl/_ffi/_cy3/core.cpython-39-x86_64-linux-gnu.so(+0x174fb) [0x7fb7366324fb]
  [bt] (5) python(_PyObject_MakeTpCall+0x2ec) [0x4f06ec]
  [bt] (6) python(_PyEval_EvalFrameDefault+0x4c74) [0x4ec634]
  [bt] (7) python() [0x4f80b3]
  [bt] (8) python(_PyEval_EvalFrameDefault+0x4d34) [0x4ec6f4]

Expected behavior

Environment

  • DGL Version (e.g., 1.0): 1.0
  • Backend Library & Version (e.g., PyTorch 0.4.1, MXNet/Gluon 1.3): PyTorch 1.13.1
  • OS (e.g., Linux): Linux
  • How you installed DGL (conda, pip, source): conda
  • Build command you used (if compiling from source):
  • Python version: 3.9.16
  • CUDA/cuDNN version (if applicable): 11.7
  • GPU models and configuration (e.g. V100): V100
  • Any other relevant information:
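The environment fields above and the failing call can be checked together in one script. The following is a minimal sketch, not part of the original report: `check_dgl_seed` is a hypothetical helper name, and it assumes only that `torch.cuda.is_available()`, `torch.version.cuda`, and `dgl.seed()` behave as in the versions listed. It reports whether PyTorch can see a GPU before calling `dgl.seed()`, which may help separate a DGL packaging problem from a node where no CUDA device is visible.

```python
import importlib.util
import os


def check_dgl_seed(seed=45):
    """Collect basic environment info and try dgl.seed(seed).

    Hypothetical diagnostic helper; returns a plain dict so the result
    can be pasted into a bug report.
    """
    report = {
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
        "torch_version": None,
        "torch_cuda_build": None,
        "torch_cuda_available": None,
        "dgl_version": None,
        "seed_ok": None,
        "error": None,
    }

    # PyTorch is optional here: skip these fields if it is not installed.
    if importlib.util.find_spec("torch") is not None:
        try:
            import torch

            report["torch_version"] = torch.__version__
            report["torch_cuda_build"] = torch.version.cuda
            report["torch_cuda_available"] = torch.cuda.is_available()
        except Exception as exc:
            report["error"] = f"torch import failed: {type(exc).__name__}"

    if importlib.util.find_spec("dgl") is None:
        report["error"] = "dgl is not installed"
        return report

    try:
        import dgl

        report["dgl_version"] = dgl.__version__
        dgl.seed(seed)  # the call that fails in this report
        report["seed_ok"] = True
    except Exception as exc:
        # Keep only the first line; the DGL error carries a long stack trace.
        first_line = (str(exc).splitlines() or [""])[0]
        report["seed_ok"] = False
        report["error"] = f"{type(exc).__name__}: {first_line}"
    return report


if __name__ == "__main__":
    for key, value in check_dgl_seed().items():
        print(f"{key}: {value}")
```

If `torch_cuda_available` is `False` while `seed_ok` is also `False`, the job may simply not have a GPU allocated; if PyTorch sees the GPU and only `dgl.seed()` fails, the DGL build itself is the more likely suspect.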

Additional context

About this issue

  • State: closed
  • Created a year ago
  • Comments: 17

Most upvoted comments

Reopening this issue since another user experienced the same error in #5380.

@WMX567 Please follow up here. Could you try installing with pip?