dgl: Error importing DGL with TF backend

🐛 Bug

Importing DGL raises FileNotFoundError for dgl.dll, even though the file exists in the reported directory.

To Reproduce

Steps to reproduce the behavior:

  1. Installed tensorflow 2.2 via pip inside a conda environment
  2. Installed dgl via pip inside the same environment
  3. Changed the backend of dgl to tensorflow (I don’t think this has any bearing)

`
>>> import dgl
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Rohan\anaconda3\envs\tf_dgl\lib\site-packages\dgl\__init__.py", line 8, in <module>
    from .backend import load_backend, backend_name
  File "C:\Users\Rohan\anaconda3\envs\tf_dgl\lib\site-packages\dgl\backend\__init__.py", line 74, in <module>
    load_backend(get_preferred_backend())
  File "C:\Users\Rohan\anaconda3\envs\tf_dgl\lib\site-packages\dgl\backend\__init__.py", line 23, in load_backend
    mod = importlib.import_module('.%s' % mod_name, __name__)
  File "C:\Users\Rohan\anaconda3\envs\tf_dgl\lib\importlib\__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "C:\Users\Rohan\anaconda3\envs\tf_dgl\lib\site-packages\dgl\backend\tensorflow\__init__.py", line 4, in <module>
    from .tensor import *
  File "C:\Users\Rohan\anaconda3\envs\tf_dgl\lib\site-packages\dgl\backend\tensorflow\tensor.py", line 12, in <module>
    from ... import ndarray as nd
  File "C:\Users\Rohan\anaconda3\envs\tf_dgl\lib\site-packages\dgl\ndarray.py", line 14, in <module>
    from ._ffi.object import register_object, ObjectBase
  File "C:\Users\Rohan\anaconda3\envs\tf_dgl\lib\site-packages\dgl\_ffi\object.py", line 8, in <module>
    from .object_generic import ObjectGeneric, convert_to_object
  File "C:\Users\Rohan\anaconda3\envs\tf_dgl\lib\site-packages\dgl\_ffi\object_generic.py", line 7, in <module>
    from .base import string_types
  File "C:\Users\Rohan\anaconda3\envs\tf_dgl\lib\site-packages\dgl\_ffi\base.py", line 42, in <module>
    _LIB, _LIB_NAME = _load_lib()
  File "C:\Users\Rohan\anaconda3\envs\tf_dgl\lib\site-packages\dgl\_ffi\base.py", line 34, in _load_lib
    lib = ctypes.CDLL(lib_path[0], ctypes.RTLD_GLOBAL)
  File "C:\Users\Rohan\anaconda3\envs\tf_dgl\lib\ctypes\__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
FileNotFoundError: Could not find module 'C:\Users\Rohan\anaconda3\envs\tf_dgl\lib\site-packages\dgl\dgl.dll' (or one of its dependencies). Try using the full path with constructor syntax.
`

Environment

  • DGL Version: 0.4.3 (latest)
  • Backend Library & Version: TensorFlow 2.2
  • OS: Windows 10
  • How you installed DGL: pip, inside a conda env
  • Python version: 3.8
  • CUDA 10.1

Additional context

After checking the directory for the missing file, it was indeed there! But the error persisted. Conda and lower versions of Python do not support TF 2.2.

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 21

Most upvoted comments

Hi,

DGL support on Windows is a bit tricky right now. Please try the following steps:

  1. Install tf-nightly instead of another TensorFlow version, because the function we need is only available in the latest nightly build. (It will be available in the official TensorFlow 2.2 release.)
  2. Set the environment variable USE_OFFICIAL_TFDLPACK to true.
import os
os.environ['USE_OFFICIAL_TFDLPACK'] = "true"
# then import dgl or other code
import dgl

Okay, so I’ve figured out what is going wrong here. It actually doesn’t have anything to do with TensorFlow, CUDA, cuDNN, or version mismatches at all. It is caused by a change introduced in Python 3.8 (I’m on 3.9.5) to the directories that Python considers by default when looking for DLLs. Notably, starting from that version, the PATH environment variable is no longer included by default (same goes for the current working directory, by the way). A new function, os.add_dll_directory, is provided to securely add directories to the list that is searched for DLLs. Hacking the following into the start of the _load_lib function of _ffi\base.py fixed it for me:

os.add_dll_directory("C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v11.1\\bin")
os.add_dll_directory("C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v11.1\\libnvvp")
os.add_dll_directory("C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v11.1\\extras\\CUPTI\\lib64")
os.add_dll_directory("C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v11.1\\include")
os.add_dll_directory("C:\\tools\\cuda\\bin") # cuDNN

It probably doesn’t need all of these; I just slapped in every related directory that was in my PATH. I’m assuming that TensorFlow accounts for this change in Python’s behavior, hence the message before the error saying that cudart64_110.dll is loaded. But when it’s DGL’s turn, the directories in PATH are not considered, so it cannot find CUDA and cuDNN. Note that this is also why the Dependencies application did not pick this up, since that one does appear to check PATH. This also explains the originally reported issue, which was on Python 3.8.
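To narrow down which of those PATH entries actually matter, something like the following might help. It is only a sketch: dirs_containing is a hypothetical helper name (not part of DGL or TensorFlow), and it simply lists the PATH directories that hold a given file, using the cudart64_110.dll name from the TensorFlow message as the example.

```python
import os


def dirs_containing(filename, path_value):
    """List the directories in a PATH-style string that contain `filename`.

    Hypothetical helper for diagnosis only; it does not load anything.
    """
    hits = []
    for entry in path_value.split(os.pathsep):
        # PATH entries are sometimes empty or wrapped in quotes on Windows
        entry = entry.strip().strip('"')
        if entry and os.path.isfile(os.path.join(entry, filename)):
            hits.append(entry)
    return hits


if __name__ == "__main__":
    # cudart64_110.dll is the CUDA runtime named in the TensorFlow log message
    for directory in dirs_containing("cudart64_110.dll", os.environ.get("PATH", "")):
        print(directory)
```

Any directory printed here is a candidate for os.add_dll_directory; directories that are never printed for the DLLs DGL depends on can likely be dropped from the list above.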

Probably the clean way to do this is to loop over the directories in PATH when DGL is first loaded and add the relevant ones with that new function one by one. Another option is to introduce a new environment variable that sets these directories (or one that tells DGL whether or not to use PATH).
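A rough sketch of the first option, to be run before import dgl, could look like this. The helper names candidate_dll_dirs and register_dll_dirs, and the keyword filter used to decide which directories count as "relevant", are assumptions made for illustration; this is not how DGL handles it.

```python
import os
import sys


def candidate_dll_dirs(path_value, keywords=("cuda", "cudnn")):
    """Return existing directories from a PATH-style string whose path
    contains one of `keywords` (case-insensitive), without duplicates.

    The keyword filter is an assumption about what "relevant" means.
    """
    seen = set()
    result = []
    for entry in path_value.split(os.pathsep):
        entry = entry.strip().strip('"')
        if not entry:
            continue
        full = os.path.abspath(entry)
        key = os.path.normcase(full)
        if key in seen:
            continue
        seen.add(key)
        if os.path.isdir(full) and any(k in full.lower() for k in keywords):
            result.append(full)
    return result


def register_dll_dirs(dirs):
    """Call os.add_dll_directory for each directory.

    os.add_dll_directory only exists on Windows with Python 3.8+,
    so this is a no-op everywhere else. The returned handles can be
    closed later to undo the registration.
    """
    handles = []
    if sys.platform == "win32" and hasattr(os, "add_dll_directory"):
        for directory in dirs:
            handles.append(os.add_dll_directory(directory))
    return handles


# Register the CUDA/cuDNN directories found in PATH, then import dgl afterwards
register_dll_dirs(candidate_dll_dirs(os.environ.get("PATH", "")))
```

Filtering by keyword keeps the search list short, which is closer to the intent of the Python 3.8 change than re-adding every PATH entry; the trade-off is that a dependency installed in an unusually named directory would be missed.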

@marijnvk Thanks for your detailed investigation! We’ll check how other frameworks handle this to find a better solution.