dgl: Error importing DGL with TF backend
🐛 Bug
FileNotFoundError: dgl.dll, even though it exists in the said directory.
To Reproduce
Steps to reproduce the behavior:
- Installed tensorflow 2.2 via pip inside a conda environment
- Installed dgl via pip inside the same environment
- Changed the backend of dgl to tensorflow (I don’t think this has any bearing)
```
>>> import dgl
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Rohan\anaconda3\envs\tf_dgl\lib\site-packages\dgl\__init__.py", line 8, in <module>
    from .backend import load_backend, backend_name
  File "C:\Users\Rohan\anaconda3\envs\tf_dgl\lib\site-packages\dgl\backend\__init__.py", line 74, in <module>
    load_backend(get_preferred_backend())
  File "C:\Users\Rohan\anaconda3\envs\tf_dgl\lib\site-packages\dgl\backend\__init__.py", line 23, in load_backend
    mod = importlib.import_module('.%s' % mod_name, __name__)
  File "C:\Users\Rohan\anaconda3\envs\tf_dgl\lib\importlib\__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "C:\Users\Rohan\anaconda3\envs\tf_dgl\lib\site-packages\dgl\backend\tensorflow\__init__.py", line 4, in <module>
    from .tensor import *
  File "C:\Users\Rohan\anaconda3\envs\tf_dgl\lib\site-packages\dgl\backend\tensorflow\tensor.py", line 12, in <module>
    from ... import ndarray as nd
  File "C:\Users\Rohan\anaconda3\envs\tf_dgl\lib\site-packages\dgl\ndarray.py", line 14, in <module>
    from ._ffi.object import register_object, ObjectBase
  File "C:\Users\Rohan\anaconda3\envs\tf_dgl\lib\site-packages\dgl\_ffi\object.py", line 8, in <module>
    from .object_generic import ObjectGeneric, convert_to_object
  File "C:\Users\Rohan\anaconda3\envs\tf_dgl\lib\site-packages\dgl\_ffi\object_generic.py", line 7, in <module>
    from .base import string_types
  File "C:\Users\Rohan\anaconda3\envs\tf_dgl\lib\site-packages\dgl\_ffi\base.py", line 42, in <module>
    _LIB, _LIB_NAME = _load_lib()
  File "C:\Users\Rohan\anaconda3\envs\tf_dgl\lib\site-packages\dgl\_ffi\base.py", line 34, in _load_lib
    lib = ctypes.CDLL(lib_path[0], ctypes.RTLD_GLOBAL)
  File "C:\Users\Rohan\anaconda3\envs\tf_dgl\lib\ctypes\__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
FileNotFoundError: Could not find module 'C:\Users\Rohan\anaconda3\envs\tf_dgl\lib\site-packages\dgl\dgl.dll' (or one of its dependencies). Try using the full path with constructor syntax.
```
Environment
- DGL Version: 0.4.3 (latest)
- Backend Library & Version: TensorFlow 2.2
- OS : Windows 10
- How you installed DGL: pip, inside a conda env
- Python version: 3.8
- CUDA 10.1
Additional context
I checked the directory for the missing file, and it was indeed there! But the error persisted. Conda and lower versions of Python do not support TF 2.2.
About this issue
- State: closed
- Created 4 years ago
- Comments: 21
Hi,
DGL support on Windows is currently a bit tricky. Please try the following steps:
- Use `tf-nightly` instead of other TensorFlow versions, because the function we need is only available in the latest nightly build. (It will also be available in the official TensorFlow 2.2 release.)
- Set `USE_OFFICIAL_TFDLPACK` to true.

Okay, so I’ve figured out what is going wrong here. It actually doesn’t have anything to do with TensorFlow, CUDA, cuDNN, or version mismatches at all. This is caused by changes introduced in Python `3.8` (I’m on `3.9.5`). The core of the issue is a change in the directories that Python considers by default when looking for DLLs. Notably, starting from this version of Python, the `PATH` environment variable is no longer included by default (same goes for the current working directory, by the way). A new function (`os.add_dll_directory`) is provided to securely add directories to the list that is searched for DLLs. Hacking the following into the start of the `_load_lib` function of `_ffi\base.py` fixed it for me:

It probably doesn’t need all of these, but I just slapped in all related directories that were in my `PATH`. I’m assuming that TensorFlow accounts for this change in Python functionality, hence the message before the error saying that `cudart64_110.dll` is loaded. But when it’s DGL’s turn, it doesn’t consider the directories in `PATH`, meaning it cannot find CUDA and cuDNN. Note that this is also why this was not picked up by the Dependencies application, since that one does appear to check the `PATH`. This also explains the originally reported issue, which was on Python `3.8`.

Probably the clean way to do this is to loop over the directories in `PATH` when DGL is first loaded, and add relevant directories with that new function one by one. Or introduce a new environment variable to set these directories (or one that informs DGL whether or not to use `PATH`).

@marijnvk Thanks for your detailed investigation! We’ll check how other frameworks handle this to find a better solution.
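For anyone who wants to try the "loop over `PATH`" idea from the comment above, here is a minimal sketch. It is not the snippet from the original comment and not DGL's actual fix. The function name `register_path_dll_dirs` and the keyword filter are hypothetical, chosen only for illustration. The one real API it relies on is `os.add_dll_directory`, which exists only on Windows with Python 3.8 or newer, so the sketch simply skips registration elsewhere.

```python
import os


def register_path_dll_dirs(path_value=None, keywords=("cuda", "cudnn")):
    """Register directories from PATH as DLL search directories.

    Hypothetical helper (not part of DGL). `keywords` is an assumed filter:
    only PATH entries containing one of these substrings are registered.
    Pass an empty tuple to register every existing directory.
    Returns (selected_dirs, handles).
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")

    # os.add_dll_directory only exists on Windows with Python 3.8+.
    add_dir = getattr(os, "add_dll_directory", None)

    selected, handles = [], []
    for entry in path_value.split(os.pathsep):
        entry = entry.strip().strip('"')
        # Skip empty, relative, and non-existent entries.
        if not entry or not os.path.isabs(entry) or not os.path.isdir(entry):
            continue
        # Apply the (assumed) keyword filter, case-insensitively.
        if keywords and not any(k.lower() in entry.lower() for k in keywords):
            continue
        if entry in selected:
            continue  # PATH often contains duplicates
        selected.append(entry)
        if add_dir is not None:
            # Each handle has a close() method that removes the directory again.
            handles.append(add_dir(entry))
    return selected, handles


if __name__ == "__main__":
    dirs, _handles = register_path_dll_dirs()
    print("Registered DLL directories:", dirs)
```

Calling `register_path_dll_dirs()` before `import dgl` should have roughly the same effect as hard-coding the CUDA and cuDNN directories, assuming those directories are on `PATH` and their names contain one of the keywords.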