ROCm: ROCm 3.7 installation missing libraries for Tensorflow-rocm
This is a new installation of ROCm 3.7 on Ubuntu 20.04.1. With previous versions of ROCm (and Ubuntu) I did not have these issues, but with 3.7 (it could have happened with earlier versions as well) I keep finding missing libraries here and there.
For example, after installing ROCm, and also installing rocm-libs and tensorflow-rocm (via pip3), I still cannot import tensorflow:
>>> import tensorflow
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
from tensorflow.python.pywrap_tensorflow_internal import *
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
_pywrap_tensorflow_internal = swig_import_helper()
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
File "/usr/lib/python3.8/imp.py", line 242, in load_module
return load_dynamic(name, filename, file)
File "/usr/lib/python3.8/imp.py", line 342, in load_dynamic
return _load(spec)
ImportError: librccl.so.1: cannot open shared object file: No such file or directory
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.8/dist-packages/tensorflow/__init__.py", line 41, in <module>
from tensorflow.python.tools import module_util as _module_util
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/__init__.py", line 50, in <module>
from tensorflow.python import pywrap_tensorflow
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 69, in <module>
raise ImportError(msg)
ImportError: Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
from tensorflow.python.pywrap_tensorflow_internal import *
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
_pywrap_tensorflow_internal = swig_import_helper()
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
File "/usr/lib/python3.8/imp.py", line 242, in load_module
return load_dynamic(name, filename, file)
File "/usr/lib/python3.8/imp.py", line 342, in load_dynamic
return _load(spec)
ImportError: librccl.so.1: cannot open shared object file: No such file or directory
Shouldn't these libraries already be included in the installation of ROCm/tensorflow-rocm?
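For anyone hitting the same traceback, here is a minimal Python sketch for checking which shared libraries the dynamic loader cannot find before importing tensorflow. The helper names `missing_library_from_error` and `missing_libraries` are hypothetical, and the library names in the example call are taken from this thread. Opening a library with `ctypes.CDLL` only shows that the loader can resolve it, not that its version matches what tensorflow-rocm expects.

```python
import ctypes
import re


def missing_library_from_error(message):
    """Extract the library name from a 'cannot open shared object file' message."""
    match = re.search(r"(\S+\.so(?:\.\d+)*): cannot open shared object file", message)
    return match.group(1) if match else None


def missing_libraries(names):
    """Return the names that the dynamic loader fails to open."""
    missing = []
    for name in names:
        try:
            # CDLL asks the system loader to resolve the name through the
            # linker cache and LD_LIBRARY_PATH, like the tensorflow import does.
            ctypes.CDLL(name)
        except OSError:
            missing.append(name)
    return missing


# Last line of the traceback reported in this issue.
error = "ImportError: librccl.so.1: cannot open shared object file: No such file or directory"
print(missing_library_from_error(error))

# Library names mentioned in this thread; extend the list as needed.
print(missing_libraries(["librccl.so.1", "libhip_hcc.so", "libamdhip64.so"]))
```

Any name printed by the second call is one the loader cannot resolve on the current system, which narrows down which package is still missing.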
About this issue
- State: closed
- Created 4 years ago
- Comments: 25 (8 by maintainers)
Commits related to this issue
- Fix compatibility with ROCm>=3.5.1; fix typo in hip neighbor_list (#2012) Fixes #1400. Fixes #2009. 1. Uses cmake native module `CMakeDetermineHIPCompiler` to find the search path; 2. for ROCm>=... — committed to deepmodeling/deepmd-kit by njzjz 2 years ago
- Fix compatibility with ROCm>=3.5.1; fix typo in hip neighbor_list (#2012) Fixes #1400. Fixes #2009. 1. Uses cmake native module `CMakeDetermineHIPCompiler` to find the search path; 2. for ROCm>=3.5.... — committed to mingzhong15/deepmd-kit by njzjz 2 years ago
tensorflow-rocm-2.3.0 was released two hours ago and seems to support rocm-3.7.0 officially. It uses libamdhip64.so alongside the old libhip_hcc.so, so we no longer need to create a symbolic link.
I compiled pytorch with the latest updates, and pytorch can run mnist with rocm-3.7.0.
But I cannot compile tensorflow-upstream successfully for now.
Thanks, I guess I will just let the right people fix their bugs first. Together with attempts to fix the AMD Reset Bug with my Radeon VII, I have wasted a really long time on AMD’s stuff. I hate to say it, but my next GPU is more likely to be an Nvidia.
Install rocsparse and rccl:
sudo apt install rocsparse rccl
and update ldconfig:
sudo ldconfig
Then tf will report that it cannot find libhip_hcc.so. Set:
export LD_LIBRARY_PATH=/opt/rocm/amop/lib
Then tf will crash with a ‘stack smashing detected’ error. I think we should wait for a new version of tensorflow-rocm that supports rocm-3.7.0. By the way, pytorch crashed with the same error. I cannot understand why rocm-3.7.0 is not compatible with tf and pytorch.
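Before setting LD_LIBRARY_PATH, it can help to confirm which directory actually contains the library. The following is a rough Python sketch, assuming the ROCm install prefix is /opt/rocm as in the comment above; `find_library_dirs` is a hypothetical helper, not part of any ROCm tool.

```python
import os


def find_library_dirs(root, filename):
    """Return the sorted directories under root that contain a file named filename."""
    hits = []
    # os.walk does not descend into symlinked subdirectories by default,
    # and it silently yields nothing if root does not exist.
    for dirpath, _dirnames, filenames in os.walk(root):
        if filename in filenames:
            hits.append(dirpath)
    return sorted(hits)


# /opt/rocm is the prefix used in this thread; adjust it if yours differs.
for directory in find_library_dirs("/opt/rocm", "libhip_hcc.so"):
    print(directory)
```

Each printed directory is a candidate for LD_LIBRARY_PATH. If nothing is printed, the library is not installed under that prefix, or it sits behind a symlinked subdirectory that this sketch does not follow.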
Changes pushed and rccl will be part of rocm-libs from ROCm 4.1 onwards. Thank you.