ROCm: ROCm 3.7 installation missing libraries for Tensorflow-rocm

This is a new installation of ROCm 3.7 on Ubuntu 20.04.1. With previous versions of ROCm (and Ubuntu) I did not have these issues, but with 3.7 (it may have happened with earlier versions as well) I often find libraries missing here and there.

For example, after installing ROCm, and also installing rocm-libs and tensorflow-rocm (via pip3), I still cannot import tensorflow:

>>> import tensorflow
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/usr/lib/python3.8/imp.py", line 242, in load_module
    return load_dynamic(name, filename, file)
  File "/usr/lib/python3.8/imp.py", line 342, in load_dynamic
    return _load(spec)
ImportError: librccl.so.1: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/__init__.py", line 41, in <module>
    from tensorflow.python.tools import module_util as _module_util
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/__init__.py", line 50, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 69, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/usr/lib/python3.8/imp.py", line 242, in load_module
    return load_dynamic(name, filename, file)
  File "/usr/lib/python3.8/imp.py", line 342, in load_dynamic
    return _load(spec)
ImportError: librccl.so.1: cannot open shared object file: No such file or directory

Should these libraries already be included in the installation of ROCm/tensorflow-rocm?

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 25 (8 by maintainers)

Most upvoted comments

tensorflow-rocm 2.3.0 was released two hours ago and seems to support rocm-3.7.0 officially. It uses libamdhip64.so rather than the old libhip_hcc.so, so we no longer need to create a symbolic link.

I compiled pytorch with the latest update, and pytorch can run mnist with rocm-3.7.0.

But I cannot compile tensorflow-upstream successfully for now.

Actually, I heard somebody say there is an amop or aomp directory here. I rechecked, and neither directory exists. You can create a symbolic link for libhip_hcc.so, just as in rocm-3.5.1.

lrwxrwxrwx 1 root root 35 Aug 10 12:17 libhip_hcc.so -> ../hip/lib/libamdhip64.so.3.5.30501

sudo ln -s /opt/rocm/hip/lib/libamdhip64.so /opt/rocm/lib/libhip_hcc.so

Then you can move on to the next crash. T_T
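The symlink workaround above can also be scripted so that it refuses to create a dangling link. This is only a minimal sketch: it assumes the /opt/rocm layout mentioned in this thread and that the HIP runtime is installed as hip/lib/libamdhip64.so. The helper name link_hip_hcc is hypothetical, and this is a stopgap from the thread, not an official fix.

```python
from pathlib import Path


def link_hip_hcc(rocm_root="/opt/rocm"):
    """Create lib/libhip_hcc.so -> ../hip/lib/libamdhip64.so if it is missing.

    Returns the link path, or None when the target library does not exist.
    The paths are assumptions based on the layout described in this thread.
    """
    root = Path(rocm_root)
    target = root / "hip" / "lib" / "libamdhip64.so"
    link = root / "lib" / "libhip_hcc.so"

    # Do not create a dangling link if the HIP runtime is not where we expect.
    if not target.exists():
        return None

    # Leave an existing file or link untouched.
    if link.is_symlink() or link.exists():
        return link

    link.parent.mkdir(parents=True, exist_ok=True)
    # Relative target, matching the style of the link shown in the ls output.
    link.symlink_to(Path("..") / "hip" / "lib" / target.name)
    return link
```

Calling link_hip_hcc() against the real /opt/rocm needs root privileges, and sudo ldconfig may still be required afterwards.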

Thanks, I guess I will just let the right people fix their bugs first. Together with attempts to fix the AMD Reset Bug with my Radeon VII, I have wasted a really long time on AMD’s stuff. I hate to say it, but my next GPU is more likely an Nvidia.

Install rocsparse and rccl:

sudo apt install rocsparse rccl

and update the linker cache:

sudo ldconfig

Then tf will report that it cannot find libhip_hcc.so.

export LD_LIBRARY_PATH=/opt/rocm/amop/lib

Then tf will crash with a ‘stack smashing detected’ error. I think we should wait for a new version of tensorflow-rocm that supports rocm-3.7.0.

By the way, pytorch crashed with the same error. I cannot understand why rocm-3.7.0 is not compatible with tf and pytorch.
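After steps like these, it can help to check which shared libraries the dynamic loader still cannot open before importing tensorflow again. The sketch below assumes only the library names that appear in this thread (the list is illustrative, not the full set tensorflow-rocm needs), and the function name missing_libraries is hypothetical. Note that ctypes.CDLL really loads each library into the process, so a broken library may fail in other ways.

```python
import ctypes

# Names taken from the errors reported in this thread; not a complete list.
LIBS = ["librccl.so.1", "libhip_hcc.so", "libamdhip64.so"]


def missing_libraries(names, loader=ctypes.CDLL):
    """Return the names that the dynamic loader fails to open.

    `loader` is injectable so the check can be exercised without ROCm installed.
    """
    missing = []
    for name in names:
        try:
            loader(name)  # uses the same search path as a normal import would
        except OSError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    for name in missing_libraries(LIBS):
        print("cannot load:", name)
```

An empty result only means these particular libraries load; it does not rule out the ‘stack smashing detected’ crash described above.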

Changes pushed and rccl will be part of rocm-libs from ROCm 4.1 onwards. Thank you.