tensorflow: TF built from r1.9 crashes with _gru_ops.so: undefined symbol: _ZN15stream_executor6Stream12ThenBlasGemmENS_4blas9TransposeES2_yyyfRKNS_12DeviceMemoryIfEEiS6_ifPS4_i

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No, I have used unchanged code from branch r1.9, commit e1436b2952c7600c8ac88114210381db0398be16
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04
  • TensorFlow installed from (source or binary): source
  • TensorFlow version (use command below): branch r1.9, commit e1436b2952c7600c8ac88114210381db0398be16
  • Python version: 3.6.5
  • Bazel version (if compiling from source): 0.10.1
  • GCC/Compiler version (if compiling from source): 4.8
  • CUDA/cuDNN version: 9.2/7.2
  • GPU model and memory: V100, 16 GB
  • Exact command to reproduce:
git clone https://github.com/tensorflow/benchmarks.git
cd benchmarks/scripts/tf_cnn_benchmarks
git checkout 551caecb936312690d6bc8c8c2e2562089e2c200
python3 tf_cnn_benchmarks.py --data_format=NCHW --batch_size=256 --num_batches=100 --model=resnet50 --optimizer=momentum --variable_update=replicated --nodistortions --hierarchical_copy=True --gradient_repacking=8 --datasets_use_prefetch=False --display_every=10 --gpu_thread_mode=gpu_shared --num_gpus=8 --use_fp16=True

Results:

python3 tf_cnn_benchmarks.py --data_format=NCHW --batch_size=256 --num_batches=100 --model=resnet50 --optimizer=momentum --variable_update=replicated --nodistortions --hierarchical_copy=True --gradient_repacking=8 --datasets_use_prefetch=False --display_every=10 --gpu_thread_mode=gpu_shared --num_gpus=8 --use_fp16=True
/usr/lib/python3/dist-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Traceback (most recent call last):
  File "tf_cnn_benchmarks.py", line 27, in <module>
    import benchmark_cnn
  File "/home/vkovalevskyi/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py", line 48, in <module>
    import data_utils
  File "/home/vkovalevskyi/benchmarks/scripts/tf_cnn_benchmarks/data_utils.py", line 21, in <module>
    from tensorflow.contrib.data.python.ops import batching
  File "/home/vkovalevskyi/.local/lib/python3.6/site-packages/tensorflow/contrib/__init__.py", line 35, in <module>
    from tensorflow.contrib import cudnn_rnn
  File "/home/vkovalevskyi/.local/lib/python3.6/site-packages/tensorflow/contrib/cudnn_rnn/__init__.py", line 34, in <module>
    from tensorflow.contrib.cudnn_rnn.python.layers import *
  File "/home/vkovalevskyi/.local/lib/python3.6/site-packages/tensorflow/contrib/cudnn_rnn/python/layers/__init__.py", line 23, in <module>
    from tensorflow.contrib.cudnn_rnn.python.layers.cudnn_rnn import *
  File "/home/vkovalevskyi/.local/lib/python3.6/site-packages/tensorflow/contrib/cudnn_rnn/python/layers/cudnn_rnn.py", line 20, in <module>
    from tensorflow.contrib.cudnn_rnn.python.ops import cudnn_rnn_ops
  File "/home/vkovalevskyi/.local/lib/python3.6/site-packages/tensorflow/contrib/cudnn_rnn/python/ops/cudnn_rnn_ops.py", line 22, in <module>
    from tensorflow.contrib.rnn.python.ops import lstm_ops
  File "/home/vkovalevskyi/.local/lib/python3.6/site-packages/tensorflow/contrib/rnn/__init__.py", line 88, in <module>
    from tensorflow.contrib.rnn.python.ops.gru_ops import *
  File "/home/vkovalevskyi/.local/lib/python3.6/site-packages/tensorflow/contrib/rnn/python/ops/gru_ops.py", line 33, in <module>
    resource_loader.get_path_to_datafile("_gru_ops.so"))
  File "/home/vkovalevskyi/.local/lib/python3.6/site-packages/tensorflow/contrib/util/loader.py", line 56, in load_op_library
    ret = load_library.load_op_library(path)
  File "/home/vkovalevskyi/.local/lib/python3.6/site-packages/tensorflow/python/framework/load_library.py", line 56, in load_op_library
    lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: /home/vkovalevskyi/.local/lib/python3.6/site-packages/tensorflow/contrib/rnn/python/ops/_gru_ops.so: undefined symbol: _ZN15stream_executor6Stream12ThenBlasGemmENS_4blas9TransposeES2_yyyfRKNS_12DeviceMemoryIfEEiS6_ifPS4_i
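
To read the undefined symbol in the traceback above, it helps to recover its C++ qualified name. The following is a minimal sketch, not a full demangler: the helper name `qualified_name` is hypothetical, and it only decodes the plain length-prefixed components of an Itanium-ABI nested name (`_ZN<len><id>...E`), ignoring the argument list, templates, and substitutions. A tool such as `c++filt` gives the complete signature.

```python
def qualified_name(mangled: str) -> str:
    """Return the '::'-joined nested name of an Itanium-mangled symbol.

    Only the leading `_ZN<len><id>...` components are decoded; parsing
    stops at the first character that is not a length prefix.
    """
    prefix = "_ZN"
    if not mangled.startswith(prefix):
        raise ValueError("not a nested Itanium-mangled name")
    pos = len(prefix)
    parts = []
    while pos < len(mangled) and mangled[pos].isdigit():
        # Read the decimal length prefix of the next identifier.
        end = pos
        while end < len(mangled) and mangled[end].isdigit():
            end += 1
        length = int(mangled[pos:end])
        parts.append(mangled[end:end + length])
        pos = end + length
    return "::".join(parts)


# The symbol reported by the loader in this issue.
SYMBOL = (
    "_ZN15stream_executor6Stream12ThenBlasGemmENS_4blas9TransposeES2_"
    "yyyfRKNS_12DeviceMemoryIfEEiS6_ifPS4_i"
)
print(qualified_name(SYMBOL))  # stream_executor::Stream::ThenBlasGemm
```

So `_gru_ops.so` expects `stream_executor::Stream::ThenBlasGemm` to be provided by a library that is already loaded, and the loader cannot find it.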

About this issue

  • State: closed
  • Created 6 years ago
  • Reactions: 14
  • Comments: 24 (13 by maintainers)

Most upvoted comments

I also encountered this on a monolithic build of TensorFlow v1.11.0.

It is easy to reproduce consistently:

  • Build inside an official TensorFlow Docker image (e.g. tensorflow/tensorflow:1.11.0-devel-gpu) with --config=monolithic
  • Install the pip package
  • Import tensorflow.contrib and observe that it fails due to missing symbols

Investigating a bit, I found that a simple fix is to add the following entry to tf_version_script.lds:

*stream_executor*;
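
One way to see why this entry helps is to test the missing symbol against the export patterns of the version script. The sketch below rests on two assumptions that are not stated in this thread: that the `global:` section of tf_version_script.lds at the time contained only `*tensorflow*` and `*perftools*gputools*`, and that Python's `fnmatchcase` is a close enough stand-in for the linker's glob matching on mangled names. The helper name `is_exported` is hypothetical.

```python
from fnmatch import fnmatchcase

# The symbol reported by the loader in this issue.
SYMBOL = (
    "_ZN15stream_executor6Stream12ThenBlasGemmENS_4blas9TransposeES2_"
    "yyyfRKNS_12DeviceMemoryIfEEiS6_ifPS4_i"
)

# ASSUMPTION: patterns believed to be in the `global:` section of
# tf_version_script.lds; check the file in your own checkout.
ASSUMED_GLOBAL_PATTERNS = ["*tensorflow*", "*perftools*gputools*"]


def is_exported(symbol, patterns):
    """Return True if any glob pattern matches the mangled symbol."""
    return any(fnmatchcase(symbol, pattern) for pattern in patterns)


# Without the proposed entry the symbol matches nothing and stays hidden.
print(is_exported(SYMBOL, ASSUMED_GLOBAL_PATTERNS))  # False

# With the proposed entry the symbol becomes visible to _gru_ops.so.
print(is_exported(SYMBOL, ASSUMED_GLOBAL_PATTERNS + ["*stream_executor*"]))  # True
```

Under these assumptions, the mangled name contains neither `tensorflow` nor `perftools`, so it would fall under the script's local symbols until the extra pattern is added.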

I will close the bug. I was able to compile TF 1.9 correctly. The Bazel build now has to be triggered differently in order to find all the required CUDA libraries, and on top of this a monolithic build almost always reports success even when some of the libraries were not found. The combination of these two things makes it hard to identify the root cause and quickly come up with a solution.
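
Because the build can report success while the resulting package is still broken, a quick import check after installing the pip package can surface the problem early. This is only a sketch: the helper name `failed_imports` is hypothetical, and the module list in the usage comment is an assumption based on the traceback in this issue.

```python
import importlib


def failed_imports(module_names):
    """Try to import each module and return {name: error text} for failures."""
    failures = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # includes ImportError and TF's NotFoundError
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


# Example usage after installing a freshly built wheel:
#   for name, error in failed_imports(["tensorflow", "tensorflow.contrib"]).items():
#       print(f"FAILED {name}: {error}")
```

An empty result means every listed module imported; any entry carries the same error text that would otherwise only appear when a script such as tf_cnn_benchmarks.py is first run.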

Which commit worked (you said some r1.9 worked)?