tensorflow: MKL no longer works with tensorflow 1.15


System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): NO

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Centos 7

  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: N/A

  • TensorFlow installed from (source or binary): source

  • TensorFlow version (use command below): 1.15.2

  • Python version: N/A

  • Bazel version (if compiling from source): 0.24.1

  • GCC/Compiler version (if compiling from source): gcc-6 (devtoolset-6 on centos 7)

  • CUDA/cuDNN version: N/A

  • GPU model and memory: N/A

  • Exact command to reproduce:

 bazel build -c opt --copt=-msse4.2 --copt=-mavx \
       --copt=-O3 --config=mkl --linkopt -ldl \
       --copt=-march=x86-64 \
       //tensorflow/tools/pip_package:build_pip_package \
       //tensorflow/tools/lib_package:libtensorflow_jni \
       //tensorflow/tools/lib_package:libtensorflow \
       //tensorflow/tools/lib_package:libtensorflow_proto

Describe the problem

libtensorflow_framework.so built this way is missing the MKL symbols. Importing TensorFlow from Java or Scala then fails with a symbol-not-found error for tensorflow::DisableMKL().

The number of MKL symbols found in libtensorflow_framework.so for 1.15 is also significantly lower than in 1.14.

Source code / logs

Code used to import tensorflow in scala

import org.tensorflow.TensorFlow

Note: We have to ensure libiomp5.so and libmklml_intel.so are available on library load path.

The simplest solution we found was to load the libraries manually in order. The code snippet can be seen here: https://gist.github.com/pavanky/ea6e71e3e7e52c013db844b715723be0
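For the Python wheel, the same "load the dependencies first, in order" idea can be sketched with ctypes. This is not the code from the gist; `preload_libraries` and its `loader` parameter are hypothetical names, and the load order (OpenMP runtime first, then MKL-ML) is an assumption based on the order the libraries are listed above. On the JVM the analogous call is System.load with an absolute path.

```python
import ctypes

# Assumed load order: the OpenMP runtime first, then MKL-ML.
DEFAULT_LIBRARIES = ("libiomp5.so", "libmklml_intel.so")


def preload_libraries(names=DEFAULT_LIBRARIES, loader=None):
    """Load shared libraries one by one, in the given order.

    `loader` is injectable so the ordering can be checked without the
    real libraries being installed (hypothetical helper, for illustration).
    """
    if loader is None:
        # RTLD_GLOBAL makes the symbols visible to libraries loaded later.
        def loader(name):
            return ctypes.CDLL(name, mode=ctypes.RTLD_GLOBAL)
    return [loader(name) for name in names]
```

Calling `preload_libraries()` before `import tensorflow` would raise OSError if either library cannot be found on the library load path, which makes a missing dependency visible early.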

Error

libtensorflow_jni.so: undefined symbol: _ZN10tensorflow10DisableMKLEv
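The mangled name in this error is the Itanium C++ ABI encoding of tensorflow::DisableMKL(). A minimal sketch of that mapping follows; `demangle_simple` is a hypothetical helper that only handles nested names with an empty parameter list, not a general demangler. In practice `c++filt` does this for arbitrary symbols.

```python
def demangle_simple(symbol: str) -> str:
    """Demangle a nested Itanium C++ ABI name that takes no arguments.

    Supports only the form _ZN<len><id>...Ev, which covers the symbols
    quoted in this report.
    """
    prefix = "_ZN"
    if not symbol.startswith(prefix):
        raise ValueError(f"unsupported symbol: {symbol}")
    pos = len(prefix)
    parts = []
    while pos < len(symbol) and symbol[pos] != "E":
        # Each component is a decimal length followed by that many characters.
        start = pos
        while pos < len(symbol) and symbol[pos].isdigit():
            pos += 1
        if pos == start:
            raise ValueError(f"expected a length at offset {pos}: {symbol}")
        length = int(symbol[start:pos])
        parts.append(symbol[pos:pos + length])
        pos += length
    # 'E' closes the nested name; a lone 'v' encodes a void parameter list.
    if symbol[pos:] != "Ev":
        raise ValueError(f"unsupported parameter encoding: {symbol}")
    return "::".join(parts) + "()"


print(demangle_simple("_ZN10tensorflow10DisableMKLEv"))  # tensorflow::DisableMKL()
```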

Looking at the number of symbols related to MKL:

 $ nm -D org/tensorflow/native/linux-x86_64/libtensorflow_framework.so.1 | grep -i mkl | wc -l
1
 $ nm -D org/tensorflow/native/linux-x86_64/libtensorflow_jni.so | grep -i mkl | wc -l
9
 $ nm -D org/tensorflow/native/linux-x86_64/libtensorflow_framework.so.1 | grep -i mkl
0000000000e127b0 T _ZN10tensorflow12IsMklEnabledEv

For reference, the 1.14 build had far more:

 $ nm -D org/tensorflow/native/linux-x86_64/libtensorflow_framework.so.1 | grep -i mkl | wc -l
11388
 $ nm -D org/tensorflow/native/linux-x86_64/libtensorflow_jni.so | grep -i mkl | wc -l
8
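To repeat this comparison across several builds, the `nm -D ... | grep -i mkl | wc -l` pipeline can be sketched in Python. The function names are hypothetical, and `dynamic_symbols` assumes `nm` from binutils is on the PATH.

```python
import subprocess


def count_mkl_symbols(nm_output: str) -> int:
    """Count lines mentioning 'mkl' in any case, like `grep -i mkl | wc -l`."""
    return sum(1 for line in nm_output.splitlines() if "mkl" in line.lower())


def dynamic_symbols(library_path: str) -> str:
    """Return the text printed by `nm -D` for a shared library."""
    result = subprocess.run(
        ["nm", "-D", library_path],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


# Sample taken from the 1.15 output in this report, plus one unrelated line.
sample = (
    "0000000000e127b0 T _ZN10tensorflow12IsMklEnabledEv\n"
    "                 U malloc\n"
)
print(count_mkl_symbols(sample))  # 1
```

For a real library, `count_mkl_symbols(dynamic_symbols(path))` gives the same number as the shell pipeline.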

MKL is also not available in the wheel built by the command mentioned above.

>>> import tensorflow as tf
>>> tf.python.pywrap_tensorflow.IsMklEnabled()
False

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 37 (13 by maintainers)

Most upvoted comments

Thanks, the new Dockerfile works!

@ashahab thanks for trying!

@ashahba Thank you very much!

Hi @pavanky and @preethivenkatesh, I have put together a Dockerfile and some scripts to help you build TensorFlow with MKL on CentOS 7: https://github.com/ashahba/centos7-tf

I tried:

docker build --build-arg TF_BRANCH=v1.15.2 --build-arg PY_VER=3.6 --build-arg CONFIG_VER=v2 -f Dockerfile . -t centos-tf-3.6-v2

and the bazel options were reported as follows:

Writing build flags: build --cxxopt=-D_GLIBCXX_USE_CXX11_ABI=0 --copt=-O3 --copt=-Wformat --copt=-Wformat-security --copt=-fstack-protector --copt=-fPIC --copt=-fpic --linkopt=-znoexecstack --linkopt=-zrelro --linkopt=-znow --linkopt=-fstack-protector --copt=-mmmx --copt=-msse --copt=-msse2 --copt=-msse3 --copt=-mssse3 --copt=-msse4.1 --copt=-msse4.2 --copt=-mpopcnt --copt=-mavx --copt=-maes --copt=-mpclmul --config=mkl --config=v2

and of course the wheels correctly report MKL support:

python3
Python 3.6.8 (default, Aug  7 2019, 17:28:10) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-39)] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> import tensorflow_core as tf_core
>>> tf_core.python.pywrap_tensorflow.IsMklEnabled()
True

and also:

nm -D  /tensorflow_src/bazel-out/k8-opt/bin/tensorflow/libtensorflow_framework.so.1 | grep -i mkl | wc -l
15718

I tried this for both Python 2.7 and Python 3.6 and on both v1.15.0 and v1.15.2 as described in the repo and they all seem to work fine.

Also if you have new bazel flags or targets that need to be built, you can just modify the file build_tf_whl.sh around this line: https://github.com/ashahba/centos7-tf/blob/master/build_tf_whl.sh#L46

Good luck and please let me know if that solves the issue you are seeing.

Thanks.

@ashahba I also thought this issue might be caused by Python 2.7, but I was able to reproduce it on py36 when I tried.

@ashahba Python 2 support has been dropped internally, so if you can reproduce this with Python 3, that is fine as well. You can run the Python 3 build using make IMAGE_TYPE=cpu USE_MKL=1 PY_VER=36