tensorflow: MKL no longer works with tensorflow 1.15
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): NO
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Centos 7
- Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: N/A
- TensorFlow installed from (source or binary): source
- TensorFlow version (use command below): 1.15.2
- Python version: N/A
- Bazel version (if compiling from source): 0.24.1
- GCC/Compiler version (if compiling from source): gcc-6 (devtoolset-6 on centos 7)
- CUDA/cuDNN version: N/A
- GPU model and memory: N/A
- Exact command to reproduce:
bazel build -c opt --copt=-msse4.2 --copt=-mavx \
--copt=-O3 --config=mkl --linkopt -ldl \
--copt=-march=x86-64 \
//tensorflow/tools/pip_package:build_pip_package \
//tensorflow/tools/lib_package:libtensorflow_jni \
//tensorflow/tools/lib_package:libtensorflow \
//tensorflow/tools/lib_package:libtensorflow_proto
Describe the problem
libtensorflow_framework.so built this way does not have any symbols from MKL. When trying to import tensorflow from Java/Scala, it fails with "symbol not found" for tensorflow::DisableMKL().
The number of MKL symbols found in libtensorflow_framework.so for 1.15 is also significantly lower than the number found in 1.14.
Source code / logs
Code used to import tensorflow in scala
import org.tensorflow.TensorFlow
Note: We have to ensure libiomp5.so and libmklml_intel.so are available on library load path.
The simplest solution we found was to load the libraries manually in order. The code snippet can be seen here: https://gist.github.com/pavanky/ea6e71e3e7e52c013db844b715723be0
Error
libtensorflow_jni.so: undefined symbol: _ZN10tensorflow10DisableMKLEv
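For readers unfamiliar with mangled names, the symbol in the error is the Itanium C++ ABI encoding of tensorflow::DisableMKL(). The sketch below decodes only the simple case seen in this issue (a nested name made of length-prefixed components, taking no arguments); the function name demangle_simple is hypothetical, and a real tool such as c++filt should be used for anything more complex.

```python
def demangle_simple(mangled: str) -> str:
    """Decode a nested, argument-free Itanium name such as _ZN3foo3barEv.

    Illustration only: templates, arguments, and substitutions are not handled.
    """
    prefix = "_ZN"
    if not mangled.startswith(prefix):
        raise ValueError("only _ZN... nested names are handled")

    pos = len(prefix)
    parts = []
    while pos < len(mangled) and mangled[pos] != "E":
        # Each component is a decimal length followed by that many characters.
        start = pos
        while pos < len(mangled) and mangled[pos].isdigit():
            pos += 1
        if pos == start:
            raise ValueError("expected a length prefix at offset %d" % start)
        length = int(mangled[start:pos])
        parts.append(mangled[pos:pos + length])
        pos += length

    # "E" closes the nested name; a lone "v" encodes an empty parameter list.
    if mangled[pos:] != "Ev":
        raise ValueError("only functions taking no arguments are handled")
    return "::".join(parts) + "()"


print(demangle_simple("_ZN10tensorflow10DisableMKLEv"))
```

Applied to the symbol from the error message, this yields the tensorflow::DisableMKL() name mentioned in the problem description.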
Looking at the number of symbols related to MKL:
$ nm -D org/tensorflow/native/linux-x86_64/libtensorflow_framework.so.1 | grep -i mkl | wc -l
1
$ nm -D org/tensorflow/native/linux-x86_64/libtensorflow_jni.so | grep -i mkl | wc -l
9
$ nm -D org/tensorflow/native/linux-x86_64/libtensorflow_framework.so.1 | grep -i mkl
0000000000e127b0 T _ZN10tensorflow12IsMklEnabledEv
For reference, 1.14 had a lot more
$ nm -D org/tensorflow/native/linux-x86_64/libtensorflow_framework.so.1 | grep -i mkl | wc -l
11388
$ nm -D org/tensorflow/native/linux-x86_64/libtensorflow_jni.so | grep -i mkl | wc -l
8
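The same comparison can be scripted, which helps when checking several builds. This is a minimal sketch, assuming nm from binutils is on the PATH and prints the usual "address type name" columns; the helper names (dynamic_symbols, mkl_symbols, is_defined) are made up for illustration and are not part of TensorFlow.

```python
import subprocess


def dynamic_symbols(library_path: str) -> str:
    """Return the raw output of `nm -D` for a shared library."""
    result = subprocess.run(
        ["nm", "-D", library_path],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def mkl_symbols(nm_output: str) -> list:
    """Mimic `grep -i mkl`: names from every line that mentions mkl."""
    names = []
    for line in nm_output.splitlines():
        if "mkl" in line.lower():
            # The symbol name is the last whitespace-separated field.
            names.append(line.split()[-1])
    return names


def is_defined(nm_output: str, symbol: str) -> bool:
    """True if the library defines `symbol` rather than only referencing it.

    Assumption: defined symbols print three fields (address, type, name),
    while undefined ones print only two (type "U" or "w", then the name).
    """
    for line in nm_output.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[2] == symbol:
            return True
    return False
```

For example, is_defined(dynamic_symbols("libtensorflow_framework.so.1"), "_ZN10tensorflow10DisableMKLEv") would be expected to return False for the 1.15 build described here, since nm lists only IsMklEnabled in that library.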
MKL is also not available in the wheel built by the command mentioned above.
>>> import tensorflow as tf
>>> tf.python.pywrap_tensorflow.IsMklEnabled()
False
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 37 (13 by maintainers)
Thanks, the new Dockerfile works!
@ashahba thanks for trying!
@ashahba Thank you very much!
Hi @pavanky and @preethivenkatesh, I have put some Docker files and scripts together to help you with building TensorFlow with MKL on CentOS 7 here: https://github.com/ashahba/centos7-tf
I tried:
and the bazel options were reported as follows:
and of course the wheels correctly report MKL support:
and also:
I tried this for both Python 2.7 and Python 3.6, and on both v1.15.0 and v1.15.2, as described in the repo, and they all seem to work fine.
Also, if you have new bazel flags or targets that need to be built, you can just modify the file build_tf_whl.sh around this line: https://github.com/ashahba/centos7-tf/blob/master/build_tf_whl.sh#L46
Good luck, and please let me know if that solves the issue you are seeing.
Thanks.
@ashahba I also thought this issue could be because of 2.7, but I was able to reproduce it on py36 when I tried.
@ashahba Python 2 support has been dropped internally; if you can reproduce with Python 3, that is OK as well. You can run the Python 3 build using
make IMAGE_TYPE=cpu USE_MKL=1 PY_VER=36