tensorflow: TF1.6 MKL bug in concatenation op
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes.
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux CentOS 7
- TensorFlow installed from (source or binary): Binary (Intel MKL wheel) https://software.intel.com/en-us/articles/intel-optimized-tensorflow-wheel-now-available
- TensorFlow version (use command below): b’unknown’ 1.6.0
- Python version: 3.6
- Bazel version (if compiling from source): N/A
- GCC/Compiler version (if compiling from source): N/A
- CUDA/cuDNN version: N/A
- GPU model and memory: N/A
- Exact command to reproduce: U-Net model graph compiles but error during training from concatenation op. The code is here: https://github.com/mas-dse-greina/unet/blob/master/train.py
You can collect some of this information using our environment capture script: == cat /etc/issue =============================================== Linux lancelot24 3.10.0-693.11.6.el7.x86_64 #1 SMP Thu Jan 4 01:06:37 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux VERSION=“7 (Core)” VERSION_ID=“7” CENTOS_MANTISBT_PROJECT_VERSION=“7” REDHAT_SUPPORT_PRODUCT_VERSION=“7”
== are we in docker ============================================= No
== compiler ===================================================== c++ (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16) Copyright © 2015 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
== uname -a ===================================================== Linux lancelot24 3.10.0-693.11.6.el7.x86_64 #1 SMP Thu Jan 4 01:06:37 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
== check pips =================================================== numpy (1.14.1) protobuf (3.5.1) tensorflow (1.6.0)
== check for virtualenv ========================================= False
== tensorflow import ============================================
tf.VERSION = 1.6.0
tf.GIT_VERSION = v1.6.0-0-gd2e24b6039
tf.COMPILER_VERSION = v1.6.0-0-gd2e24b6039
Sanity check: array([1], dtype=int32)
/home/bduser/anaconda3/envs/tensorflow/lib/python3.6/site-packages/h5py/init.py:36: FutureWarning: Conversion of the second argument of issubdtype from float
to np.floating
is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type
.
from ._conv import register_converters as _register_converters
== env ========================================================== LD_LIBRARY_PATH is unset DYLD_LIBRARY_PATH is unset
== nvidia-smi =================================================== ./p.sh: line 105: nvidia-smi: command not found
== cuda libs ===================================================
https://github.com/tensorflow/tensorflow/tree/master/tools/tf_env_collect.sh
You can obtain the TensorFlow version with
python -c “import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)”
Describe the problem
Describe the problem clearly here. Be sure to convey here why it’s a bug in TensorFlow or a feature request.
This is a bug in the TensorFlow 1.6. I am using the Intel-supplied pip wheel for MKL optimization. I ran the same code with the TF1.4 MKL wheel and my code worked just fine. The error messages indicate that there is a problem with the concatenation operation. I am not seeing this error with the generic TensorFlow 1.6 (i.e. pip install tensorflow). I believe it is limited to the MKL op for concatenation.
Source code / logs
Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached. Try to provide a reproducible test case that is the bare minimum necessary to generate the problem.
WARNING:tensorflow:Variable *= will be deprecated. Use variable.assign_mul if you want assignment to the variable value or ‘x = x * y’ if you want a new python Tensor object. Train on 24800 samples, validate on 6200 samples Epoch 1/10 Traceback (most recent call last): File “/home/bduser/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/client/session.py”, line 1361, in _do_call return fn(*args) File “/home/bduser/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/client/session.py”, line 1340, in _run_fn target_list, status, run_metadata) File “/home/bduser/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py”, line 516, in exit c_api.TF_GetCode(self.status.status)) tensorflow.python.framework.errors_impl.AbortedError: Operation received an exception:Status: 3, message: could not create a concat primitive descriptor, in file tensorflow/core/kernels/mkl_concat_op.cc:777 [[Node: concatenate/concat = _MklConcatV2[N=2, T=DT_FLOAT, Tidx=DT_INT32, _kernel=“MklOp”, _device=“/job:localhost/replica:0/task:0/device:CPU:0”](transConv6/BiasAdd, conv4b/Relu, concatenate/concat/axis, DMT/_29, conv4b/Relu:1, DMT/_30)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last): File “train.py”, line 571, in <module> settings.OUT_CHANNEL_NO, settings.MODEL_FN, settings.MODE, args) File “train.py”, line 510, in train_and_predict callbacks=[model_checkpoint, tensorboard_checkpoint]) File “/home/bduser/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/engine/training.py”, line 1793, in fit validation_steps=validation_steps) File “/home/bduser/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/engine/training.py”, line 1283, in _fit_loop outs = f(ins_batch) File “/home/bduser/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/backend.py”, line 2663, in call fetches=fetches, feed_dict=feed_dict, **self.session_kwargs) File “/home/bduser/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/client/session.py”, line 905, in run run_metadata_ptr) File “/home/bduser/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/client/session.py”, line 1137, in _run feed_dict_tensor, options, run_metadata) File “/home/bduser/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/client/session.py”, line 1355, in _do_run options, run_metadata) File “/home/bduser/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/client/session.py”, line 1374, in _do_call raise type(e)(node_def, op, message) tensorflow.python.framework.errors_impl.AbortedError: Operation received an exception:Status: 3, message: could not create a concat primitive descriptor, in file tensorflow/core/kernels/mkl_concat_op.cc:777 [[Node: concatenate/concat = _MklConcatV2[N=2, T=DT_FLOAT, Tidx=DT_INT32, _kernel=“MklOp”, _device=“/job:localhost/replica:0/task:0/device:CPU:0”](transConv6/BiasAdd, conv4b/Relu, concatenate/concat/axis, DMT/_29, conv4b/Relu:1, DMT/_30)]]
Caused by op ‘concatenate/concat’, defined at: File “train.py”, line 571, in <module> settings.OUT_CHANNEL_NO, settings.MODEL_FN, settings.MODE, args) File “train.py”, line 443, in train_and_predict model = model5_MultiLayer(args, False, False, img_rows, img_cols, input_no, output_no, print_summary=True) File “train.py”, line 183, in model5_MultiLayer kernel_size=(3, 3), strides=(2, 2), padding=‘same’)(conv5), conv4], axis=concat_axis) File “/home/bduser/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/layers/merge.py”, line 665, in concatenate return Concatenate(axis=axis, **kwargs)(inputs) File “/home/bduser/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/engine/topology.py”, line 258, in call output = super(Layer, self).call(inputs, **kwargs) File “/home/bduser/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/layers/base.py”, line 696, in call outputs = self.call(inputs, *args, **kwargs) File “/home/bduser/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/layers/merge.py”, line 174, in call return self._merge_function(inputs) File “/home/bduser/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/layers/merge.py”, line 380, in _merge_function return K.concatenate(inputs, axis=self.axis) File “/home/bduser/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/backend.py”, line 2083, in concatenate return array_ops.concat([to_dense(x) for x in tensors], axis) File “/home/bduser/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py”, line 1175, in concat return gen_array_ops._concat_v2(values=values, axis=axis, name=name) File “/home/bduser/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py”, line 625, in _concat_v2 “ConcatV2”, values=values, axis=axis, name=name) File “/home/bduser/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py”, line 787, in _apply_op_helper op_def=op_def) File “/home/bduser/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/ops.py”, line 3271, in create_op op_def=op_def) File “/home/bduser/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/ops.py”, line 1650, in init self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
AbortedError (see above for traceback): Operation received an exception:Status: 3, message: could not create a concat primitive descriptor, in file tensorflow/core/kernels/mkl_concat_op.cc:777 [[Node: concatenate/concat = _MklConcatV2[N=2, T=DT_FLOAT, Tidx=DT_INT32, _kernel=“MklOp”, _device=“/job:localhost/replica:0/task:0/device:CPU:0”](transConv6/BiasAdd, conv4b/Relu, concatenate/concat/axis, DMT/_29, conv4b/Relu:1, DMT/_30)]]
About this issue
- Original URL
- State: closed
- Created 6 years ago
- Reactions: 8
- Comments: 61 (19 by maintainers)
@eLvErDe I did “conda update conda” and “conda update mkl”. But I’m still getting the error. I’m using python 3.6, tensorflow 1.9 Please help
This bug, or at least a very related bug, is still open in Tensorflow 1.14. The concatenate operation leads to the error “operation received an exception:Status: 5, message: could not create a view primitive descriptor, in file tensorflow/core/kernels/mkl_slice_op.cc:433” for the mkl package (tensorflow=1.14.0=mkl_py37h45c423b_0 from anaconda). After installing the eigen package from anaconda (tensorflow=1.14.0=eigen_py37h195cb1b_0), the error does not appear. I have attached an example program where the error appears and the printed error message bug_concatenate_mkl.zip .
Hey everyone, I’m facing the same issue in Tensorflow 1.14 with Python 3.6.9. I tried
conda update -n base -c defaults conda
andconda update mkl
, as well asconda update eigen
. I took your suggestion @ae137 and installed the following eigen packages: win-64/tensorflow-1.14.0-eigen_py37hcf3f253_0.tar.bz2 and win-64/tensorflow-1.14.0-eigen_py36hf4fd08c_0.tar.bz2. I placed in them in my Anaconda3/pkgs folder and restarted Anaconda but I’m still facing the same error.AbortedError: Operation received an exception:Status: 5, message: could not create a view primitive descriptor, in file tensorflow/core/kernels/mkl_slice_op.cc:433 [[{{node Adam/gradients_1/concatenate_2/concat_grad/Slice_2}}]] [Op:__inference_keras_scratch_graph_9124]
Did I mess up/miss a step? Any help is appreciated, thanks.
All, I am still facing the same issue when I use the wheel from TF source code compile. I did ‘git checkout r1.8’ before the compile. Please help me. Thank you.