tensorflow: Could not initialize a memory descriptor when using softmax layer

I have both the CPU and GPU versions installed via Miniconda, each in its own environment. The GPU version works fine, but the CPU version throws an error when I add a softmax layer after a convolution layer.

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Manjaro 4.14.74
  • TensorFlow installed from (source or binary): binary from Miniconda
  • TensorFlow version (use command below): 1.11.0
  • Python version: Python 3.6.6 :: Anaconda, Inc.
  • CUDA/cuDNN version: CPU version, no CUDA/cuDNN
  • Bazel version: N/A
  • GPU model and memory: N/A
  • Mobile device: N/A
  • Exact command to reproduce: python code.py

Describe the current behavior

When I run the test code, the program throws an AbortedError with the following message:

AbortedError (see above for traceback): Operation received an exception:Status: 3, message: could not initialize a memory descriptor, in file tensorflow/core/kernels/mkl_softmax_op.cc:163
	 [[{{node Softmax}} = _MklSoftmax[T=DT_FLOAT, _kernel="MklOp", _device="/job:localhost/replica:0/task:0/device:CPU:0"](conv2d/BiasAdd, conv2d/BiasAdd:2)]]

Describe the expected behavior

The program should finish with no error.

Code to reproduce the issue

import tensorflow as tf
import numpy as np

sess = tf.Session()
inputs = tf.placeholder(dtype=tf.float32, shape=(1, 300, 300, 3))
net = tf.layers.Conv2D(filters=2, kernel_size=3)(inputs)
net = tf.nn.softmax(net, axis=-1)
sess.run(tf.global_variables_initializer())
sess.run(net, feed_dict={inputs: np.zeros(shape=(1, 300, 300, 3), dtype=np.float32)})
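
For reference, the value the script above should return can be worked out without TensorFlow. The sketch below is plain NumPy and assumes the documented defaults of tf.layers.Conv2D (padding='valid', zero-initialised bias): with an all-zero input, every convolution output is 0, so the softmax over the two channels should be 0.5 everywhere. The softmax helper is written here for illustration and is not TensorFlow's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along `axis` (illustrative helper)."""
    shifted = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=axis, keepdims=True)

# A 3x3 'valid' convolution shrinks each spatial dimension from 300 to 298.
# With an all-zero input and a zero bias, every conv output value is 0.
conv_out = np.zeros((1, 300 - 3 + 1, 300 - 3 + 1, 2), dtype=np.float32)

expected = softmax(conv_out, axis=-1)
print(expected.shape)       # (1, 298, 298, 2)
print(np.unique(expected))  # [0.5]
```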

Other info / logs

  • I set up the environment with: conda create -n xxx pip python=3 tensorflow

  • Traceback is:

Traceback (most recent call last):
  File "/home/kwy/.conda/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1292, in _do_call
    return fn(*args)
  File "/home/kwy/.conda/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1277, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/kwy/.conda/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1367, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.AbortedError: Operation received an exception:Status: 3, message: could not initialize a memory descriptor, in file tensorflow/core/kernels/mkl_softmax_op.cc:163
         [[{{node Softmax}} = _MklSoftmax[T=DT_FLOAT, _kernel="MklOp", _device="/job:localhost/replica:0/task:0/device:CPU:0"](conv2d/BiasAdd, conv2d/BiasAdd:2)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test.py", line 10, in <module>
    sess.run(net, feed_dict={inputs: np.zeros(shape=(1, 300, 300, 3), dtype=np.float32)})
  File "/home/kwy/.conda/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 887, in run
    run_metadata_ptr)
  File "/home/kwy/.conda/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1110, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/kwy/.conda/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1286, in _do_run
    run_metadata)
  File "/home/kwy/.conda/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1308, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.AbortedError: Operation received an exception:Status: 3, message: could not initialize a memory descriptor, in file tensorflow/core/kernels/mkl_softmax_op.cc:163
         [[{{node Softmax}} = _MklSoftmax[T=DT_FLOAT, _kernel="MklOp", _device="/job:localhost/replica:0/task:0/device:CPU:0"](conv2d/BiasAdd, conv2d/BiasAdd:2)]]

Caused by op 'Softmax', defined at:
  File "test.py", line 7, in <module>
    net = tf.nn.softmax(net, axis=-1)
  File "/home/kwy/.conda/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/home/kwy/.conda/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 1746, in softmax
    return _softmax(logits, gen_nn_ops.softmax, axis, name)
  File "/home/kwy/.conda/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 1685, in _softmax
    return compute_op(logits, name=name)
  File "/home/kwy/.conda/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 7138, in softmax
    "Softmax", logits=logits, name=name)
  File "/home/kwy/.conda/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/kwy/.conda/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/home/kwy/.conda/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3272, in create_op
    op_def=op_def)
  File "/home/kwy/.conda/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1768, in __init__
    self._traceback = tf_stack.extract_stack()

AbortedError (see above for traceback): Operation received an exception:Status: 3, message: could not initialize a memory descriptor, in file tensorflow/core/kernels/mkl_softmax_op.cc:163
         [[{{node Softmax}} = _MklSoftmax[T=DT_FLOAT, _kernel="MklOp", _device="/job:localhost/replica:0/task:0/device:CPU:0"](conv2d/BiasAdd, conv2d/BiasAdd:2)]]

  • The GPU version works fine.

  • If I set axis to 0, 1 or 2, the program finishes with no error, but with axis set to -1 or 3 (both refer to the last axis of this rank-4 tensor), the error occurs.

  • If the softmax layer is added after a dense layer, it also works fine.

  • I’ve also tested on another server running CentOS 7 with a Quadro P2000, and the problem still occurs there (the GPU version works fine, the CPU version does not).

  • This variant also fails:

net = tf.layers.Conv2D(filters=2, kernel_size=3, activation=tf.nn.softmax)(inputs)
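
The observations above (softmax works on the rank-2 output of a dense layer but fails on the last axis of a rank-4 convolution output) suggest one possible workaround: flatten to rank 2, apply softmax, and reshape back. The NumPy sketch below only demonstrates that this is mathematically equivalent to a last-axis softmax. The function names are hypothetical, and whether the same reshape trick (tf.reshape around tf.nn.softmax) actually avoids the MKL kernel failure is an untested assumption, not something confirmed in this thread.

```python
import numpy as np

def softmax_2d(logits):
    """Softmax over axis 1 of a rank-2 array (illustrative helper)."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

def softmax_last_axis_via_2d(x):
    """Softmax over the last axis of an N-D array by flattening to rank 2."""
    channels = x.shape[-1]
    flat = x.reshape(-1, channels)             # (N*H*W, C)
    return softmax_2d(flat).reshape(x.shape)   # back to (N, H, W, C)

rng = np.random.RandomState(0)
x = rng.randn(1, 4, 4, 2).astype(np.float32)
y = softmax_last_axis_via_2d(x)

# Direct last-axis softmax, for comparison with the reshaped version.
e = np.exp(x - x.max(axis=-1, keepdims=True))
direct = e / e.sum(axis=-1, keepdims=True)
print(np.allclose(y, direct, atol=1e-6))
```

The flattening is safe because the channel axis is already the innermost one, so the reshape does not reorder any values.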

About this issue

  • Original URL
  • State: closed
  • Created 6 years ago
  • Comments: 19 (7 by maintainers)

Most upvoted comments

@eLvErDe, sorry for the issues. Indeed, the issue is still present in 1.12. The good news is that I have tested a newer commit, 07d5d08 (master branch, tagged v1.13.0-rc0, v1.13.0-rc1 and v1.13.0-rc2), and the issue is gone there. Can you please try one last time with this commit id with MKL?

Below is what I got:

2019-02-25 00:20:09.309341: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX512F
2019-02-25 00:20:09.335924: I tensorflow/core/common_runtime/process_util.cc:71] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
WARNING:tensorflow:From TF_Public_07d5d08579bbbff910653a59163b4f8f180d16ac/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.

That is, no errors were thrown.

Sorry for my late reply. I’ve tried the latest version and the problem is fixed. Thanks.
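
Based on the comments above (1.11 and 1.12 reported as affected, the commit tagged v1.13.0-rc0 reported as fine), a script could guard against the affected releases with a simple version check. This is a minimal sketch: the helper names are hypothetical, and treating 1.13 as the first fixed release line is an assumption drawn from this thread rather than from release notes. It is also only relevant to MKL-enabled CPU builds.

```python
import re

def version_tuple(version):
    """Return (major, minor) from a version string such as '1.13.0-rc2'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    return int(match.group(1)), int(match.group(2))

# Assumption from this thread: 1.13 is the first release line with the fix.
FIRST_FIXED = (1, 13)

def likely_has_softmax_fix(version):
    """True if `version` is at or after the assumed first fixed release."""
    return version_tuple(version) >= FIRST_FIXED

for v in ("1.11.0", "1.12.0", "1.13.0-rc0"):
    print(v, likely_has_softmax_fix(v))
```

In a real script the string passed in would be tf.__version__.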
