tensorflow: GPU placement of tf.nn.conv2d during tf.data.Dataset.map call causes UnimplementedError (NHWC)


System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux 4.20.13-arch1-1-ARCH #1 SMP PREEMPT Wed Feb 27 19:10:28 UTC 2019 x86_64 GNU/Linux
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: None
  • TensorFlow installed from (source or binary): community python-tensorflow-opt-cuda
  • TensorFlow version (use command below): 1.13.1
  • Python version: 3.7.2
  • Bazel version (if compiling from source): None
  • GCC/Compiler version (if compiling from source): None
  • CUDA/cuDNN version: 10.1 / 7.5
  • GPU model and memory: Geforce GTX 1080 Ti 11GB

You can collect some of this information using our environment capture script. You can also obtain the TensorFlow version with python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"

Describe the current behavior Evaluation of tf.nn.conv2d in a tf.data.Dataset.map call fails during the Session.run() call of tf.data.Iterator.get_next() with the error:

tensorflow.python.framework.errors_impl.UnimplementedError: Generic conv implementation only supports NHWC tensor format for now.

See the full error log and the attached code below for the full example.

It seems that the graph is created successfully, but evaluation of the tf.nn.conv2d call (which is implicitly placed on the GPU) fails. This is somehow related to the fact that the tf.nn.conv2d call is wrapped in a tf.data.Dataset.map call. Setting data_format="NHWC" or use_cudnn_on_gpu=False produces the same error.

Describe the expected behavior tf.nn.conv2d should be evaluated successfully. Currently, evaluation only succeeds if the convolution is explicitly placed on the CPU (e.g. with tf.device('/cpu:0'): or CUDA_VISIBLE_DEVICES="").

Code to reproduce the issue

import tensorflow as tf


class IteratorInitializerHook(tf.train.SessionRunHook):
    """Hook to initialise data iterator after Session is created."""

    def __init__(self, func=None):
        super(IteratorInitializerHook, self).__init__()
        self.iterator_initializer_func = func

    def after_create_session(self, session, coord):
        """Initialise the iterator after the session has been created."""
        self.iterator_initializer_func(session)


if __name__ == '__main__':

    def apply_kernel(tensor, kernel=tf.random_normal([3, 3])):
        t = tf.expand_dims(tensor, 0)
        t = tf.expand_dims(t, -1)
        k = tf.expand_dims(kernel, -1)
        k = tf.expand_dims(k, -1)

        # TODO the following line fails during Session.run
        #  call of tf.data.Iterator.get_next() (last line in this file)
        tf_conv = tf.nn.conv2d(t, k, [1, 1, 1, 1], "SAME")
        return tf.squeeze(tf_conv)

    def do_some_things(x, y):
        x = apply_kernel(x)
        return x, y

    n, image_shape = 100, [256, 256]

    ds = tf.data.Dataset.from_tensor_slices((
        tf.random_uniform([n] + image_shape), tf.random_uniform([n])
    ))
    ds = ds.map(do_some_things)
    iterator = tf.data.Iterator.from_structure(ds.output_types, ds.output_shapes)
    data = iterator.get_next()
    ds_init_op = iterator.make_initializer(ds)

    with tf.train.SingularMonitoredSession(
            hooks=[IteratorInitializerHook(lambda s: s.run(ds_init_op))],
            config=tf.ConfigProto(log_device_placement=True)
    ) as sess:
        _ = sess.run(data)

Other info / logs Here is the full error log, including device placements. tf-conv-NHWC-issue.log

Here is a zipped version of the python example tf-conv-NHWC-issue.py.zip

Not sure if this is important, but none of the TensorFlow ops created inside the function passed to tf.data.Dataset.map show up in the device placement logs.

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Comments: 18 (14 by maintainers)

Most upvoted comments

Can you try to explicitly set device to CPU inside function body. I think that might work as well.

That's quite a precise solution, and it fixed my problem right away. I had used tf.device("/cpu:0") around the outer portions of the pipeline, but the layout optimization error was still raised. When placed inside the function body itself, the problem went away.

Thanks…

Yes, when the device is explicitly set to tf.device('/cpu:0') everything works as expected.
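The workaround described above can be sketched as a modified apply_kernel, with the device scope placed inside the function body that tf.data.Dataset.map traces (a sketch based on the reporter's snippet; the shapes in the comments assume a single-channel 2-D image, as in the original example):

```python
import tensorflow as tf

# Workaround from the comments: pin the ops to the CPU *inside* the mapped
# function body, rather than around the Dataset construction, so the conv
# is handled by the CPU kernel (which supports the NHWC layout).

def apply_kernel(tensor, kernel):
    with tf.device('/cpu:0'):  # placement must be inside the mapped function
        t = tf.expand_dims(tensor, 0)    # [H, W]      -> [1, H, W]
        t = tf.expand_dims(t, -1)        # [1, H, W]   -> [1, H, W, 1] (NHWC)
        k = tf.expand_dims(kernel, -1)   # [kH, kW]    -> [kH, kW, 1]
        k = tf.expand_dims(k, -1)        # [kH, kW, 1] -> [kH, kW, 1, 1]
        conv = tf.nn.conv2d(t, k, [1, 1, 1, 1], "SAME")
        return tf.squeeze(conv)          # back to [H, W]
```

With this version of apply_kernel, ds.map(do_some_things) in the repro script runs without the NHWC error.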