tensorflow: TF 1.14.0 training crashes with unimplemented Conv2D errors (works fine in TF 1.13.2)

Environment

Ubuntu 16.04:
Docker based tensorflow/tensorflow:1.14.0-gpu
tensor2tensor==1.14.0 (pip installed in container)
Python 2.7
CUDA/cuDNN version: 10/7 (defaults from docker image)
GPUs (tested on many from 1080 to RTX Titan)

Issue

A change in TensorFlow has broken tensor2tensor librispeech training.

Running librispeech training crashes with Unimplemented Conv2D errors.

  (0) Unimplemented:  The Conv2D op currently only supports the NHWC tensor format on the CPU. The op was given the format: NCHW
         [[{{node Conv2D}}]]
  (1) Unimplemented:  The Conv2D op currently only supports the NHWC tensor format on the CPU. The op was given the format: NCHW
         [[{{node Conv2D}}]]
         [[Shape_3/_8]]
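
For readers unfamiliar with the two format names in the error: NHWC orders a tensor as batch, height, width, channels, while NCHW orders it as batch, channels, height, width. The pure-Python sketch below only illustrates that reordering on nested lists. The helper name nhwc_to_nchw and the sample values are made up for illustration and are not TensorFlow code.

```python
def nhwc_to_nchw(batch):
    """Reorder a nested-list tensor from [N][H][W][C] to [N][C][H][W]."""
    reordered = []
    for image in batch:
        height = len(image)
        width = len(image[0])
        channels = len(image[0][0])
        # Build one H x W plane per channel.
        planes = [
            [[image[h][w][c] for w in range(width)] for h in range(height)]
            for c in range(channels)
        ]
        reordered.append(planes)
    return reordered


# Illustrative data: 1 image, 2 rows, 2 columns, 3 channels per pixel.
nhwc = [[[[1, 2, 3], [4, 5, 6]],
         [[7, 8, 9], [10, 11, 12]]]]
nchw = nhwc_to_nchw(nhwc)
print(nchw[0][0])  # the first channel as a 2 x 2 plane
```

The values are identical in both layouts; only the axis order differs, which is why a kernel written for one layout cannot simply be handed data in the other.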

Expected behavior

This works fine in earlier versions of TensorFlow (e.g. 1.13.2).

Code to reproduce the issue

Via Nvidia Docker Hub, run tensorflow/tensorflow:1.14.0-gpu, then in the container:

  pip install tensorflow-hub && pip install tensor2tensor
  apt-get update && apt-get install sox
  t2t-trainer --problem=librispeech_clean_small --model=transformer --output_dir=/models/JUNK --data_dir=/data/ --save_checkpoints_secs=1800 --schedule=train --hparams_set=transformer_librispeech

(Note: sox and --generate are only needed once, to prep the dataset.)

Other info / logs

Related to closed issue #32017.

About this issue

State: closed
Created 5 years ago
Reactions: 6
Comments: 16 (6 by maintainers)

Most upvoted comments

Still fails in TF1.15-rc3 when no explicit device placement is given.

mschonwe on Oct 9, 2019

I found an issue that seems related: tensorflow/tensorflow/issues/26411. I changed add_delta_deltas to hard-code placement of the conv2d op on the CPU:

  with tf.device('/cpu:0'):
    filterbanks = tf.nn.conv2d(
      filterbanks, delta_filter_stack, [1, 1, 1, 1], "SAME", data_format="NHWC",
      name=name)

With this change, training runs without error. It seems to me that placing the conv2d op on the GPU would be preferable, were it not for this issue.
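
To try the same workaround outside tensor2tensor, a minimal self-contained sketch might look like the following. It assumes a TensorFlow build that provides tf.compat.v1.Session. The helper name conv2d_on_cpu, the ramp input, and the 3-tap difference kernel are hypothetical stand-ins for illustration, not the actual filters used in add_delta_deltas.

```python
import numpy as np
import tensorflow as tf


def conv2d_on_cpu(features, kernel):
    """Run one NHWC conv2d with explicit CPU placement; return a NumPy array.

    Hypothetical helper for illustration; it is not part of tensor2tensor.
    """
    graph = tf.Graph()
    with graph.as_default():
        x = tf.constant(features, dtype=tf.float32)
        k = tf.constant(kernel, dtype=tf.float32)
        # Explicit placement: ask TensorFlow to run this op on the CPU.
        with tf.device("/cpu:0"):
            y = tf.nn.conv2d(
                x, k, strides=[1, 1, 1, 1], padding="SAME", data_format="NHWC")
    with tf.compat.v1.Session(graph=graph) as sess:
        return sess.run(y)


# Illustrative input: batch=1, 4 time steps, 1 frequency bin, 1 channel.
features = np.arange(4, dtype=np.float32).reshape(1, 4, 1, 1)
# Illustrative kernel, shaped [height, width, in_channels, out_channels].
kernel = np.array([-1.0, 0.0, 1.0], dtype=np.float32).reshape(3, 1, 1, 1)

deltas = conv2d_on_cpu(features, kernel)
print(deltas.shape)
```

The trade-off is the one noted above: the convolution no longer runs on the GPU, which is acceptable for a small preprocessing filter but would be costly for a large model layer.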

mschonwe on Sep 21, 2019

@Leslie-Fang the device placement should be putting these ops on the GPU (afaik). The issue only crops up in newer versions of TF; in older versions the GPU utilization is appropriately high.

mschonwe on Sep 20, 2019

I faced the same issue. The program gets stuck when using the GPU but works on the CPU. There is no error message for me; it just hangs at the tf.nn.conv2d call.

Explicit device placement by @mschonwe helped.

/issues/26411 describes a different problem, but it seems related.

This seems to be it: for me, tf.nn.conv2d or tf.nn.conv1d inside a tf.data Dataset map function doesn’t work with the GPU enabled.

rusiaaman on Jan 3, 2020

This is based on the tensorflow/tensor2tensor project; the initial post describes how to reproduce it. The function that causes the trouble is the conv2d call in add_delta_deltas(), in tensor2tensor/layers/common_audio.py.

The (likely) issue is an optimization pass that rewrites a conv2d op, which should be placed on the CPU, to use a version of tf.nn.conv2d() that is only available on the GPU (the NCHW format named in the error message).
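
One way to probe the optimization-pass hypothesis above is to switch off graph layout rewriting and see whether the error goes away. The sketch below assumes the pass in question is Grappler's layout optimizer, which this thread does not confirm, and it has not been verified against this issue. The helper name is made up for illustration, and how to hand such a config to t2t-trainer is not covered here.

```python
import tensorflow as tf
from tensorflow.core.protobuf import rewriter_config_pb2


def session_config_without_layout_optimizer():
    """Build a TF1-style session config with the layout optimizer disabled.

    Hypothetical helper for illustration only.
    """
    config = tf.compat.v1.ConfigProto()
    rewrite_options = config.graph_options.rewrite_options
    # Assumption: this is the pass that rewrites NHWC conv2d ops to NCHW.
    rewrite_options.layout_optimizer = rewriter_config_pb2.RewriterConfig.OFF
    return config


config = session_config_without_layout_optimizer()
# The config can then be passed as tf.compat.v1.Session(config=config).
print(config.graph_options.rewrite_options.layout_optimizer)
```

If the crash persists with the optimizer off, the rewrite is probably happening elsewhere, and the explicit CPU placement remains the only workaround confirmed in this thread.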

Since I have a workaround (above) for the bug, I am OK with closing this issue.

mschonwe on Sep 24, 2019