tensorflow: Max pooling causes an error on an empty batch
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): yes
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): 3.10.0-693.2.2.el7.x86_64
- TensorFlow installed from (source or binary): binary
- TensorFlow version (use command below): 1.8.0
- Python version: Python 2.7.14 :: Anaconda
- Bazel version (if compiling from source): None
- GCC/Compiler version (if compiling from source): None
- CUDA/cuDNN version: cuda==9.0, cudnn==7.0.4
- GPU model and memory: None
- Exact command to reproduce: See below
Describe the problem
When batch_size is 0, the max pooling operation appears to produce an unhandled cudaError_t status. This can cause subsequent operations to fail with odd error messages, which makes it extremely difficult to debug.
(This corner case bothers us: we first extract bounding boxes and then run ordinary convolution operations on the areas they specify. The error occurs when no bounding boxes are detected, so batch_size becomes 0. However, the Python exception is thrown seemingly at random, at a following operation or in a later session run.)
import tensorflow as tf
import numpy as np
x = tf.placeholder(dtype=tf.float32, shape=[None, 4, 4, 1])
pool_op = tf.nn.pool(x, pooling_type="MAX", window_shape=[2, 2], strides=[1, 1], padding="SAME")
y = tf.placeholder(dtype=tf.float32, shape=[None])
other_op = tf.where(tf.equal(y, 1.0))
normal_data = np.zeros([1, 4, 4, 1], dtype="float32")
empty_data = np.zeros([0, 4, 4, 1], dtype="float32")
# cudaError is thread-local; limit the thread pool size to make this easy to reproduce
config = tf.ConfigProto()
config.inter_op_parallelism_threads = 1
with tf.Session(config=config) as sess:
    # running other_op succeeds
    print sess.run(other_op, {y: [1.0, 2.0, 3.0, 4.0]})  # [[0]]
    # running pooling succeeds on both normal and empty data
    print sess.run(pool_op, {x: normal_data}).shape  # (1, 4, 4, 1)
    print sess.run(pool_op, {x: empty_data}).shape  # (0, 4, 4, 1)
    # running other_op now fails
    print sess.run(other_op, {y: [1.0, 2.0, 3.0, 4.0]})  # err
The above code reports the following error: tensorflow.python.framework.errors_impl.InternalError: WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true / nonzero indices. temp_storage_bytes: 1, status: invalid configuration argument
“invalid configuration argument” seems to be the message returned by cudaGetErrorString, which indicates a failed kernel launch due to a zero or too-large number of blocks or threads.
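A rough sketch of how an empty input can produce exactly this failure mode (this illustrates a common CUDA launch-sizing pattern, not TensorFlow's actual kernel code): sizing the grid as ceil(n / threads_per_block) collapses to 0 blocks when n == 0, and CUDA rejects a launch with 0 blocks.

```python
# Illustrative sketch only (not TensorFlow source): a common way to size
# a CUDA launch is ceil(n / threads_per_block), which becomes 0 blocks
# when the input has zero elements -- an invalid launch configuration.
def grid_size(n, threads_per_block=256):
    return (n + threads_per_block - 1) // threads_per_block

print(grid_size(1024))  # 4 blocks: a valid launch
print(grid_size(0))     # 0 blocks: rejected with "invalid configuration argument"
```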
Source code / logs

About this issue
- Original URL
- State: closed
- Created 6 years ago
- Comments: 16 (14 by maintainers)
Commits related to this issue
- Support empty inputs in some maxpool kernels. (#21338) — committed to ppwwyyxx/tensorflow by ppwwyyxx 6 years ago
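The committed fix adds empty-input handling to the maxpool kernels. As a hedged illustration of the idea (a NumPy model of a 2x2/stride-1/SAME max pool, not the actual C++ GPU kernel), the guard amounts to returning an empty but correctly shaped output before any kernel launch would happen:

```python
import numpy as np

def max_pool_2x2_same(x):
    """Illustrative model of the empty-input guard.

    x: (N, H, W, C); 2x2 max pool, stride 1, SAME padding."""
    n, h, w, c = x.shape
    if n == 0:
        # Empty batch: the output shape is still well defined, so return
        # an empty array instead of launching any work over 0 elements.
        return np.zeros((0, h, w, c), dtype=x.dtype)
    # SAME padding for a 2x2 window with stride 1 pads one row/column on
    # the bottom/right; pad with -inf so the window max ignores it.
    padded = np.pad(x, ((0, 0), (0, 1), (0, 1), (0, 0)),
                    constant_values=-np.inf)
    return np.maximum.reduce([
        padded[:, dy:dy + h, dx:dx + w, :]
        for dy in (0, 1) for dx in (0, 1)
    ])

print(max_pool_2x2_same(np.zeros((1, 4, 4, 1), "float32")).shape)  # (1, 4, 4, 1)
print(max_pool_2x2_same(np.zeros((0, 4, 4, 1), "float32")).shape)  # (0, 4, 4, 1)
```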
@angersson No, I’m not. I can fix the pooling ops, but I expect the TF team to find out why the tests did not detect such a failure.
Adding an if-else somewhere in the op to check for empty inputs can solve this issue. But before that, I think it’s worth fixing the unit test framework first:
There is actually a test that should’ve triggered this error: https://github.com/tensorflow/tensorflow/blob/1a13c4f2a0b4491ae3003ff0a400d5d8cb521c4a/tensorflow/python/kernel_tests/pooling_ops_test.py#L570-L578. Maybe the unit tests should check the CUDA error state after each session run.
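To illustrate why a test can pass while still leaving the device in a bad state, here is a toy Python model of the sticky, thread-local error behavior described above (an analogy only, not the real cudart API): the failed launch records an error that nothing raises at the call site, and only the next call on the same thread that queries the state observes it, so a test that never checks the error after running the op sees nothing wrong.

```python
import threading

# Toy model of thread-local, deferred launch-error state. A zero-block
# launch "succeeds" silently; the error lingers until the next call on
# the same thread queries it -- mirroring how the WhereOp, not the
# pooling op, is what finally reports the failure above.
_tls = threading.local()

def toy_launch(n_blocks):
    pending = getattr(_tls, "error", None)
    _tls.error = None
    if pending is not None:
        raise RuntimeError(pending)      # stale error from an earlier launch
    if n_blocks == 0:
        _tls.error = "invalid configuration argument"  # recorded, not raised

toy_launch(0)        # the empty-batch pooling launch: no exception here
try:
    toy_launch(4)    # the unrelated next launch reports the stale error
except RuntimeError as e:
    print(e)         # invalid configuration argument
```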