tensorflow: Max pooling causes an error on an empty batch
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): yes
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): 3.10.0-693.2.2.el7.x86_64
- TensorFlow installed from (source or binary): binary
- TensorFlow version (use command below): 1.8.0
- Python version: Python 2.7.14 :: Anaconda
- Bazel version (if compiling from source): None
- GCC/Compiler version (if compiling from source): None
- CUDA/cuDNN version: cuda==9.0, cudnn==7.0.4
- GPU model and memory: None
- Exact command to reproduce: See below
Describe the problem
When batch_size is 0, the max pooling operation appears to produce an unhandled cudaError_t status. This can cause subsequent operations to fail with odd error messages, which makes it extremely difficult to debug.
(This corner case bothers us: we first extract bounding boxes and then run ordinary convolution operations on the areas they specify. The error occurs when no bounding boxes are detected, so batch_size becomes 0. However, the Python exception is thrown seemingly at random, at a following operation or in a later session run.)
import tensorflow as tf
import numpy as np
x = tf.placeholder(dtype=tf.float32, shape=[None, 4, 4, 1])
pool_op = tf.nn.pool(x, pooling_type="MAX", window_shape=[2, 2], strides=[1, 1], padding="SAME")
y = tf.placeholder(dtype=tf.float32, shape=[None])
other_op = tf.where(tf.equal(y, 1.0))
normal_data = np.zeros([1, 4, 4, 1], dtype="float32")
empty_data = np.zeros([0, 4, 4, 1], dtype="float32")
# cudaError is thread-local; limit the thread pool size to make this easy to reproduce
config = tf.ConfigProto()
config.inter_op_parallelism_threads = 1
with tf.Session(config=config) as sess:
    # running other_op succeeds
    print sess.run(other_op, {y: [1.0, 2.0, 3.0, 4.0]})  # [[0]]
    # running pooling succeeds on both normal and empty data
    print sess.run(pool_op, {x: normal_data}).shape  # (1, 4, 4, 1)
    print sess.run(pool_op, {x: empty_data}).shape  # (0, 4, 4, 1)
    # running other_op now fails
    print sess.run(other_op, {y: [1.0, 2.0, 3.0, 4.0]})  # err
The above code reports the following error: tensorflow.python.framework.errors_impl.InternalError: WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true / nonzero indices. temp_storage_bytes: 1, status: invalid configuration argument
“invalid configuration argument” seems to be the message returned by cudaGetErrorString, which indicates a failed kernel launch due to a zero or too-large number of blocks or threads.
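A rough sketch of how an empty input can produce exactly this failure mode (this illustrates a common CUDA launch-sizing pattern, not TensorFlow's actual kernel code): sizing the grid as ceil(n / threads_per_block) collapses to 0 blocks when n == 0, and CUDA rejects a launch with 0 blocks.

```python
# Illustrative sketch only (not TensorFlow source): a common way to size
# a CUDA launch is ceil(n / threads_per_block), which becomes 0 blocks
# when the input has zero elements -- an invalid launch configuration.
def grid_size(n, threads_per_block=256):
    return (n + threads_per_block - 1) // threads_per_block

print(grid_size(1024))  # 4 blocks: a valid launch
print(grid_size(0))     # 0 blocks: rejected with "invalid configuration argument"
```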
Source code / logs

About this issue
- Original URL
- State: closed
- Created 6 years ago
- Comments: 16 (14 by maintainers)
Commits related to this issue
- Support empty inputs in some maxpool kernels. (#21338) — committed to ppwwyyxx/tensorflow by ppwwyyxx 6 years ago
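The committed fix adds empty-input handling to the maxpool kernels. As a hedged illustration of the idea (a NumPy model of a 2x2/stride-1/SAME max pool, not the actual C++ GPU kernel), the guard amounts to returning an empty but correctly shaped output before any kernel launch would happen:

```python
import numpy as np

def max_pool_2x2_same(x):
    """Illustrative model of the empty-input guard.

    x: (N, H, W, C); 2x2 max pool, stride 1, SAME padding."""
    n, h, w, c = x.shape
    if n == 0:
        # Empty batch: the output shape is still well defined, so return
        # an empty array instead of launching any work over 0 elements.
        return np.zeros((0, h, w, c), dtype=x.dtype)
    # SAME padding for a 2x2 window with stride 1 pads one row/column on
    # the bottom/right; pad with -inf so the window max ignores it.
    padded = np.pad(x, ((0, 0), (0, 1), (0, 1), (0, 0)),
                    constant_values=-np.inf)
    return np.maximum.reduce([
        padded[:, dy:dy + h, dx:dx + w, :]
        for dy in (0, 1) for dx in (0, 1)
    ])

print(max_pool_2x2_same(np.zeros((1, 4, 4, 1), "float32")).shape)  # (1, 4, 4, 1)
print(max_pool_2x2_same(np.zeros((0, 4, 4, 1), "float32")).shape)  # (0, 4, 4, 1)
```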
@angersson No, I’m not. I can fix the pooling ops, but I expect the TF team to find out why the tests did not detect such a failure.
Adding an if-else somewhere in the op to check for empty inputs can solve this issue. But before that, I think it’s worth fixing the unit test framework first:
There is actually a test that should’ve triggered this error: https://github.com/tensorflow/tensorflow/blob/1a13c4f2a0b4491ae3003ff0a400d5d8cb521c4a/tensorflow/python/kernel_tests/pooling_ops_test.py#L570-L578. Maybe the unit tests should check the CUDA error state after each session run.
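To illustrate why a test can pass while still leaving the device in a bad state, here is a toy Python model of the sticky, thread-local error behavior described above (an analogy only, not the real cudart API): the failed launch records an error that nothing raises at the call site, and only the next call on the same thread that queries the state observes it, so a test that never checks the error after running the op sees nothing wrong.

```python
import threading

# Toy model of thread-local, deferred launch-error state. A zero-block
# launch "succeeds" silently; the error lingers until the next call on
# the same thread queries it -- mirroring how the WhereOp, not the
# pooling op, is what finally reports the failure above.
_tls = threading.local()

def toy_launch(n_blocks):
    pending = getattr(_tls, "error", None)
    _tls.error = None
    if pending is not None:
        raise RuntimeError(pending)      # stale error from an earlier launch
    if n_blocks == 0:
        _tls.error = "invalid configuration argument"  # recorded, not raised

toy_launch(0)        # the empty-batch pooling launch: no exception here
try:
    toy_launch(4)    # the unrelated next launch reports the stale error
except RuntimeError as e:
    print(e)         # invalid configuration argument
```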