tensorflow: tf.gather produces zeros for invalid indices on GPU

Environment info

Operating System: Ubuntu 16.04 LTS (64 bit)

$ dpkg -l | grep cuda | grep ^ii
ii  libcuda1-361                                361.42-0ubuntu2                                             amd64        NVIDIA CUDA runtime library
ii  libcudart7.5:amd64                          7.5.18-0ubuntu1                                             amd64        NVIDIA CUDA Runtime Library
ii  libcudnn5                                   5.0.5-1+cuda7.5                                             amd64        cuDNN runtime libraries
ii  libcudnn5-dev                               5.0.5-1+cuda7.5                                             amd64        cuDNN development libraries and headers
ii  libcudnn5-doc                               5.0.5-1+cuda7.5                                             amd64        cuDNN documents and samples
ii  nvidia-cuda-dev                             7.5.18-0ubuntu1                                             amd64        NVIDIA CUDA development files
ii  nvidia-cuda-doc                             7.5.18-0ubuntu1                                             all          NVIDIA CUDA and OpenCL documentation
ii  nvidia-cuda-gdb                             7.5.18-0ubuntu1                                             amd64        NVIDIA CUDA Debugger (GDB)
ii  nvidia-cuda-toolkit                         7.5.18-0ubuntu1                                             amd64        NVIDIA CUDA development toolkit

$ find /usr/lib -name libcud\*
/usr/lib/i386-linux-gnu/libcuda.so.1
/usr/lib/i386-linux-gnu/libcuda.so.361.42
/usr/lib/i386-linux-gnu/libcuda.so
/usr/lib/x86_64-linux-gnu/libcudnn_static.a
/usr/lib/x86_64-linux-gnu/libcudnn_static_v5.a
/usr/lib/x86_64-linux-gnu/libcuda.so.1
/usr/lib/x86_64-linux-gnu/libcudart.so.7.5.18
/usr/lib/x86_64-linux-gnu/libcudnn.so
/usr/lib/x86_64-linux-gnu/libcuda.so.361.42
/usr/lib/x86_64-linux-gnu/libcudnn.so.5.0.5
/usr/lib/x86_64-linux-gnu/libcudart.so
/usr/lib/x86_64-linux-gnu/libcudart.so.7.5
/usr/lib/x86_64-linux-gnu/libcudadevrt.a
/usr/lib/x86_64-linux-gnu/libcudnn.so.5
/usr/lib/x86_64-linux-gnu/stubs/libcuda.so
/usr/lib/x86_64-linux-gnu/libcuda.so
/usr/lib/x86_64-linux-gnu/libcudart_static.a

$ python -c "import tensorflow; print(tensorflow.__version__)"
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
0.10.0rc0

Steps to reproduce

In [23]: x = tf.constant([1.1,2.2,3.3])

In [24]: a = tf.constant(123,dtype=tf.int32)

In [25]: tf.gather(x,a,validate_indices=True).eval()
Gather_8: /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/simple_placer.cc:818] Gather_8: /job:localhost/replica:0/task:0/gpu:0
Const_8: /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/simple_placer.cc:818] Const_8: /job:localhost/replica:0/task:0/gpu:0
Const_7: /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/simple_placer.cc:818] Const_7: /job:localhost/replica:0/task:0/gpu:0
Out[25]: 0.0


About this issue

State: closed
Created 8 years ago
Comments: 35 (28 by maintainers)

Most upvoted comments

My idea was that there are two ops. The gather op is changed to be forgiving, i.e. on both GPU and CPU it returns zero whenever an index is out of bounds. A separate op (say “check_indices”), with only a CPU implementation, determines whether the gather is “valid”, i.e. has no out-of-bounds lookups, in the same way that the current CPU implementation does (iiuc).

For some applications, forgiving gather is exactly what you want. It is simple to turn this into something which uses a user-specified default value instead of 0, using something like the code snippet above. Afaict, forgiving gather is currently quite hard to effect in tensorflow.

tf.gather can be implemented in terms of forgiving gather and check_indices. This preserves the current CPU behavior, and extends that to the GPU case, fixing this bug.
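
A minimal pure-Python sketch of how the two pieces described above could compose. It models the semantics on plain lists and is not TensorFlow code; the names `forgiving_gather`, `check_indices`, `strict_gather`, and the `default` parameter are hypothetical and exist only for illustration. Negative indices are assumed to count as out of bounds.

```python
def forgiving_gather(params, indices, default=0.0):
    """Return params[i] for each i in indices, or `default` when i is out of bounds."""
    n = len(params)
    return [params[i] if 0 <= i < n else default for i in indices]


def check_indices(params, indices):
    """Return True only if every index lies in [0, len(params))."""
    n = len(params)
    return all(0 <= i < n for i in indices)


def strict_gather(params, indices):
    """Gather that raises on any out-of-bounds index, built from the two ops above."""
    if not check_indices(params, indices):
        raise IndexError("indices out of range [0, %d)" % len(params))
    return forgiving_gather(params, indices)


x = [1.1, 2.2, 3.3]
print(forgiving_gather(x, [0, 123]))                # out-of-bounds slot becomes 0.0
print(forgiving_gather(x, [0, 123], default=-1.0))  # user-specified default instead of 0
print(check_indices(x, [0, 123]))                   # False: index 123 is invalid
```

Under this split, the forgiving op needs no error path, while the validity check stays a separate step that can fail loudly.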

MattShannon on Feb 5, 2018

Cool, thanks for the pointer @josh11b . I am hoping to submit a pull request for the doc fix in a day or so, looking to be consistent with docs elsewhere Re: GPU vs CPU differences/limitations.

Fenugreek on Mar 22, 2017

I’d love to see the GPU doing the same thing as the CPU code, but as I said, the flag is unnecessary since it does nothing on either GPU or CPU (and this behavior is undocumented as well).

If raising an exception from the GPU is too hard to do right now, why not just put a NaN here instead of T(0)? https://github.com/tensorflow/tensorflow/blob/c5f94b10bbb30e525fa3ca313e7ccb173040c90a/tensorflow/core/kernels/gather_op_gpu.cu.cc#L41

Silent and undetectable failure is dangerous…
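
To illustrate why a NaN fill would be easier to detect than a zero fill, here is a rough pure-Python model of the idea. It is not the GPU kernel; `gather_nan_fill` is a hypothetical name, and the approach assumes a floating-point element type and inputs that contain no NaN of their own.

```python
import math


def gather_nan_fill(params, indices):
    """Gather from a list, writing NaN wherever an index is out of bounds."""
    n = len(params)
    return [params[i] if 0 <= i < n else float("nan") for i in indices]


out = gather_nan_fill([1.1, 2.2, 3.3], [2, 123])

# A zero fill is indistinguishable from a legitimate 0.0 in the data,
# whereas NaN can be found after the fact.
bad = [k for k, v in enumerate(out) if math.isnan(v)]
print(bad)  # positions in `indices` that were out of bounds
```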

hholst80 on Aug 8, 2016