tensorflow: tf.gather produces zeros for invalid indices on GPU
Environment info
Operating System: Ubuntu 16.04 LTS (64 bit)
$ dpkg -l | grep cuda | grep ^ii
ii libcuda1-361 361.42-0ubuntu2 amd64 NVIDIA CUDA runtime library
ii libcudart7.5:amd64 7.5.18-0ubuntu1 amd64 NVIDIA CUDA Runtime Library
ii libcudnn5 5.0.5-1+cuda7.5 amd64 cuDNN runtime libraries
ii libcudnn5-dev 5.0.5-1+cuda7.5 amd64 cuDNN development libraries and headers
ii libcudnn5-doc 5.0.5-1+cuda7.5 amd64 cuDNN documents and samples
ii nvidia-cuda-dev 7.5.18-0ubuntu1 amd64 NVIDIA CUDA development files
ii nvidia-cuda-doc 7.5.18-0ubuntu1 all NVIDIA CUDA and OpenCL documentation
ii nvidia-cuda-gdb 7.5.18-0ubuntu1 amd64 NVIDIA CUDA Debugger (GDB)
ii nvidia-cuda-toolkit 7.5.18-0ubuntu1 amd64 NVIDIA CUDA development toolkit
$ find /usr/lib -name libcud\*
/usr/lib/i386-linux-gnu/libcuda.so.1
/usr/lib/i386-linux-gnu/libcuda.so.361.42
/usr/lib/i386-linux-gnu/libcuda.so
/usr/lib/x86_64-linux-gnu/libcudnn_static.a
/usr/lib/x86_64-linux-gnu/libcudnn_static_v5.a
/usr/lib/x86_64-linux-gnu/libcuda.so.1
/usr/lib/x86_64-linux-gnu/libcudart.so.7.5.18
/usr/lib/x86_64-linux-gnu/libcudnn.so
/usr/lib/x86_64-linux-gnu/libcuda.so.361.42
/usr/lib/x86_64-linux-gnu/libcudnn.so.5.0.5
/usr/lib/x86_64-linux-gnu/libcudart.so
/usr/lib/x86_64-linux-gnu/libcudart.so.7.5
/usr/lib/x86_64-linux-gnu/libcudadevrt.a
/usr/lib/x86_64-linux-gnu/libcudnn.so.5
/usr/lib/x86_64-linux-gnu/stubs/libcuda.so
/usr/lib/x86_64-linux-gnu/libcuda.so
/usr/lib/x86_64-linux-gnu/libcudart_static.a
$ python -c "import tensorflow; print(tensorflow.__version__)"
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
0.10.0rc0
Steps to reproduce
In [23]: x = tf.constant([1.1,2.2,3.3])
In [24]: a = tf.constant(123,dtype=tf.int32)
In [25]: tf.gather(x,a,validate_indices=True).eval()
Gather_8: /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/simple_placer.cc:818] Gather_8: /job:localhost/replica:0/task:0/gpu:0
Const_8: /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/simple_placer.cc:818] Const_8: /job:localhost/replica:0/task:0/gpu:0
Const_7: /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/simple_placer.cc:818] Const_7: /job:localhost/replica:0/task:0/gpu:0
Out[25]: 0.0
In [26]:
About this issue
- State: closed
- Created 8 years ago
- Comments: 35 (28 by maintainers)
My idea was that there are two ops. The gather op is changed to be forgiving, i.e. on both GPU and CPU it returns zero whenever an index is out of bounds. A separate op (say “check_indices”) with only a CPU implementation determines whether the gather is “valid”, i.e. has no out-of-bounds lookups, in the same way that the current CPU implementation does, iiuc.
For some applications, forgiving gather is exactly what you want. It is simple to turn this into something which uses a user-specified default value instead of 0, using something like the code snippet above. Afaict, forgiving gather is currently quite hard to effect in tensorflow.
tf.gather can be implemented in terms of forgiving gather and check_indices. This preserves the current CPU behavior, and extends that to the GPU case, fixing this bug.
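The proposal above can be modelled in a few lines of plain Python. This is only a sketch of the intended semantics on lists, not TensorFlow code: the names `forgiving_gather`, `check_indices` and `strict_gather` are hypothetical, the `default` parameter is an assumption standing in for the user-specified default value mentioned above, and treating negative indices as out of bounds is also an assumption.

```python
def forgiving_gather(params, indices, default=0.0):
    """Return params[i] for each i, or `default` when i is out of bounds.

    Assumption: negative indices count as out of bounds.
    """
    n = len(params)
    return [params[i] if 0 <= i < n else default for i in indices]


def check_indices(params, indices):
    """Return True only if every index lies in [0, len(params))."""
    n = len(params)
    return all(0 <= i < n for i in indices)


def strict_gather(params, indices):
    """Hypothetical composition: validate first, then gather."""
    if not check_indices(params, indices):
        raise IndexError("gather index out of bounds")
    return forgiving_gather(params, indices)


x = [1.1, 2.2, 3.3]
print(forgiving_gather(x, [0, 123]))        # out-of-bounds index yields the default
print(forgiving_gather(x, [0, 123], -1.0))  # user-specified default instead of 0
print(check_indices(x, [0, 123]))           # reports whether the gather was valid
```

In this model `strict_gather` plays the role of today's CPU behavior (fail on a bad index), while `forgiving_gather` alone matches what the GPU kernel currently does.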
Cool, thanks for the pointer @josh11b. I am hoping to submit a pull request for the doc fix in a day or so, looking to be consistent with docs elsewhere regarding GPU vs CPU differences/limitations.
I’d love to see the GPU do the same thing as the CPU code, but as I said, the flag is unnecessary since it does not do anything on either GPU or CPU (and this behavior is undocumented as well).
If raising an exception from the GPU is too hard to do right now, how about just putting a NaN here instead of T(0)? https://github.com/tensorflow/tensorflow/blob/c5f94b10bbb30e525fa3ca313e7ccb173040c90a/tensorflow/core/kernels/gather_op_gpu.cu.cc#L41
Silent and undetectable failure is dangerous…
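To make the NaN argument concrete, here is a small pure-Python sketch. It is not the GPU kernel; `gather_with_sentinel` is a hypothetical helper, and the idea only applies to floating-point data. A zero fill is indistinguishable from legitimate data, whereas a NaN fill propagates through later arithmetic and can be tested for.

```python
import math


def gather_with_sentinel(params, indices, sentinel):
    """Hypothetical helper: out-of-bounds lookups return `sentinel`."""
    n = len(params)
    return [params[i] if 0 <= i < n else sentinel for i in indices]


x = [1.1, 0.0, 3.3]   # note that 0.0 is also a legitimate value here
indices = [1, 123]    # second index is out of bounds

zero_filled = gather_with_sentinel(x, indices, 0.0)
nan_filled = gather_with_sentinel(x, indices, float("nan"))

# With a zero fill, the bad lookup looks exactly like the valid x[1].
print(zero_filled[0] == zero_filled[1])

# With a NaN fill, the failure survives downstream reductions.
print(math.isnan(sum(nan_filled)))
```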