tensorflow: tf.reduce_max returns wrong answers on large tensors e.g., (2048,2048,1024)


System information

Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 20.04
TensorFlow installed from (source or binary): docker run nvcr.io/nvidia/tensorflow:20.12-tf2-py3
TensorFlow version (use command below): 2.3.1
Python version: 3.8
Bazel version (if compiling from source):
GCC/Compiler version (if compiling from source):
CUDA/cuDNN version: 11
GPU model and memory: RTX 3090, 24GB

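Describe the current behavior

Computing the max of a large 3D tensor returns an incorrect answer: for shape (2048, 2048, 1024), tf.reduce_max returns -3.4028235e+38, whereas np.max returns 4294967300.0.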

Describe the expected behavior

tf.reduce_max should return the correct maximum, matching np.max.

Standalone code to reproduce the issue

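import numpy as np
import tensorflow as tf

# Compare the NumPy and TensorFlow maxima for tensors of shape (n, n, 1024).
for i in range(12):
    n = 2**i
    x = np.arange(n*n*1024, dtype=np.float32).reshape((n, n, 1024))
    print(x.shape, np.max(x), tf.reduce_max(x).numpy())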


(1, 1, 1024) 1023.0 1023.0
(2, 2, 1024) 4095.0 4095.0
(4, 4, 1024) 16383.0 16383.0
(8, 8, 1024) 65535.0 65535.0
(16, 16, 1024) 262143.0 262143.0
(32, 32, 1024) 1048575.0 1048575.0
(64, 64, 1024) 4194303.0 4194303.0
(128, 128, 1024) 16777215.0 16777215.0
(256, 256, 1024) 67108864.0 67108864.0
(512, 512, 1024) 268435460.0 268435460.0
(1024, 1024, 1024) 1073741800.0 1073741800.0

(2048, 2048, 1024) 4294967300.0 -3.4028235e+38
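
Two details of the output above can be checked with plain arithmetic. The failing shape is the first one whose element count (2^32) no longer fits in a signed 32-bit integer, and the value returned, -3.4028235e+38, is the most negative finite float32, which is what a max reduction would yield if it started from its identity value and never visited any element. Reading this as a 32-bit index overflow in the GPU reduction is an assumption; the report itself does not confirm the cause. The helper names below (element_count, float32_lowest) are hypothetical and exist only for this sketch.

```python
import struct

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit index can hold


def element_count(n, depth=1024):
    """Number of elements in a tensor of shape (n, n, depth)."""
    return n * n * depth


def float32_lowest():
    """Most negative finite float32, decoded from its bit pattern 0xFF7FFFFF."""
    return struct.unpack("<f", struct.pack("<I", 0xFF7FFFFF))[0]


if __name__ == "__main__":
    # Only the last three shapes from the reproduction are checked here.
    for i in range(9, 12):
        n = 2**i
        count = element_count(n)
        status = "fits in int32" if count <= INT32_MAX else "exceeds int32"
        print((n, n, 1024), count, status)

    # Compare with the value tf.reduce_max returned for the largest shape.
    print(float32_lowest())
```

The rows from (256, 256, 1024) onward are not errors in their own right: float32 has a 24-bit significand, so a maximum such as 2^26 - 1 rounds to 67108864.0 in both NumPy and TensorFlow. Only the last row disagrees.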


About this issue

State: closed
Created 3 years ago
Comments: 20 (6 by maintainers)

Most upvoted comments

PyTorch, by contrast, uses its own CUDA reduction kernel:

https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cuda/Reduce.cuh

WindQAQ on Jan 26, 2021