tensorflow: "Unimplemented: Deterministic GPU implementation of unsorted segment reduction op not available" with AUC metric and TF_DETERMINISTIC_OPS
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): OpenSUSE LEAP 15.2
- TensorFlow installed from (source or binary): binary
- TensorFlow version (use command below): v2.6.0-rc2-32-g919f693420e 2.6.0
- Python version: Python 3.9.6
- CUDA/cuDNN version: 11.2 and 8.1.1, I believe
- GPU model and memory: Quadro RTX 6000
Reproduces on Colab with GPU.
Describe the current behavior
Traceback (most recent call last):
[...]
File "/home/bers/proj/bug.py", line 12, in <module>
model.fit(x=data, y=data)
File "/data2/bers/opt/pyenv/versions/3.9.6/lib/python3.9/site-packages/keras/engine/training.py", line 1184, in fit
tmp_logs = self.train_function(iterator)
File "/data2/bers/opt/pyenv/versions/3.9.6/lib/python3.9/site-packages/tensorflow/python/eager/def_function.py", line 885, in __call__
result = self._call(*args, **kwds)
File "/data2/bers/opt/pyenv/versions/3.9.6/lib/python3.9/site-packages/tensorflow/python/eager/def_function.py", line 950, in _call
return self._stateless_fn(*args, **kwds)
File "/data2/bers/opt/pyenv/versions/3.9.6/lib/python3.9/site-packages/tensorflow/python/eager/function.py", line 3039, in __call__
return graph_function._call_flat(
File "/data2/bers/opt/pyenv/versions/3.9.6/lib/python3.9/site-packages/tensorflow/python/eager/function.py", line 1963, in _call_flat
return self._build_call_outputs(self._inference_function.call(
File "/data2/bers/opt/pyenv/versions/3.9.6/lib/python3.9/site-packages/tensorflow/python/eager/function.py", line 591, in call
outputs = execute.execute(
File "/data2/bers/opt/pyenv/versions/3.9.6/lib/python3.9/site-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.UnimplementedError: 2 root error(s) found.
(0) Unimplemented: Deterministic GPU implementation of unsorted segment reduction op not available.
[[node UnsortedSegmentSum (defined at home/bers/proj/bug.py:12) ]]
[[assert_less_equal/Assert/AssertGuard/pivot_f/_13/_39]]
(1) Unimplemented: Deterministic GPU implementation of unsorted segment reduction op not available.
[[node UnsortedSegmentSum (defined at home/bers/proj/bug.py:12) ]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_513]
Function call stack:
train_function -> train_function
Describe the expected behavior
No error (works in TF 2.5.0)
Standalone code to reproduce the issue
import os
os.environ["TF_DETERMINISTIC_OPS"] = "True"
import tensorflow as tf
data = tf.ones((1, 1))
layer = tf.keras.layers.Input(shape=[1])
model = tf.keras.models.Model(inputs=layer, outputs=layer)
model.compile(loss="categorical_crossentropy", metrics="AUC")
model.fit(x=data, y=data)
About this issue
- State: closed
- Created 3 years ago
- Comments: 17 (9 by maintainers)
This is fixed with https://github.com/tensorflow/tensorflow/pull/51861, and the fix will be in TF 2.7.
I’m unsure if the AUC metric was nondeterministic in TF 2.5. It used tf.math.unsorted_segment_sum, which was nondeterministic in certain cases, but it’s possible AUC did not use it in a nondeterministic way. The exception for unsorted_segment_sum was added in TF 2.6, but unsorted_segment_sum was nondeterministic before that in certain cases. In any case, this is now fixed, so it’s not worth looking into.
Ok! @bersbersbers, could you see the comments at issue1 and issue2 and try again after editing the code like below.
OK @bersbersbers, is the issue still reproducing? Feel free to close this issue if it helped.
This works for me, too:
However, I wonder: setting TF_DISABLE_SEGMENT_REDUCTION_OP_DETERMINISM_EXCEPTIONS was not necessary in tensorflow==2.5.1. So what has changed?
- Was this op already nondeterministic in tensorflow==2.5.1, and a missing exception was added in tensorflow==2.6.0 to make users aware of that fact?
- Or was it deterministic in tensorflow==2.5.1, and this is a regression in tensorflow==2.6.0?
Please try again with Python 3.8 or Python 3.7.
Hey @sanatmpa1, could you please look at this issue?
My issue is solved with TF_DISABLE_SEGMENT_REDUCTION_OP_DETERMINISM_EXCEPTIONS = 1
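For anyone landing here, the workaround from the last comment can be sketched roughly as follows. This is a minimal sketch, not an official recipe: the two environment variable names come from this thread, while the helper names `needs_workaround`, `configure_env`, and `run_repro` are made up for illustration. The version check assumes, based only on the comments above, that the exception is raised in the 2.6 series and that the fix lands in TF 2.7.

```python
import os

# Both variable names appear in this thread; the helper names are hypothetical.
DETERMINISTIC_OPS = "TF_DETERMINISTIC_OPS"
DISABLE_EXCEPTIONS = "TF_DISABLE_SEGMENT_REDUCTION_OP_DETERMINISM_EXCEPTIONS"


def needs_workaround(tf_version):
    """Return True for 2.6.x, the series this thread reports as raising."""
    major, minor = (int(part) for part in tf_version.split(".")[:2])
    return (major, minor) == (2, 6)


def configure_env(tf_version):
    """Set the flags; call this before `import tensorflow`, as the repro does."""
    os.environ[DETERMINISTIC_OPS] = "1"
    if needs_workaround(tf_version):
        # Assumption: this only suppresses the exception. Per the comments
        # above, unsorted_segment_sum may still be nondeterministic.
        os.environ[DISABLE_EXCEPTIONS] = "1"
    return {name: os.environ.get(name)
            for name in (DETERMINISTIC_OPS, DISABLE_EXCEPTIONS)}


def run_repro():
    """The reproduction script from this issue, wrapped in a function."""
    import tensorflow as tf  # imported late so the flags are already set

    data = tf.ones((1, 1))
    layer = tf.keras.layers.Input(shape=[1])
    model = tf.keras.models.Model(inputs=layer, outputs=layer)
    model.compile(loss="categorical_crossentropy", metrics="AUC")
    model.fit(x=data, y=data)


env = configure_env("2.6.0")
print(env)
```

Calling `run_repro()` afterwards runs the original script with both flags set. Since the flag appears only to silence the exception, it is probably not a substitute for upgrading to a release that contains the fix if reproducible results matter.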