tensorflow: "Unimplemented: Deterministic GPU implementation of unsorted segment reduction op not available" with AUC metric and TF_DETERMINISTIC_OPS

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): OpenSUSE LEAP 15.2
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): v2.6.0-rc2-32-g919f693420e 2.6.0
  • Python version: Python 3.9.6
  • CUDA/cuDNN version: 11.2 and 8.1.1, I believe
  • GPU model and memory: Quadro RTX 6000

Reproduces on Colab with GPU.

Describe the current behavior

Traceback (most recent call last):
[...]
  File "/home/bers/proj/bug.py", line 12, in <module>
    model.fit(x=data, y=data)
  File "/data2/bers/opt/pyenv/versions/3.9.6/lib/python3.9/site-packages/keras/engine/training.py", line 1184, in fit
    tmp_logs = self.train_function(iterator)
  File "/data2/bers/opt/pyenv/versions/3.9.6/lib/python3.9/site-packages/tensorflow/python/eager/def_function.py", line 885, in __call__
    result = self._call(*args, **kwds)
  File "/data2/bers/opt/pyenv/versions/3.9.6/lib/python3.9/site-packages/tensorflow/python/eager/def_function.py", line 950, in _call
    return self._stateless_fn(*args, **kwds)
  File "/data2/bers/opt/pyenv/versions/3.9.6/lib/python3.9/site-packages/tensorflow/python/eager/function.py", line 3039, in __call__
    return graph_function._call_flat(
  File "/data2/bers/opt/pyenv/versions/3.9.6/lib/python3.9/site-packages/tensorflow/python/eager/function.py", line 1963, in _call_flat
    return self._build_call_outputs(self._inference_function.call(
  File "/data2/bers/opt/pyenv/versions/3.9.6/lib/python3.9/site-packages/tensorflow/python/eager/function.py", line 591, in call
    outputs = execute.execute(
  File "/data2/bers/opt/pyenv/versions/3.9.6/lib/python3.9/site-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.UnimplementedError: 2 root error(s) found.
  (0) Unimplemented:  Deterministic GPU implementation of unsorted segment reduction op not available.
	 [[node UnsortedSegmentSum (defined at home/bers/proj/bug.py:12) ]]
	 [[assert_less_equal/Assert/AssertGuard/pivot_f/_13/_39]]
  (1) Unimplemented:  Deterministic GPU implementation of unsorted segment reduction op not available.
	 [[node UnsortedSegmentSum (defined at home/bers/proj/bug.py:12) ]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_513]

Function call stack:
train_function -> train_function

Describe the expected behavior

No error (works in TF 2.5.0)

Standalone code to reproduce the issue

import os

os.environ["TF_DETERMINISTIC_OPS"] = "True"

import tensorflow as tf

data = tf.ones((1, 1))
layer = tf.keras.layers.Input(shape=[1])
model = tf.keras.models.Model(inputs=layer, outputs=layer)
model.compile(loss="categorical_crossentropy", metrics="AUC")
model.fit(x=data, y=data)

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 17 (9 by maintainers)

Most upvoted comments

This is fixed with https://github.com/tensorflow/tensorflow/pull/51861, and the fix will be in TF 2.7.

I’m unsure whether the AUC metric was nondeterministic in TF 2.5. It used tf.math.unsorted_segment_sum, which was nondeterministic in certain cases, but it’s possible AUC did not use it in a nondeterministic way. The exception for unsorted_segment_sum was added in TF 2.6, although the op itself was already nondeterministic in certain cases before that. In any case, this is now fixed, so it’s not worth looking into.
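
For readers wondering how a sum can be nondeterministic at all, here is a minimal pure-Python sketch. It assumes (this is not stated in the thread) that the nondeterminism comes from the order in which floating-point values are accumulated into a segment, which is not fixed when the additions run in parallel. The function below is a hypothetical stand-in written for illustration, not TensorFlow's implementation; the `order` parameter simulates the accumulation order.

```python
def unsorted_segment_sum(values, segment_ids, num_segments, order):
    """Toy segment sum that accumulates values in the given index order.

    Hypothetical illustration only; `order` stands in for the arbitrary
    order in which parallel additions might land on a GPU.
    """
    out = [0.0] * num_segments
    for i in order:
        out[segment_ids[i]] += values[i]
    return out


values = [0.1, 0.2, 0.3]
segment_ids = [0, 0, 0]  # all three values fall into segment 0

forward = unsorted_segment_sum(values, segment_ids, 1, order=[0, 1, 2])
backward = unsorted_segment_sum(values, segment_ids, 1, order=[2, 1, 0])

# Floating-point addition is not associative, so the two results differ
# in the last bit even though the inputs are identical.
print(forward, backward, forward == backward)
```

Whether the AUC metric ever hit such a case in TF 2.5 is exactly what the comment above leaves open.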

OK, @bersbersbers, could you look at the comments on issue1 and issue2 and try again after editing the code as shown below?

import os
os.environ["TF_DETERMINISTIC_OPS"] = "True"

import tensorflow as tf
seed = 1441  # any random number
tf.random.set_seed(seed)
data = tf.ones((1, 1))
layer = tf.keras.layers.Input(shape=[1])
model = tf.keras.models.Model(inputs=layer, outputs=layer)
model.compile(loss="categorical_crossentropy", metrics="AUC")
model.fit(x=data, y=data)

OK @bersbersbers, is the issue still reproducing? Feel free to close this issue if this helped.

My issue is solved with TF_DISABLE_SEGMENT_REDUCTION_OP_DETERMINISM_EXCEPTIONS = 1

This works for me, too:

import os

os.environ["TF_DETERMINISTIC_OPS"] = "True"
os.environ["TF_DISABLE_SEGMENT_REDUCTION_OP_DETERMINISM_EXCEPTIONS"] = "True"

import tensorflow as tf

data = tf.ones((1, 1))
layer = tf.keras.layers.Input(shape=[1])
model = tf.keras.models.Model(inputs=layer, outputs=layer)
model.compile(loss="categorical_crossentropy", metrics="AUC")
model.fit(x=data, y=data)

However, I wonder: setting TF_DISABLE_SEGMENT_REDUCTION_OP_DETERMINISM_EXCEPTIONS was not necessary in tensorflow==2.5.1. So what has changed?

  • Was the AUC metric non-deterministic in tensorflow==2.5.1, and a missing exception was added in tensorflow==2.6.0 to make users aware of that fact?
  • Was the AUC metric deterministic in tensorflow==2.5.1, and this is a regression in tensorflow==2.6.0?

Please try again with Python 3.8 or Python 3.7. Hey @sanatmpa1, could you please look at this issue?
