addons: MovingAverage does not work with MirroredStrategy

System information

  • OS Platform and Distribution: Ubuntu 18.04
  • TensorFlow version and how it was installed (source or binary): 2.0 from PyPi
  • TensorFlow-Addons version and how it was installed (source or binary): 0.5.2 from PyPi
  • Python version: 3.6

Describe the bug

If I compile a Keras model with a MovingAverage optimizer and a LearningRateScheduler, I get an error “Optimizer must have a “lr” attribute.” at tensorflow_core/python/keras/callbacks.py:1342. I can fix that by the following code:

@keras_utils.register_keras_custom_object
class LRMovingAverage(tfa.optimizers.MovingAverage):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    @property
    def lr(self):
        return self._optimizer.lr

However, my model is compiled under tf.distribute.MirroredStrategy().scope() and I crash in fit():

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2.py", line 681, in on_epoch
    yield epoch_logs
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2.py", line 324, in fit
    total_epochs=epochs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2.py", line 123, in run_one_epoch
    batch_outs = execution_function(iterator)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2_utils.py", line 86, in execution_function
    distributed_function(input_fn))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/def_function.py", line 457, in __call__
    result = self._call(*args, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/def_function.py", line 503, in _call
    self._initialize(args, kwds, add_initializers_to=initializer_map)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/def_function.py", line 408, in _initialize
    *args, **kwds))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 1848, in _get_concrete_function_internal_garbage_collected
    graph_function, _, _ = self._maybe_define_function(args, kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 2150, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 2041, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/func_graph.py", line 915, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/def_function.py", line 358, in wrapped_fn
    return weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2_utils.py", line 73, in distributed_function
    per_replica_function, args=(model, x, y, sample_weights))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/distribute/distribute_lib.py", line 760, in experimental_run_v2
    return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/distribute/distribute_lib.py", line 1787, in call_for_each_replica
    return self._call_for_each_replica(fn, args, kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/distribute/mirrored_strategy.py", line 661, in _call_for_each_replica
    fn, args, kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/distribute/mirrored_strategy.py", line 196, in _call_for_each_replica
    coord.join(threads)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/coordinator.py", line 389, in join
    six.reraise(*self._exc_info_to_raise)
  File "/usr/local/lib/python3.6/dist-packages/six.py", line 693, in reraise
    raise value
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/coordinator.py", line 297, in stop_on_exception
    yield
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/distribute/mirrored_strategy.py", line 190, in _call_for_each_replica
    **merge_kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/optimizer_v2/optimizer_v2.py", line 446, in_distributed_apply
    ds_reduce_util.ReduceOp.SUM, grads_and_vars)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/distribute/distribute_lib.py", line 1481, in batch_reduce_to
    return self._batch_reduce_to(reduce_op, value_destination_pairs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/distribute/mirrored_strategy.py", line 707, in _batch_reduce_to
    value_destination_pairs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/distribute/cross_device_ops.py", line 317, in batch_reduce
    value_destination_pairs[0][0].values) == 1:
IndexError: list index out of range

Code to reproduce the issue

TODO

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Comments: 18 (9 by maintainers)

Most upvoted comments

I will ping here if I have problems because issue authors cannot reopen their issues on GitHub if they were closed by maintainers.