tensorflow: DistributionStrategy is not supported by tf.keras.models.Model.fit_generator

Hi! I recently encountered a NotImplementedError while trying to use the fit_generator method of a tf.keras.models.Model with MultiWorkerMirroredStrategy. It has been almost a year since these handlers were added to the code ( https://github.com/tensorflow/tensorflow/commit/9541ce3475ea70fd8eb9552f60de462127f15440#diff-de9b96ac2d81503324cbbbe21732031f ), and I'm wondering whether to expect an implementation any time soon (with the release of TF 2.0, for example)?

While looking for a workaround, I tried transforming the generator into a TF Dataset with tf.data.Dataset.from_generator so I could replace fit_generator with fit, but I ran into a similar problem: the resulting object has type DatasetV1Adapter, which is also incompatible with distribution strategies.

I dare say this functionality would be of great interest to a wide range of TF users, myself included. When dealing with large, domain-specific data sets that don't fit into memory, one often has no choice but to write a custom data generator, and when big data is involved, distributed training can be crucial.
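To make the use case concrete, a custom generator for data that doesn't fit into memory typically looks something like the minimal sketch below. The chunked layout and the in-memory stand-ins are illustrative assumptions; in a real pipeline each chunk would be loaded lazily from disk (e.g. np.load with mmap_mode='r').

```python
import numpy as np

def batch_generator(chunks, batch_size=32):
    """Minimal sketch of a custom batch generator.

    `chunks` is a list of (features, labels) pairs; in a real
    pipeline each chunk would be loaded lazily from disk instead
    of being held in memory.
    """
    while True:  # Keras generators are expected to loop forever
        for features, labels in chunks:
            for start in range(0, len(features), batch_size):
                yield (features[start:start + batch_size],
                       labels[start:start + batch_size])

# Simulate one on-disk chunk with in-memory arrays.
chunks = [(np.ones([100, 10], np.float32), np.ones([100, 1], np.float32))]
gen = batch_generator(chunks, batch_size=32)
x, y = next(gen)
print(x.shape, y.shape)  # (32, 10) (32, 1)
```

This is exactly the kind of object one would like to pass to fit_generator (or, after conversion, to fit) under a distribution strategy.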

I would highly appreciate any information from the TensorFlow developer team on the current state of the problem or on possible workarounds. Thanks in advance!

System information

  • TensorFlow version (you are using): 2.0.0.dev20190729
  • Are you willing to contribute it (Yes/No): No

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Reactions: 2
  • Comments: 23 (4 by maintainers)

Most upvoted comments

Hi, I have found that the workaround does not work if the model has multiple inputs. The following code fails:

import numpy as np
import tensorflow as tf
#strategy = tf.distribute.MirroredStrategy()
strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()


def generator():
    while True:
        yield [np.ones([10, 10], np.float32), np.ones([10, 10], np.float32)], np.ones([10, 1], np.float32)


with strategy.scope():
    inputA = tf.keras.layers.Input(shape=(10,))
    inputB = tf.keras.layers.Input(shape=(10,))

    output = tf.keras.layers.Concatenate()([inputA, inputB])
    output = tf.keras.layers.Dense(1, input_shape=(10,), activation="relu")(output)
    model = tf.keras.models.Model(inputs=[inputA, inputB], outputs=output)
    model.compile('Adam', 'mae')
    model.fit(generator(), steps_per_epoch=1000, epochs=10)

Traceback (most recent call last):
  File "scripts/distributed_strategy_test/fit_test_multiple_inputs.py", line 20, in <module>
    model.fit(generator(), steps_per_epoch=1000, epochs=10)
  File "/home/gbarbadillo/miniconda3/envs/tino/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training.py", line 734, in fit
    use_multiprocessing=use_multiprocessing)
  File "/home/gbarbadillo/miniconda3/envs/tino/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 224, in fit
    distribution_strategy=strategy)
  File "/home/gbarbadillo/miniconda3/envs/tino/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 547, in _process_training_inputs
    use_multiprocessing=use_multiprocessing)
  File "/home/gbarbadillo/miniconda3/envs/tino/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 605, in _process_inputs
    use_multiprocessing=use_multiprocessing)
  File "/home/gbarbadillo/miniconda3/envs/tino/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/data_adapter.py", line 550, in __init__
    reassemble, nested_dtypes, output_shapes=nested_shape)
  File "/home/gbarbadillo/miniconda3/envs/tino/lib/python3.7/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 540, in from_generator
    output_types, tensor_shape.as_shape, output_shapes)
  File "/home/gbarbadillo/miniconda3/envs/tino/lib/python3.7/site-packages/tensorflow_core/python/data/util/nest.py", line 471, in map_structure_up_to
    results = [func(*tensors) for tensors in zip(*all_flattened_up_to)]
  File "/home/gbarbadillo/miniconda3/envs/tino/lib/python3.7/site-packages/tensorflow_core/python/data/util/nest.py", line 471, in <listcomp>
    results = [func(*tensors) for tensors in zip(*all_flattened_up_to)]
  File "/home/gbarbadillo/miniconda3/envs/tino/lib/python3.7/site-packages/tensorflow_core/python/framework/tensor_shape.py", line 1216, in as_shape
    return TensorShape(shape)
  File "/home/gbarbadillo/miniconda3/envs/tino/lib/python3.7/site-packages/tensorflow_core/python/framework/tensor_shape.py", line 776, in __init__
    self._dims = [as_dimension(d) for d in dims_iter]
  File "/home/gbarbadillo/miniconda3/envs/tino/lib/python3.7/site-packages/tensorflow_core/python/framework/tensor_shape.py", line 776, in <listcomp>
    self._dims = [as_dimension(d) for d in dims_iter]
  File "/home/gbarbadillo/miniconda3/envs/tino/lib/python3.7/site-packages/tensorflow_core/python/framework/tensor_shape.py", line 718, in as_dimension
    return Dimension(value)
  File "/home/gbarbadillo/miniconda3/envs/tino/lib/python3.7/site-packages/tensorflow_core/python/framework/tensor_shape.py", line 193, in __init__
    self._value = int(value)
TypeError: int() argument must be a string, a bytes-like object or a number, not 'tuple'

Should I open a new thread for this issue?

I have the same error,

NotImplementedError: fit_generator is not supported for models compiled with tf.distribute.Strategy.

with TPUStrategy.

I am trying to run a Keras model on TPU with significant CPU preprocessing of the data, which I want to run in parallel with the batches executing on the TPU. TPUStrategy is a simple strategy that distributes everything to a single TPU core; there is no reason why preprocessing could not run on the CPU in parallel.

I shall have to switch from fit_generator to fit and run everything sequentially.

I suggest reopening this issue.
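The overlap asked for here is conceptually what tf.data's prefetch transformation provides: a background thread fills a bounded buffer with preprocessed batches while the accelerator consumes them. Below is a minimal pure-Python sketch of that idea, not TF's actual implementation:

```python
import queue
import threading

def prefetch(generator, buffer_size=2):
    """Run `generator` in a background thread, buffering up to
    `buffer_size` items so that producing (preprocessing) overlaps
    with consuming (training). Rough analogue of
    tf.data.Dataset.prefetch; not the real implementation."""
    q = queue.Queue(maxsize=buffer_size)
    sentinel = object()  # marks end of the underlying generator

    def worker():
        for item in generator:
            q.put(item)  # blocks when the buffer is full
        q.put(sentinel)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = q.get()
        if item is sentinel:
            return
        yield item

# Example: items are "preprocessed" in the background thread.
result = list(prefetch(i * 2 for i in range(5)))
print(result)  # [0, 2, 4, 6, 8]
```

With a single producer thread and a FIFO queue, batch order is preserved while preprocessing of the next batch proceeds during consumption of the current one.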

Hi, I have found that the workaround does not work if the model has multiple inputs. The following code fails:

import numpy as np
import tensorflow as tf
#strategy = tf.distribute.MirroredStrategy()
strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()


def generator():
    while True:
        yield [np.ones([10, 10], np.float32), np.ones([10, 10], np.float32)], np.ones([10, 1], np.float32)


with strategy.scope():
    inputA = tf.keras.layers.Input(shape=(10,))
    inputB = tf.keras.layers.Input(shape=(10,))

    output = tf.keras.layers.Concatenate()([inputA, inputB])
    output = tf.keras.layers.Dense(1, input_shape=(10,), activation="relu")(output)
    model = tf.keras.models.Model(inputs=[inputA, inputB], outputs=output)
    model.compile('Adam', 'mae')
    model.fit(generator(), steps_per_epoch=1000, epochs=10)
Traceback (most recent call last):
  File "scripts/distributed_strategy_test/fit_test_multiple_inputs.py", line 20, in <module>
    model.fit(generator(), steps_per_epoch=1000, epochs=10)
  File "/home/gbarbadillo/miniconda3/envs/tino/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training.py", line 734, in fit
    use_multiprocessing=use_multiprocessing)
  File "/home/gbarbadillo/miniconda3/envs/tino/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 224, in fit
    distribution_strategy=strategy)
  File "/home/gbarbadillo/miniconda3/envs/tino/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 547, in _process_training_inputs
    use_multiprocessing=use_multiprocessing)
  File "/home/gbarbadillo/miniconda3/envs/tino/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 605, in _process_inputs
    use_multiprocessing=use_multiprocessing)
  File "/home/gbarbadillo/miniconda3/envs/tino/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/data_adapter.py", line 550, in __init__
    reassemble, nested_dtypes, output_shapes=nested_shape)
  File "/home/gbarbadillo/miniconda3/envs/tino/lib/python3.7/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 540, in from_generator
    output_types, tensor_shape.as_shape, output_shapes)
  File "/home/gbarbadillo/miniconda3/envs/tino/lib/python3.7/site-packages/tensorflow_core/python/data/util/nest.py", line 471, in map_structure_up_to
    results = [func(*tensors) for tensors in zip(*all_flattened_up_to)]
  File "/home/gbarbadillo/miniconda3/envs/tino/lib/python3.7/site-packages/tensorflow_core/python/data/util/nest.py", line 471, in <listcomp>
    results = [func(*tensors) for tensors in zip(*all_flattened_up_to)]
  File "/home/gbarbadillo/miniconda3/envs/tino/lib/python3.7/site-packages/tensorflow_core/python/framework/tensor_shape.py", line 1216, in as_shape
    return TensorShape(shape)
  File "/home/gbarbadillo/miniconda3/envs/tino/lib/python3.7/site-packages/tensorflow_core/python/framework/tensor_shape.py", line 776, in __init__
    self._dims = [as_dimension(d) for d in dims_iter]
  File "/home/gbarbadillo/miniconda3/envs/tino/lib/python3.7/site-packages/tensorflow_core/python/framework/tensor_shape.py", line 776, in <listcomp>
    self._dims = [as_dimension(d) for d in dims_iter]
  File "/home/gbarbadillo/miniconda3/envs/tino/lib/python3.7/site-packages/tensorflow_core/python/framework/tensor_shape.py", line 718, in as_dimension
    return Dimension(value)
  File "/home/gbarbadillo/miniconda3/envs/tino/lib/python3.7/site-packages/tensorflow_core/python/framework/tensor_shape.py", line 193, in __init__
    self._value = int(value)
TypeError: int() argument must be a string, a bytes-like object or a number, not 'tuple'

Should I open a new thread for this issue?

did you solve this?

Try returning tuples instead of lists.

import numpy as np
import tensorflow as tf
#strategy = tf.distribute.MirroredStrategy()
strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()


def generator():
    while True:
        yield (np.ones([10, 10], np.float32), np.ones([10, 10], np.float32)), np.ones([10, 1], np.float32)


with strategy.scope():
    inputA = tf.keras.layers.Input(shape=(10,))
    inputB = tf.keras.layers.Input(shape=(10,))

    output = tf.keras.layers.Concatenate()([inputA, inputB])
    output = tf.keras.layers.Dense(1, input_shape=(10,), activation="relu")(output)
    model = tf.keras.models.Model(inputs=[inputA, inputB], outputs=output)
    model.compile('Adam', 'mae')
    model.fit(generator(), steps_per_epoch=1000, epochs=10)

This issue has been resolved in TF v2.1.0 by replacing model.fit_generator() with model.fit().

Any update on this? I'm particularly curious about tf.keras.utils.Sequence.
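For reference, the tf.keras.utils.Sequence protocol only requires __len__ (number of batches per epoch) and __getitem__ (batch by index). The sketch below implements that protocol in plain Python/NumPy so it runs without TensorFlow; in real code the class would subclass tf.keras.utils.Sequence and be passed to model.fit.

```python
import math
import numpy as np

class ArraySequence:
    """Implements the tf.keras.utils.Sequence protocol (__len__ and
    __getitem__). In real code this would subclass
    tf.keras.utils.Sequence."""

    def __init__(self, x, y, batch_size=32):
        self.x, self.y, self.batch_size = x, y, batch_size

    def __len__(self):
        # Number of batches per epoch (last batch may be smaller).
        return math.ceil(len(self.x) / self.batch_size)

    def __getitem__(self, idx):
        # Return batch `idx` as an (inputs, targets) tuple.
        lo = idx * self.batch_size
        hi = lo + self.batch_size
        return self.x[lo:hi], self.y[lo:hi]

seq = ArraySequence(np.zeros([100, 10]), np.zeros([100, 1]), batch_size=32)
print(len(seq), seq[3][0].shape)  # 4 (4, 10)
```

Unlike a plain generator, a Sequence is indexable, which is what lets Keras shuffle batch order and shard work safely across processes.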

Is there an optimal solution for this issue?

Model.fit() was recently made to work with generators and distribution strategies. Could you try the latest version of TF2? Does it provide a workaround for you?

This works:

import numpy as np
import tensorflow as tf
#strategy = tf.distribute.MirroredStrategy()
strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()


def generator():
    while True:
        yield np.ones([10, 10], np.float32), np.ones([10, 1], np.float32)


with strategy.scope():
    model = tf.keras.models.Sequential()
    model.add(tf.keras.layers.Dense(1, input_shape=(10,), activation="relu"))
    model.compile('Adam', 'mae')
    model.fit(generator(), steps_per_epoch=1000, epochs=10)


Please reopen if it doesn’t provide a workaround.