keras: model.fit() bug when using a zipped Dataset as input for a multiple-input model

(“Cross-post” of https://github.com/tensorflow/tensorflow/issues/54271)

System information.

  • Have I written custom code (as opposed to using a stock example script provided in Keras): yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04
  • TensorFlow installed from (source or binary): pip
  • TensorFlow version (use command below): 2.9.0.dev20220202
  • Python version: 3.10.2

Describe the problem.

With a multi-input model, feeding tf.keras.Model.fit() a dataset whose elements are tuples of several items does not distribute the items across the model's inputs. Instead, the first item in the tuple is used as the input for the whole model.

Describe the current behavior.

I have a custom model which takes 3 images as input. I have 3 separate datasets (currently unbatched while I debug this error) with classes encoded as categorical, meaning each dataset element has the structure ((x, y, z), (c,)). Passing the 3 datasets separately fails, whether as a dict mapping each dataset to a named input, {"Input1": ds1, "Input2": ds2, "Input3": ds3}, or as a list, [ds1, ds2, ds3].

I zip the three datasets. Testing the resulting dataset with (using the docs as guidance):

for element in zipped_ds.as_numpy_iterator():
    print("element", element)

Outputs:

element [[[x1, y1, z1], [c1,]], [[x2, y2, z2], [c2,]], [[x3, y3, z3], [c3,]]] 
element [[[x1, y1, z1], [c1,]], [[x2, y2, z2], [c2,]], [[x3, y3, z3], [c3,]]] 
...

Seems to work, right? Every call to the iterator returns 3 elements. However, when I pass the zipped dataset to model.fit(), the first element in the tuple returned by the dataset is treated as the input for the whole model. Instead of using [[[x1, y1, z1], [c1,]], [[x2, y2, z2], [c2,]], [[x3, y3, z3], [c3,]]] as the input to the model, it uses [[x1, y1, z1], [c1,]], and the training fails.
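This behaviour is consistent with how the fit() docs describe dataset inputs: each element is read as inputs alone, as (inputs, targets), or as (inputs, targets, sample_weights). The snippet below is a simplified pure-Python model of that rule, not the Keras implementation; the name unpack_element and the string placeholders are made up for illustration.

```python
def unpack_element(element):
    """Simplified model of how fit() splits one dataset element.

    A tuple of length 1, 2 or 3 is read as (x,), (x, y) or
    (x, y, sample_weight); a non-tuple is treated as x alone.
    """
    if not isinstance(element, tuple):
        return element, None, None
    if len(element) == 1:
        return element[0], None, None
    if len(element) == 2:
        return element[0], element[1], None
    if len(element) == 3:
        return element[0], element[1], element[2]
    raise ValueError("expected a tuple of length 1, 2 or 3")


# Structure produced by zipping three (image, label) datasets.
zipped = (("img1", "label1"), ("img2", "label2"), ("img3", "label3"))
x, y, sample_weight = unpack_element(zipped)
print("x:", x)                          # the first (image, label) pair
print("y:", y)                          # the second pair, read as targets
print("sample_weight:", sample_weight)  # the third pair, read as weights

# Structure a 3-input, 1-output model needs instead.
regrouped = (("img1", "img2", "img3"), "label1")
x2, y2, sample_weight2 = unpack_element(regrouped)
print("x:", x2, "y:", y2)
```

Under this reading, a 3-tuple of (image, label) pairs hands the model a 2-item input (image1, class1), which would explain an "expects 3 input(s), but it received 2 input tensors" error.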

I’ve tried many approaches, such as passing zipped_ds.as_numpy_iterator() or the generator ([ds1, ds2, ds3] for idx, (ds1, ds2, ds3) in enumerate(zipped_ds)), but both fail because the returned item is empty.

Standalone code to reproduce the issue
# %%
import os

import tensorflow as tf # tensorflow nightly, version>=2.5
from tensorflow import keras
from tensorflow.image import crop_to_bounding_box as tfimgcrop
from tensorflow.keras.preprocessing import image_dataset_from_directory

BATCH_SIZE=32 # Adjust?

IMG_SIZE=(224, 224)
IMG_SHAPE = IMG_SIZE + (3,)

# %%
_URL = 'https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip'
path_to_zip = tf.keras.utils.get_file('cats_and_dogs.zip', origin=_URL, extract=True)
PATH = os.path.join(os.path.dirname(path_to_zip), 'cats_and_dogs_filtered')

train_dir = os.path.join(PATH, 'train')
validation_dir = os.path.join(PATH, 'validation')

train_dataset = tf.keras.preprocessing.image_dataset_from_directory(train_dir,
                                             shuffle=False,
                                             label_mode='categorical',
                                             batch_size=32,
                                             image_size=IMG_SIZE)
validation_dataset = tf.keras.preprocessing.image_dataset_from_directory(validation_dir,
                                             shuffle=False,
                                             label_mode='categorical',
                                             batch_size=32,
                                             image_size=IMG_SIZE)

# %%
base_model1 = tf.keras.applications.MobileNetV3Large(input_shape=(64, 64, 3),
                                               include_top=False,
                                               weights='imagenet',
                                               minimalistic=False,
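                                               pooling=None,  # was the builtin max (not the string 'max'), which matches no pooling mode; pooling is done explicitly below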
                                               dropout_rate=0.2)
base_model2 = tf.keras.applications.MobileNetV3Large(input_shape=(64, 64, 3),
                                               include_top=False,
                                               weights='imagenet',
                                               minimalistic=False,
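                                               pooling=None,  # was the builtin max (not the string 'max'), which matches no pooling mode; pooling is done explicitly below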
                                               dropout_rate=0.2)
base_model3 = tf.keras.applications.MobileNetV3Large(input_shape=(64, 64, 3),
                                               include_top=False,
                                               weights='imagenet',
                                               minimalistic=False,
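                                               pooling=None,  # was the builtin max (not the string 'max'), which matches no pooling mode; pooling is done explicitly below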
                                               dropout_rate=0.2)

# %%
pre_concat_layer1 = tf.keras.layers.Dense(64, 
                                        activation='relu', 
                                        kernel_initializer='random_uniform', 
                                        bias_initializer='zeros')
pre_concat_layer2 = tf.keras.layers.Dense(64, 
                                        activation='relu', 
                                        kernel_initializer='random_uniform', 
                                        bias_initializer='zeros')
pre_concat_layer3 = tf.keras.layers.Dense(64, 
                                        activation='relu', 
                                        kernel_initializer='random_uniform', 
                                        bias_initializer='zeros')

post_concat_layer = tf.keras.layers.Dense(128, 
                                        activation='relu', 
                                        kernel_initializer='random_uniform', 
                                        bias_initializer='zeros')
prediction_layer = tf.keras.layers.Dense(2, 
                                        activation='softmax', 
                                        kernel_initializer='random_uniform', 
                                        bias_initializer='zeros')

# %%
input1 = tf.keras.Input(shape=(64, 64, 3), name="First")
input2 = tf.keras.Input(shape=(64, 64, 3), name="Second")
input3 = tf.keras.Input(shape=(64, 64, 3), name="Third")

x = base_model1(input1, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
x = tf.keras.layers.Dropout(0.2)(x)
x = tf.keras.layers.BatchNormalization()(x)
x = pre_concat_layer1(x)
x = tf.keras.layers.Dropout(0.2)(x)
outputs = tf.keras.layers.BatchNormalization()(x)
body1 = tf.keras.Model(input1, outputs)

x = base_model2(input2, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
x = tf.keras.layers.Dropout(0.2)(x)
x = tf.keras.layers.BatchNormalization()(x)
x = pre_concat_layer2(x)
x = tf.keras.layers.Dropout(0.2)(x)
outputs = tf.keras.layers.BatchNormalization()(x)
body2 = tf.keras.Model(input2, outputs)

x = base_model3(input3, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
x = tf.keras.layers.Dropout(0.2)(x)
x = tf.keras.layers.BatchNormalization()(x)
x = pre_concat_layer3(x)
x = tf.keras.layers.Dropout(0.2)(x)
outputs = tf.keras.layers.BatchNormalization()(x)
body3 = tf.keras.Model(input3, outputs)

# %%
body1.get_layer("MobilenetV3large")._name = "MobilenetV3large1"
body2.get_layer("MobilenetV3large")._name = "MobilenetV3large2"
body3.get_layer("MobilenetV3large")._name = "MobilenetV3large3"

# %%
combinedInput = tf.keras.layers.concatenate([body1.output, body2.output, body3.output])
x = post_concat_layer(combinedInput)
x = tf.keras.layers.Dropout(0.2)(x)
x = tf.keras.layers.BatchNormalization()(x)
foutput = prediction_layer(x)
final_model = tf.keras.Model(inputs=[body1.input, body2.input, body3.input], outputs=foutput)

# %%
def resize_data1(images, classes):
    return (tfimgcrop(images,
                        offset_height=0,
                        offset_width=0,
                        target_height=64,
                        target_width=64),
                    classes)
def resize_data2(images, classes):
    return (tfimgcrop(images,
                        offset_height=0,
                        offset_width=64,
                        target_height=64,
                        target_width=64),
                    classes)
def resize_data3(images, classes):
    return (tfimgcrop(images,
                        offset_height=0,
                        offset_width=128,
                        target_height=64,
                        target_width=64),
                    classes)

# %%
train_dataset_unb = train_dataset.unbatch()
train_dataset1 = train_dataset_unb.map(resize_data1)
train_dataset2 = train_dataset_unb.map(resize_data2)
train_dataset3 = train_dataset_unb.map(resize_data3)
train_dataset_zip = tf.data.Dataset.zip((train_dataset1, train_dataset2, train_dataset3))

validation_dataset_unb = validation_dataset.unbatch()
validation_dataset1 = validation_dataset_unb.map(resize_data1)
validation_dataset2 = validation_dataset_unb.map(resize_data2)
validation_dataset3 = validation_dataset_unb.map(resize_data3)
validation_dataset_zip = tf.data.Dataset.zip((validation_dataset1, validation_dataset2, validation_dataset3))

# %%
final_model.compile()

# %%
history = final_model.fit(train_dataset_zip,
                        epochs=999, 
                        validation_data=validation_dataset_zip,
                        validation_steps=32
                        )

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 20 (5 by maintainers)

Most upvoted comments

@Faptimus420

Instead of ((img, img, img), label), we can have ({"inp1": img, "inp2": img, "inp3": img}, label). The format you mentioned is not possible because of this note in the fit() docs:

If x is a dataset, generator, or [keras.utils.Sequence](https://www.tensorflow.org/api_docs/python/tf/keras/utils/Sequence) instance, y should not be specified (since targets will be obtained from x).

source: https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit
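A minimal sketch of the dict-keyed variant, assuming the keys should match the Input layer names from the reproduction script ("First", "Second", "Third"). The helper names toy_dataset and to_named_inputs are hypothetical, and the constant-filled images only stand in for the real crops.

```python
import tensorflow as tf


def toy_dataset(fill):
    """Four fake (image, one-hot label) pairs standing in for one cropped dataset."""
    images = tf.fill([4, 64, 64, 3], float(fill))
    labels = tf.one_hot([0, 1, 0, 1], depth=2)
    return tf.data.Dataset.from_tensor_slices((images, labels))


def to_named_inputs(example1, example2, example3):
    """Regroup three (image, label) pairs into ({input name: image}, label)."""
    img1, label = example1
    img2, _ = example2
    img3, _ = example3
    # Keys are assumed to match the names given to tf.keras.Input(...).
    return {"First": img1, "Second": img2, "Third": img3}, label


zipped = tf.data.Dataset.zip((toy_dataset(1), toy_dataset(2), toy_dataset(3)))
named = zipped.map(to_named_inputs).batch(2)

inputs_spec, label_spec = named.element_spec
print(sorted(inputs_spec))  # names of the dict-keyed inputs
print(label_spec)
```

Keying by name avoids depending on the order of the inputs, at the cost of having to keep the keys in sync with the model definition.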

@ghylander

I have a working script (i.e., the model trains smoothly), but I am not sure if that is what you want. Please take a look and let me know:

The key issue here is that we are inspecting only the inputs and not the outputs. Note that the model architecture is 3 inputs, 1 output. Thus, our dataset should be in that format as well. See the image below.

Now, the current dataset has the structure: ((img, label), (img, label), (img, label)) but actually we want it to be ((img, img, img), label). So, we simply write a function which does exactly this and map the dataset accordingly.

def post_zip_process(example1, example2, example3):
    print((example1[0], example2[0], example3[0]), example1[1])
    return (example1[0], example2[0], example3[0]), example1[1]

train_dataset_zip = train_dataset_zip.map(post_zip_process)
validation_dataset_zip = validation_dataset_zip.map(post_zip_process)

And the training works just fine. Please take a look at the gist here.
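One way to confirm the regrouping before calling fit() is to compare element_spec before and after the map. This is only a sketch on stand-in data: toy_dataset is a hypothetical helper, and the final .batch(2) is an assumption on my side, since the model's inputs carry a batch dimension and the datasets in the script were unbatched.

```python
import tensorflow as tf


def toy_dataset(fill):
    """Four fake (image, one-hot label) pairs standing in for one cropped dataset."""
    images = tf.fill([4, 64, 64, 3], float(fill))
    labels = tf.one_hot([0, 1, 0, 1], depth=2)
    return tf.data.Dataset.from_tensor_slices((images, labels))


def post_zip_process(example1, example2, example3):
    """((img, label), (img, label), (img, label)) -> ((img, img, img), label)."""
    return (example1[0], example2[0], example3[0]), example1[1]


zipped = tf.data.Dataset.zip((toy_dataset(1), toy_dataset(2), toy_dataset(3)))

before = zipped.element_spec                                 # three (image, label) pairs
after = zipped.map(post_zip_process).batch(2).element_spec   # (three images, one label)

print(before)
print(after)
```

If the second spec shows a 3-tuple of image specs followed by a single label spec, the dataset has the 3-inputs, 1-output structure described above.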

image

/cc @Faptimus420 @rozhanroukhosh @gowthamkpr @sushreebarsa

The code I posted is executable top to bottom (given you meet the dependency, TensorFlow). While it uses the cats and dogs TensorFlow dataset, it mimics exactly my real code, down to the same error with model.fit().

The error I get after the last line is:

ValueError: in user code:

    File "/home/ghylander/miniconda3/envs/CH/lib/python3.10/site-packages/keras/engine/training.py", line 1021, in train_function  *
        return step_function(self, iterator)
    File "/home/ghylander/miniconda3/envs/CH/lib/python3.10/site-packages/keras/engine/training.py", line 1010, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/home/ghylander/miniconda3/envs/CH/lib/python3.10/site-packages/keras/engine/training.py", line 1000, in run_step  **
        outputs = model.train_step(data)
    File "/home/ghylander/miniconda3/envs/CH/lib/python3.10/site-packages/keras/engine/training.py", line 859, in train_step
        y_pred = self(x, training=True)
    File "/home/ghylander/miniconda3/envs/CH/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
        raise e.with_traceback(filtered_tb) from None
    File "/home/ghylander/miniconda3/envs/CH/lib/python3.10/site-packages/keras/engine/input_spec.py", line 200, in assert_input_compatibility
        raise ValueError(f'Layer "{layer_name}" expects {len(input_spec)} input(s),'

    ValueError: Layer "model_3" expects 3 input(s), but it received 2 input tensors. Inputs received: [<tf.Tensor 'IteratorGetNext:0' shape=(64, 64, 3) dtype=float32>, <tf.Tensor 'IteratorGetNext:1' shape=(1,) dtype=float32>]

Regarding what the train_dataset_zip object returns, this code:

for element in train_dataset_zip.as_numpy_iterator():
    print("element", element)

returns:

element [[[x1, y1, z1], [c1,]], [[x2, y2, z2], [c2,]], [[x3, y3, z3], [c3,]]] 
element [[[x1, y1, z1], [c1,]], [[x2, y2, z2], [c2,]], [[x3, y3, z3], [c3,]]] 

and this code:

for idx, (ds1, ds2, ds3) in enumerate(train_dataset_zip):
    print("ds1: ", ds1)
    print("ds2: ", ds2)
    print("ds3: ", ds3)

returns:

ds1:  (<tf.Tensor: shape=(64, 64, 3), dtype=float32, numpy=[(large array with raw pixel values)], , dtype=float32)>, <tf.Tensor: shape=(1,), dtype=float32, numpy=array([0.], dtype=float32)>
ds2:  (<tf.Tensor: shape=(64, 64, 3), dtype=float32, numpy=[(large array with raw pixel values)], , dtype=float32)>, <tf.Tensor: shape=(1,), dtype=float32, numpy=array([0.], dtype=float32)>
ds3:  (<tf.Tensor: shape=(64, 64, 3), dtype=float32, numpy=[(large array with raw pixel values)], , dtype=float32)>, <tf.Tensor: shape=(1,), dtype=float32, numpy=array([0.], dtype=float32)>
ds1:  (<tf.Tensor: shape=(64, 64, 3), dtype=float32, numpy=[(large array with raw pixel values)], , dtype=float32)>, <tf.Tensor: shape=(1,), dtype=float32, numpy=array([0.], dtype=float32)>
ds2:  (<tf.Tensor: shape=(64, 64, 3), dtype=float32, numpy=[(large array with raw pixel values)], , dtype=float32)>, <tf.Tensor: shape=(1,), dtype=float32, numpy=array([0.], dtype=float32)>
ds3:  (<tf.Tensor: shape=(64, 64, 3), dtype=float32, numpy=[(large array with raw pixel values)], , dtype=float32)>, <tf.Tensor: shape=(1,), dtype=float32, numpy=array([0.], dtype=float32)>

Both methods return the 3 elements you’d expect to be returned, ((image1, class1), (image2, class2), (image3, class3)). Furthermore, the traceback already reveals that only the first (I assume it’s the first) element (image1, class1) is reaching the model:

    ValueError: Layer "model_3" expects 3 input(s), but it received 2 input tensors. Inputs received: [<tf.Tensor 'IteratorGetNext:0' shape=(64, 64, 3) dtype=float32>, <tf.Tensor 'IteratorGetNext:1' shape=(1,) dtype=float32>]

Additionally, the tf.data.Dataset.zip docs show what the behaviour of the method is:

>>> a = tf.data.Dataset.range(1, 4)  # ==> [ 1, 2, 3 ]
>>> b = tf.data.Dataset.range(4, 7)  # ==> [ 4, 5, 6 ]
>>> ds = tf.data.Dataset.zip((a, b))
>>> list(ds.as_numpy_iterator())
[(1, 4), (2, 5), (3, 6)]
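The same docs example can be extended to datasets whose elements are already (feature, label) pairs, which is closer to my case. This is a small sketch with made-up integer values; the conversion to plain ints is only there to make the printed result readable.

```python
import tensorflow as tf

# Two datasets of (feature, label) pairs, mirroring (image, class) above.
a = tf.data.Dataset.from_tensor_slices(([1, 2, 3], [0, 1, 0]))
b = tf.data.Dataset.from_tensor_slices(([4, 5, 6], [0, 1, 0]))
ds = tf.data.Dataset.zip((a, b))

# Convert NumPy scalars to plain ints so the nesting is easy to read.
elements = [
    tuple(tuple(int(value) for value in pair) for pair in element)
    for element in ds.as_numpy_iterator()
]
print(elements)
# [((1, 0), (4, 0)), ((2, 1), (5, 1)), ((3, 0), (6, 0))]
```

So zip keeps each dataset's (feature, label) pair intact and nests the pairs side by side; it does not regroup features with features and labels with labels.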