tensorflow: tf.keras.model fit() significantly slower when using weighted validation data in comparison to tf2.1.0


System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): ArchLinux & Ubuntu 18.04 LTS
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: No
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): v2.2.0-rc4-8-g2b96f3662b 2.2.0 (compared to: v2.1.0-rc2-17-ge5bf8de 2.1.0)
  • Python version: 3.7.5

The ArchLinux machine runs on CPU. The Ubuntu machine runs on GPU with:

  • CUDA/cuDNN version: 10.1.243
  • GPU model and memory: GeForce GTX 1080 with 7126 MB memory

Describe the current behavior Training a simple tf.keras multilayer perceptron with a call to .fit() whose validation_data includes sample weights results in a significantly slower fit() than with TensorFlow 2.1.0 running the exact same code.

Describe the expected behavior Similar performance between TensorFlow 2.1.0 and 2.2.0 when training a tf.keras model with a weighted validation dataset.

Standalone code to reproduce the issue Package requirements for the code snippet, using Python 3.7.5:

numpy = "==1.18.2"
tensorflow = "==2.2.0"
tensorflow-datasets = "==3.1.0"
import typing

import numpy as np
from tensorflow import keras
import tensorflow_datasets as tfds


def build_neural_network(input_dimension: int, number_of_classes: int, compile_options: dict):
    model = keras.Sequential()
    model.add(keras.layers.Dense(112, activation='relu', input_dim=input_dimension))
    model.add(keras.layers.Dense(112, activation='relu'))
    model.add(keras.layers.Dense(number_of_classes, activation='softmax'))

    model.compile(**compile_options)

    model.summary()

    return model

def load_in_images_and_labels_and_reshape(dataset) -> typing.Tuple[np.ndarray, np.ndarray]:
    images = []
    labels = []
    for image, label in tfds.as_numpy(dataset):
        # Flatten each image (e.g. 28x28) into a 1-D feature vector.
        new_image_shape = image.shape[0] * image.shape[1]
        images.append(image.reshape(new_image_shape))
        labels.append(label)

    return np.array(images), np.array(labels)


def train_neural_network(is_random_weighing: bool):
    dataset_train      = tfds.load('emnist', split='train', as_supervised=True)
    dataset_validation = tfds.load('emnist', split='test', as_supervised=True)

    train_images, train_labels           = load_in_images_and_labels_and_reshape(dataset_train)
    validation_images, validation_labels = load_in_images_and_labels_and_reshape(dataset_validation)
    train_labels      = keras.utils.to_categorical(train_labels)
    validation_labels = keras.utils.to_categorical(validation_labels)

    print("load")
    compile_options =  {
        "loss": "categorical_crossentropy",
        "optimizer": "adam",
        "metrics": ["categorical_accuracy"],
        "weighted_metrics": ["categorical_accuracy"]
    }
    network = build_neural_network(train_images.shape[-1], len(train_labels[0]), compile_options)

    fit_options = {    
        "batch_size": 2048,
        "epochs": 10,
        "verbose": 1,
        "workers": 1
    }
    if is_random_weighing:
        # The third element of validation_data holds a per-sample weight for each validation example.
        random_weights = np.random.rand(len(validation_images))
        validation_data_tuple = (validation_images, validation_labels, random_weights)
    else:
        validation_data_tuple = (validation_images, validation_labels)
    history = network.fit(train_images, train_labels, validation_data=validation_data_tuple, **fit_options)


if __name__ == "__main__":
    is_random_weighing = True
    train_neural_network(is_random_weighing)

Other info / logs Running the above code snippet with TensorFlow 2.2.0 on the ArchLinux machine (CPU) takes roughly 19 seconds per epoch, while the same code with TensorFlow 2.1.0 takes roughly 5 seconds per epoch. When the weighting of the validation dataset is turned off in TensorFlow 2.2.0 (is_random_weighing = False), the performance becomes similar to TensorFlow 2.1.0: roughly 5 seconds per epoch. The slowdown is also seen on the Ubuntu machine (GPU), where, likely due to the different hardware, TF 2.2.0 is about 7 times as slow as TF 2.1.0.

The effect was not seen (though it may simply have been too small to measure) when using mnist in place of emnist.
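The mnist comparison only changes the dataset name passed to tfds.load; a minimal sketch of that change, reusing the helpers from the snippet above (the parameterized wrapper below is hypothetical):

def load_reshaped_split(dataset_name: str, split: str) -> typing.Tuple[np.ndarray, np.ndarray]:
    # Hypothetical helper: same loading step as for emnist above, with the dataset name as a
    # parameter so 'mnist' can be swapped in without touching the rest of the pipeline.
    dataset = tfds.load(dataset_name, split=split, as_supervised=True)
    return load_in_images_and_labels_and_reshape(dataset)

# e.g. train_images, train_labels = load_reshaped_split('mnist', 'train')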

The issue seems related to #39039, in which a comment by @romanovzky brought to light that it might be due to the validation data or validation split, although that is in the context of comparing a TensorFlow estimator to Keras.

This issue also seems related to #39434, in which a significant performance drop from TF 2.1 to TF 2.2 is also reported.

It seems like another small piece of a larger puzzle (or I am doing something simple wrong on both machines).

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Reactions: 4
  • Comments: 16 (7 by maintainers)

Most upvoted comments

I had the same issue and was able to circumvent it by converting my weights NumPy array into a pandas Series: pd.Series(my_weights). Training now starts immediately and I no longer have to wait.
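For reference, a minimal sketch of that workaround applied to the reproduction snippet above (the variable names and the placement of the pd.Series wrapping are assumptions, not from the comment):

import pandas as pd

# Workaround described in the comment above: wrap the per-sample validation weights in a
# pandas Series before passing them to fit(); the rest of the call stays the same.
random_weights = pd.Series(np.random.rand(len(validation_images)))
validation_data_tuple = (validation_images, validation_labels, random_weights)
history = network.fit(train_images, train_labels, validation_data=validation_data_tuple, **fit_options)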

This regression is not completely fixed in 2.3.0. It seems that, for whatever reason, the first epoch takes a long time to start, and the first validation step is also very slow. From the 2nd epoch onward, the epoch times are comparable. ~This can be reproduced in this colab~. EDIT: Colab link removed as it was pointing to another colab; I have lost (probably deleted) the original one.

@sirvincent thanks for reporting the issue, a fix was submitted in 1d2d05f; it is available in the latest nightly.

TensorFlow 2.2 takes much more time than 2.1/2.0 to start training after "keras.fit" is called.

2020-06-01 10:16:44.991459: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-06-01 10:16:46.235945: W tensorflow/stream_executor/gpu/asm_compiler.cc:81] Running ptxas --version returned 256
2020-06-01 10:16:46.328871: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] Internal: ptxas exited with non-zero error code 256, output: 
Relying on driver to perform ptx compilation. 
Modify $PATH to customize ptxas location.
This message will be only logged once.
2020-06-01 10:16:48.148004: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-06-01 10:23:36.473814: I tensorflow/core/profiler/lib/profiler_session.cc:159] Profiler session started.

It gets stuck for about 7 minutes before training starts.

Interesting that the random weighting causes the performance slowdown. In my case, turning on dropout layers (even with dropout_prob=0) causes the slowdown. Could it be something in the TensorFlow randomness modules?
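For comparison, a minimal sketch of the dropout variant described in this comment, applied to the model from the reproduction snippet above (the layer placement, the rate of 0.0, and the function name are assumptions for illustration):

def build_neural_network_with_dropout(input_dimension: int, number_of_classes: int, compile_options: dict):
    # Same MLP as build_neural_network above, with a Dropout layer after each hidden layer;
    # per the comment, even a rate of 0.0 appeared to trigger the slowdown.
    model = keras.Sequential()
    model.add(keras.layers.Dense(112, activation='relu', input_dim=input_dimension))
    model.add(keras.layers.Dropout(0.0))
    model.add(keras.layers.Dense(112, activation='relu'))
    model.add(keras.layers.Dropout(0.0))
    model.add(keras.layers.Dense(number_of_classes, activation='softmax'))
    model.compile(**compile_options)
    return model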