tensorflow: Performance: Training is much slower in TF v2.0.0 vs. v1.14.0 when using `tf.keras` and `model.fit_generator`

Please make sure that this is a bug. As per our GitHub Policy, we only address code/doc bugs, performance issues, feature requests and build/installation issues on GitHub.

System information NOTE: I have provided Google Colab notebooks to reproduce the slowness.

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): sort of, but it is basically an MNIST example.
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Google Colab and Windows
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: NA
  • TensorFlow installed from (source or binary): pip install tensorflow-gpu
  • TensorFlow version (use command below): 2.0.0
  • Python version: 3
  • Bazel version (if compiling from source): NA
  • GCC/Compiler version (if compiling from source): NA
  • CUDA/cuDNN version: 10 or Google Colab
  • GPU model and memory: 1080 Ti, or Google Colab

You can collect some of this information using our environment capture script. You can also obtain the TensorFlow version with:

  1. TF 1.0: python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"
  2. TF 2.0: python -c "import tensorflow as tf; print(tf.version.GIT_VERSION, tf.version.VERSION)"

It happens on the standard Colab GPU instance.

Describe the current behavior Version 2.0.0 is SLOW compared to identical code running on v1.14.0. The code I used to demonstrate it is very simple and very similar to most existing Keras examples. A larger NN on MNIST goes from ~10s per epoch to ~20s, which is a very major slowdown.

Describe the expected behavior A new version should have similar or better performance than the previous version. If user error or a new limitation/feature is causing the problem, it should be called out in the release notes/quick start. This code performed perfectly normally in TF 1.x.

Code to reproduce the issue See this (GPU) Colab Notebook example with MNIST Data: https://colab.research.google.com/gist/Raukk/f0927a5e2a357f2d80c9aeef1202e6ee/example_slow_tf2.ipynb

See this (GPU) Colab Notebook example with numpy random for Data: https://colab.research.google.com/gist/Raukk/518d3d21e08ad02089429529bd6c67d4/simplified_example_slow_tf2.ipynb

See this (GPU) Colab Notebook example using standard Conv2D (not DepthwiseConv2D): https://colab.research.google.com/gist/Raukk/4f102e192f47a6dc144b890925b652f8/standardconv_example_slow_tf2.ipynb

Please notify me if you cannot access any of these notebooks, if they do not run, or if they don't sufficiently reproduce the issue.
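In case the notebooks become unavailable, here is a rough inline sketch of the pattern they benchmark (a small Keras CNN on MNIST trained through a Python generator via fit_generator); the exact architecture, batch size, and timings in the notebooks may differ:

import numpy as np
import tensorflow as tf

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = (x_train / 255.0).astype("float32")[..., np.newaxis]

def batch_generator(x, y, batch_size=128):
    # Simple infinite generator yielding (inputs, labels) batches.
    while True:
        idx = np.random.randint(0, len(x), batch_size)
        yield x[idx], y[idx]

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Timing this call is what exposes the TF 1.14 vs. 2.0 gap.
model.fit_generator(batch_generator(x_train, y_train),
                    steps_per_epoch=len(x_train) // 128,
                    epochs=3)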

Other info / logs Each example above starts with a TL;DR that gives a very basic summary of the results.

Thank you!

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Reactions: 2
  • Comments: 43 (4 by maintainers)

Most upvoted comments

Have also experienced a large (4x) increase in training times for Keras models when using fit_generator after upgrading to TensorFlow 2.0.

Execution times became comparable to TF 1.14 when disabling eager execution by running: tf.compat.v1.disable_eager_execution().
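For reference, the workaround is a single call placed before the model is built and compiled; a minimal sketch:

import tensorflow as tf

# Workaround for TF 2.0: fall back to graph mode before building the model.
tf.compat.v1.disable_eager_execution()

model = tf.keras.Sequential([tf.keras.layers.Dense(10)])
model.compile(optimizer="adam", loss="mse")
# model.fit_generator(...) now runs at speeds comparable to TF 1.14.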

First of all, thank you for the wonderful repro. I can’t tell you how much easier it makes all of this.

It looks like fit_generator is incorrectly falling back to the eager path, which is why training is slower. I will look into why, but in the meantime can you try using model.fit? It actually also supports generators (we plan to deprecate the fit_generator endpoint at some point as it is now obsolete), and in my testing is actually faster than the 1.14 baseline.

@robieta So I just ran my code by passing the generator function directly into model.fit(), and that seemed to fix the issue completely!

Basically my pseudo code now looks like this:

def datagen(args):
    while True:
        # some code here to load and manipulate data into x and y, mostly numpy functions
        yield x, y

# some code here to create and compile the model

model.fit(datagen(args), ...)

So basically what I learned is:

  1. don’t use model.fit_generator() anymore
  2. don’t call Dataset.from_generator() separately
  3. just use model.fit() and pass the generator directly into it.

Thanks so much for everything!

(As a side note for anyone else reading: the validation_data argument in model.fit() can also take a generator directly as input)
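A minimal runnable version of that pattern, with random data standing in for the real loading code, might look like this:

import numpy as np
import tensorflow as tf

def datagen(batch_size=32):
    # Stand-in for the real data loading/augmentation code.
    while True:
        x = np.random.rand(batch_size, 16).astype("float32")
        y = np.random.randint(0, 2, size=(batch_size, 1)).astype("float32")
        yield x, y

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(16,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Both the training data and validation_data can be plain Python generators.
model.fit(datagen(),
          steps_per_epoch=100,
          validation_data=datagen(),
          validation_steps=10,
          epochs=2)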

@max1mn If you replace return batch_x, batch_y with return tuple(batch_x), batch_y your code should work. This stems from a historic decision in tf.data about how to treat lists. I will make fit robust to this, but adding that tuple() will immediately unblock you. Sorry for the inconvenience.
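To make that fix concrete, here is an illustrative sketch (shapes and names are made up, not from the original code) of a multi-input Sequence applying the change:

import numpy as np
import tensorflow as tf

class MultiInputSequence(tf.keras.utils.Sequence):
    """Illustrative Sequence that feeds a two-input model."""
    def __len__(self):
        return 50

    def __getitem__(self, idx):
        batch_x = [np.random.rand(8, 4).astype("float32"),   # first input
                   np.random.rand(8, 6).astype("float32")]   # second input
        batch_y = np.random.rand(8, 1).astype("float32")
        # return batch_x, batch_y          # list of inputs trips up tf.data in TF 2.0
        return tuple(batch_x), batch_y     # tuple of inputs works with model.fit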

Excellent! I’m planning on just aliasing (fit / evaluate / predict)_generator to (fit / evaluate / predict), as those methods are now strictly superior.

@ychervonyi Are you using the latest tf-nightly? ac20030 is the relevant change, and should be in tf-nightly==2.1.0.dev20191109 Feel free to post a repro colab if you’re seeing Sequence shuffling handled inefficiently.

I’m still using tensorflow-gpu, whose latest version is 2.0.0. When I use model.fit(x=generator, shuffle=False, workers=8, ...), it seems that there is still only one worker, whether I set use_multiprocessing=True or not. Could you please verify this behavior?
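For what it's worth, one hypothetical way to check whether the extra workers are actually used is to log the producing process/thread from inside a small probe Sequence (a sketch, not from the original code):

import os
import threading
import numpy as np
import tensorflow as tf

class ProbeSequence(tf.keras.utils.Sequence):
    """Logs which process/thread produces each batch."""
    def __len__(self):
        return 20

    def __getitem__(self, idx):
        print(f"batch {idx}: pid={os.getpid()} thread={threading.get_ident()}")
        x = np.random.rand(16, 8).astype("float32")
        y = np.random.rand(16, 1).astype("float32")
        return x, y

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(8,))])
model.compile(optimizer="adam", loss="mse")
model.fit(ProbeSequence(), epochs=1, shuffle=False,
          workers=8, use_multiprocessing=True)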


I’ve also encountered this performance issue:

        fit    fit_generator
TF 1    25s    13s
TF 2    4s     28s

After calling tf.compat.v1.disable_eager_execution(), the training time of fit_generator in TF 2 drops to 14s. That is comparable to TF 1 but still about 3x slower than fit in TF 2. model.fit(x=sequence, ...) also completes training in 14s, but it seems to load all of the data into memory and logs “Filling up shuffle buffer (this may take a while)” if I set shuffle=True. Any ideas?
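One possible workaround for the shuffle buffer, assuming batch-level shuffling inside the Sequence is acceptable, is to shuffle indices in on_epoch_end and pass shuffle=False to fit; a minimal sketch:

import numpy as np
import tensorflow as tf

class ShuffledSequence(tf.keras.utils.Sequence):
    def __init__(self, x, y, batch_size=128):
        self.x, self.y, self.batch_size = x, y, batch_size
        self.indices = np.arange(len(x))

    def __len__(self):
        return len(self.x) // self.batch_size

    def __getitem__(self, idx):
        sel = self.indices[idx * self.batch_size:(idx + 1) * self.batch_size]
        return self.x[sel], self.y[sel]

    def on_epoch_end(self):
        # Shuffle sample order here instead of relying on fit(shuffle=True),
        # so no large shuffle buffer has to be filled.
        np.random.shuffle(self.indices)

x = np.random.rand(1024, 8).astype("float32")
y = np.random.rand(1024, 1).astype("float32")
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(8,))])
model.compile(optimizer="adam", loss="mse")
model.fit(ShuffledSequence(x, y), epochs=2, shuffle=False)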

@Dr-Gandalf No, no, thank you. This is an important performance detail and I’m very happy that it’s now going to make it into 2.1. Thanks for reporting.

@robieta I am using TF 2.0.0 inside a Docker container from Docker Hub, “tensorflow/tensorflow:latest-gpu-py3-jupyter”.

These are my imports:

from tensorflow import keras
from tensorflow.keras.preprocessing.sequence import TimeseriesGenerator
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Input, Dense, Flatten, concatenate, Dropout, GlobalAveragePooling2D
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing.image import Iterator
from tensorflow.keras.applications.densenet import DenseNet121, preprocess_input
from tensorflow.keras.callbacks import ModelCheckpoint, TensorBoard, EarlyStopping, ReduceLROnPlateau
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from tensorflow.keras.utils import Sequence
import tensorflow as tf
from datetime import datetime
import io
from sklearn.metrics import roc_curve, roc_auc_score

# this is the return line of my generator:

return (X1i[0], X2i[0], X3i[0]), X1i[1]

# this is the training line I am using; it used to work fine with fit_generator:

T_history = classification_model.fit(trainGenerator, steps_per_epoch=steps_per_epoch,
                                              validation_data=validation_generator,
                                              validation_steps=validation_steps, 
                                              callbacks=callbacks_list,
                                              epochs=6,
                                              use_multiprocessing=True,
                                              workers=8,
                                              max_queue_size=50)

The console message I am getting is:

Filling up shuffle buffer (this may take a while): 10 of 5802

@robieta Thanks for the comment! I’ll try it out later this evening or tomorrow and get back to you. If I’m still having issues, I will make a colab to share and demonstrate it. I am currently using some internal data which I cannot share, hence the pseudocode.

@mihaimaruseac I just tested my code on TF 1.15.0rc2. It seems to be just as slow as TF 2.0; I hope this helps with your debugging!

EDIT: TF 1.15.0rc2 seems to be faster by a few seconds (35-40s per epoch) compared to TF 2.0 (40-45s per epoch).

@robieta I had also attempted model.fit() without any improvement to performance. It may have to do with my current implementation, so I’m pasting some pseudocode below. I am hoping there is something fundamentally wrong with it (I’m thinking it’s the usage of lambda):

For the pseudocode using tf.data.Dataset.from_generator():

from tensorflow.compat.v2.data import Dataset

def datagen(args):
    while True:
        # some code here to load and manipulate data into x and y, mostly numpy functions
        yield x, y

# some code here to create and compile the model

# I'm thinking the performance issue here is in using lambda. However, without it I get a
# "'generator' must be callable" error.

train_data = Dataset.from_generator(generator=lambda: datagen(args), ...)
model.fit(train_data, ...)
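For completeness, a fully spelled-out version of that pattern might look like the sketch below (shapes, dtypes, and the model are illustrative, not from the original code); from_generator needs explicit output types, and the lambda is only there because from_generator expects a callable rather than an already-created generator:

import numpy as np
import tensorflow as tf

def datagen(batch_size=32):
    while True:
        x = np.random.rand(batch_size, 16).astype("float32")
        y = np.random.rand(batch_size, 1).astype("float32")
        yield x, y

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(16,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

train_data = tf.data.Dataset.from_generator(
    lambda: datagen(32),                      # from_generator needs a callable
    output_types=(tf.float32, tf.float32),
    output_shapes=((32, 16), (32, 1)))

model.fit(train_data, steps_per_epoch=100, epochs=2)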