tensorflow: Performance: Training is much slower in TF v2.0.0 VS v1.14.0 when using `Tf.Keras` and `model.fit_generator`
Please make sure that this is a bug. As per our GitHub Policy, we only address code/doc bugs, performance issues, feature requests and build/installation issues on GitHub. tag:bug_template
System information NOTE: I have provided Google Colab notebooks to reproduce the slowness.
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): sort of, but it is basically an MNIST example.
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Google Colab and Windows
- Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: NA
- TensorFlow installed from (source or binary): pip install tensorflow-gpu
- TensorFlow version (use command below): 2.0.0
- Python version: 3
- Bazel version (if compiling from source): NA
- GCC/Compiler version (if compiling from source): NA
- CUDA/cuDNN version: 10 or Google Colab
- GPU model and memory: 1080 Ti, or Google Colab
You can collect some of this information using our environment capture script.
You can also obtain the TensorFlow version with: 1. TF 1.0: `python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"` 2. TF 2.0: `python -c "import tensorflow as tf; print(tf.version.GIT_VERSION, tf.version.VERSION)"`
It happens on the standard Colab GPU instance
Describe the current behavior
Version 2.0.0 is SLOW compared to identical code running on v1.14.0.
The code I used to demonstrate it is very simple and very similar to most existing Keras examples.
A larger NN on MNIST goes from ~10s per epoch to ~20s, which is a major slowdown.
Describe the expected behavior A new version should have similar or better performance than the previous version. If user error or a new limitation/feature is causing the problem, it should be warned about in the Update Notes/Quick Start. This code performed perfectly normally in TF 1.x.
Code to reproduce the issue See this (GPU) Colab Notebook example with MNIST Data: https://colab.research.google.com/gist/Raukk/f0927a5e2a357f2d80c9aeef1202e6ee/example_slow_tf2.ipynb
See this (GPU) Colab Notebook example with numpy random for Data: https://colab.research.google.com/gist/Raukk/518d3d21e08ad02089429529bd6c67d4/simplified_example_slow_tf2.ipynb
See this (GPU) Colab Notebook example using standard Conv2D (not DepthwiseConv2D): https://colab.research.google.com/gist/Raukk/4f102e192f47a6dc144b890925b652f8/standardconv_example_slow_tf2.ipynb
Please notify me if you cannot access any of these notebooks, or if they do not run, or don’t sufficiently reproduce the issue.
Other info / logs Each example above starts with a TL;DR that gives a very basic summary of results.
Thank you!
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Reactions: 2
- Comments: 43 (4 by maintainers)
Have also experienced a large (4x) increase in training times for Keras models when using fit_generator after upgrading to TensorFlow 2.0.
Execution times became comparable to TF 1.14 when disabling eager execution by running:
`tf.compat.v1.disable_eager_execution()`

First of all, thank you for the wonderful repro. I can't tell you how much easier it makes all of this.
It looks like `fit_generator` is incorrectly falling back to the eager path, which is why training is slower. I will look into why, but in the meantime can you try using `model.fit`? It actually also supports generators (we plan to deprecate the fit_generator endpoint at some point as it is now obsolete), and in my testing is actually faster than the 1.14 baseline.

@robieta So I just ran my code by passing the generator function directly into model.fit(), and that seemed to fix the issue completely!
Basically my pseudo code now looks like this:
So basically what I learned is:
Thanks so much for everything!
(As a side note for anyone else reading: the validation_data argument in model.fit() can also take a generator directly as input)
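As the comments above note, in TF 2.x a Python generator can be passed straight to `model.fit`, and `validation_data` accepts a generator as well. A minimal sketch of that pattern; the model, shapes, and random data here are illustrative, not the reporter's actual code:

```python
import numpy as np
import tensorflow as tf

def make_generator(n_batches=8, batch_size=32):
    # Yields (inputs, targets) tuples indefinitely, as model.fit expects.
    while True:
        for _ in range(n_batches):
            x = np.random.rand(batch_size, 784).astype("float32")
            y = np.random.randint(0, 10, size=(batch_size,))
            yield x, y

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Generators can be passed directly to fit(); with an infinite generator,
# steps_per_epoch (and validation_steps) must be given explicitly.
history = model.fit(
    make_generator(),
    steps_per_epoch=8,
    validation_data=make_generator(),
    validation_steps=2,
    epochs=1,
    verbose=0,
)
```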
@max1mn If you replace `return batch_x, batch_y` with `return tuple(batch_x), batch_y`, your code should work. This stems from a historic decision in tf.data about how to treat lists. I will make fit robust to this, but adding that `tuple()` will immediately unblock you. Sorry for the inconvenience.

Excellent! I'm planning on just aliasing `(fit / evaluate / predict)_generator` to `(fit / evaluate / predict)`, as those methods are now strictly superior.

I'm still using tensorflow-gpu, whose latest version is 2.0.0. When I use `model.fit(x=generator, shuffle=False, workers=8, ...)`, it seems that there is still only one worker whether I set `use_multiprocessing=True` or not. Could you please verify this behavior?

I've also encountered this performance issue:
After using `tf.compat.v1.disable_eager_execution()`, the training time of fit_generator in tf2 reduces to 14s. It's comparable to tf1 but still 3x slower than fit in tf2. `model.fit(x=sequence, ...)` also completes the training in 14s, but it seems to load all data into memory and log "Filling up shuffle buffer (this may take a while)" if I set `shuffle=True`. Any ideas?

@Dr-Gandalf No, no, thank you. This is an important performance detail and I'm very happy that it's now going to make it into 2.1. Thanks for reporting.
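The `tuple()` fix described above (lists of input arrays confusing the tf.data path) can be sketched as follows; the two-input model and shapes are illustrative, not taken from the thread:

```python
import numpy as np
import tensorflow as tf

def multi_input_generator(batch_size=16):
    while True:
        # Two input arrays for a two-input model.
        batch_x = [np.random.rand(batch_size, 8).astype("float32"),
                   np.random.rand(batch_size, 4).astype("float32")]
        batch_y = np.random.rand(batch_size, 1).astype("float32")
        # A Python list of inputs is treated specially by tf.data;
        # converting it to a tuple sidesteps the issue.
        yield tuple(batch_x), batch_y

in_a = tf.keras.Input(shape=(8,))
in_b = tf.keras.Input(shape=(4,))
merged = tf.keras.layers.concatenate([in_a, in_b])
out = tf.keras.layers.Dense(1)(merged)
model = tf.keras.Model([in_a, in_b], out)
model.compile(optimizer="adam", loss="mse")

history = model.fit(multi_input_generator(), steps_per_epoch=2,
                    epochs=1, verbose=0)
```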
@robieta I am using tf 2.0.0 inside a docker from docker hub, “tensorflow/tensorflow:latest-gpu-py3-jupyter”.
These are my imports:
The console message I am getting is:
`Filling up shuffle buffer (this may take a while): 10 of 5802`

@robieta Thanks for the comment! I'll try it out later this evening or tomorrow and get back to you. If I'm still having issues, I will make a colab to share and demonstrate it. Currently I am using some internal data which I cannot share, hence the pseudocode.
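The "Filling up shuffle buffer" message is logged by tf.data's shuffle transformation while it pre-loads `buffer_size` elements before yielding anything. A minimal sketch of capping that buffer; the dataset size and numbers are illustrative:

```python
import tensorflow as tf

# A buffer as large as the dataset gives a perfect shuffle but stalls
# startup (the "Filling up shuffle buffer" wait) and can exhaust memory.
# A smaller buffer_size trades shuffle quality for memory and startup time.
ds = tf.data.Dataset.range(5802).shuffle(buffer_size=256).batch(32)

first_batch = next(iter(ds))
print(first_batch.shape)  # (32,)
```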
@mihaimaruseac I just tested my code on TF version 1.15.0rc2. It seems to be equally as slow as TF 2.0; I hope this helps with your debugging!
EDIT: TF 1.15.0rc2 seems to be faster by a few seconds (35-40s per epoch) compared to TF 2.0 (40-45s per epoch).
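For reference, the eager-execution workaround reported earlier in the thread is a single call; note that it is global and must run before any models or ops are created:

```python
import tensorflow as tf

# Workaround from the thread: switch TF 2.0 back to graph execution.
# Call this before building any models; it affects the whole process.
tf.compat.v1.disable_eager_execution()

print(tf.executing_eagerly())  # False
```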
@robieta I had also attempted model.fit() without any improvement to performance. It may have to do with my current implementation, so I'm pasting some pseudo code below. I am hoping there is something fundamentally wrong with it (I'm thinking it's the use of lambda):
For the pseudocode using tf.data.Dataset.from_generator():
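The pseudocode itself did not survive in this archive, but a generic pipeline of the kind discussed can be sketched with the TF 2.0-era `output_types`/`output_shapes` signature of `tf.data.Dataset.from_generator`; the shapes and dtypes are illustrative:

```python
import numpy as np
import tensorflow as tf

def gen():
    # Yields (features, labels) batches; shapes are illustrative.
    for _ in range(4):
        x = np.random.rand(32, 784).astype("float32")
        y = np.random.randint(0, 10, size=(32,)).astype("int64")
        yield x, y

# Declaring dtypes and shapes up front lets tf.data build the pipeline
# without running the Python generator first.
ds = tf.data.Dataset.from_generator(
    gen,
    output_types=(tf.float32, tf.int64),
    output_shapes=((32, 784), (32,)),
)

for x, y in ds.take(1):
    print(x.shape, y.shape)  # (32, 784) (32,)
```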