keras: Tensorflow backend is 4x slower than Theano
@fchollet this should interest you.
Below are the timings for three runs of the model with different backends. It's a simple FFN: three dense layers with dropout and batch normalization (see the attached code for details).
It seems like the TensorFlow backend is extremely inefficient and should not be used at the moment. Even with native TF code, Theano still comes out on top. Perhaps this should be mentioned in the docs.
Perhaps the 1.0 release will make it easier to work with TF itself and make the tf backend more useful?
- **TensorFlow**: the standard Keras backend (`K.backend`).
- **TensorFlowNative**: a quick hack that replaces everything from the softmax up (softmax, optimizer, updates) with native TF code.
- **Theano**: benchmarked with and without `allow_gc` (set via `THEANO_FLAGS`, sketched below).
GPU: GTX 980, with CUDA 7.5 and cuDNN 4.0.7.
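For reference, this is roughly how the backend and the `allow_gc` setting can be chosen for the Theano runs (an assumed setup, since the exact flags used for these runs are not shown); the environment variables must be set before Keras and Theano are imported:

```python
import os

# Assumed configuration for the "Theano noGC" runs; set these before importing
# keras/theano. KERAS_BACKEND selects the backend, and allow_gc=False disables
# Theano's per-call garbage collection.
os.environ['KERAS_BACKEND'] = 'theano'
os.environ['THEANO_FLAGS'] = 'device=gpu,floatX=float32,allow_gc=False'
```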
| Backend | Timing (3 runs) |
| --- | --- |
| TensorFlow | 29.6s, 29.8s, 30.6s |
| TensorFlowNative | 11.6s, 11.4s, 11.5s |
| Theano noGC (`allow_gc=False`) | 7.4s, 7.6s, 7.4s |
| Theano GC (`allow_gc=True`) | 7.7s, 7.66s, 7.66s |
Here’s the script to reproduce those results.
```python
import numpy as np
from keras.utils import np_utils
from keras.layers.core import Layer, Dense, Dropout
from keras.layers.normalization import BatchNormalization
from keras.models import Sequential

# Random data: 10000 samples, 500 features, 64 classes (one-hot encoded).
X = np.random.randn(10000, 500)
y = np.random.randint(0, 64, size=(10000, 1))
y = np_utils.to_categorical(y)

# Three hidden layers of 1024 ReLU units, each followed by dropout and batch norm.
test = Sequential()
test.add(Layer(input_shape=(500,)))
test.add(Dense(1024, activation='relu'))
test.add(Dropout(0.5))
test.add(BatchNormalization())
test.add(Dense(1024, activation='relu'))
test.add(Dropout(0.5))
test.add(BatchNormalization())
test.add(Dense(1024, activation='relu'))
test.add(Dropout(0.5))
test.add(BatchNormalization())
test.add(Dense(64, activation='softmax'))
test.compile('adam', 'categorical_crossentropy')
```

The fit is timed in its own IPython cell (`%%time` must be the first line of a cell):

```python
%%time
test.fit(X, y, nb_epoch=25, verbose=0)
```
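If you run this as a plain script rather than in an IPython/Jupyter session, `%%time` is not available; a simple wall-clock timer (not part of the original script) gives the same kind of measurement:

```python
import time

# Time the full 25-epoch fit with a plain wall-clock measurement.
start = time.time()
test.fit(X, y, nb_epoch=25, verbose=0)
print('fit took %.1f s' % (time.time() - start))
```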
About this issue
- State: closed
- Created 8 years ago
- Comments: 23 (9 by maintainers)
TL;DR: set image_data_format to "channels_first" if you are using the Theano backend; otherwise you may see slowdowns (this definitely happens with batch normalization).
I tried training a network on Windows and Ubuntu with Keras+Theano using the default channel order. According to keras.json, the default was "channels_last". I could train as usual and didn't notice anything odd. At some point I added batch normalization and observed a massive slowdown (training epochs taking 2-4 times longer). I tried many things without success until I saw @the-moliver's comment. Matching the channel order to the backend is not optional but necessary (I don't think this is clear in the documentation). I'd advise changing the documentation, or even the code, to enforce the channel order that matches the backend (tensorflow: channels_last, theano: channels_first).
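For example, the effective setting can be checked and fixed programmatically (a sketch using the Keras 2 backend API; persistently, the same option lives under the "image_data_format" key in ~/.keras/keras.json):

```python
from keras import backend as K

# If the Theano backend is active but the data format is still
# 'channels_last', switch to 'channels_first' for this session.
if K.backend() == 'theano' and K.image_data_format() != 'channels_first':
    K.set_image_data_format('channels_first')

print(K.backend(), K.image_data_format())
```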
Thanks @the-moliver!! 😃
Make sure you use 'channels_first' as the dim ordering for the Theano backend; it will be much faster. Otherwise Keras has to do a lot of dim shuffling, which slows things down. I can see from your post that you are using 'channels_last'; with 'channels_first', x_train's shape should be (60000, 1, 28, 28).
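To make the reshape concrete, here is a small sketch (the zero array is only a stand-in for the real MNIST x_train):

```python
import numpy as np
from keras import backend as K

x_train = np.zeros((60000, 28, 28), dtype='float32')  # placeholder for the real data

# Add the channel axis in the position the configured data format expects.
if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(60000, 1, 28, 28)   # (samples, channels, rows, cols)
else:
    x_train = x_train.reshape(60000, 28, 28, 1)   # (samples, rows, cols, channels)
```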