keras: Tensorflow backend is 4x slower than Theano
@fchollet this should interest you.
Below are the timings for three runs of the model with different backends. It's a simple FFN: three dense layers with dropout and batch normalization (see the attached code for details).
It seems like the TensorFlow backend is extremely inefficient and should not be used at the moment. Even with native TF code, Theano still comes out on top. Perhaps this should be mentioned in the docs.
Perhaps the 1.0 release will make it easier to work with TF itself and make the tf backend more useful?
- **TensorFlow**: the standard Keras backend (`K.backend`).
- **TensorFlowNative**: a quick hack that replaces everything from the softmax up (softmax, optimizer, updates) with native TF code.
- **Theano**: benchmarked with and without `allow_gc` (set via `THEANO_FLAGS`, sketched below).
GPU: GTX 980, with CUDA 7.5 and cuDNN 4.0.7.
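For reference, this is roughly how the backend and the `allow_gc` setting can be chosen for the Theano runs (an assumed setup, since the exact flags used for these runs are not shown); the environment variables must be set before Keras and Theano are imported:

```python
import os

# Assumed configuration for the "Theano noGC" runs; set these before importing
# keras/theano. KERAS_BACKEND selects the backend, and allow_gc=False disables
# Theano's per-call garbage collection.
os.environ['KERAS_BACKEND'] = 'theano'
os.environ['THEANO_FLAGS'] = 'device=gpu,floatX=float32,allow_gc=False'
```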
| Backend | Timing (3 runs) |
| --- | --- |
| TensorFlow | 29.6s, 29.8s, 30.6s |
| TensorFlowNative | 11.6s, 11.4s, 11.5s |
| Theano noGC (`allow_gc=False`) | 7.4s, 7.6s, 7.4s |
| Theano GC (`allow_gc=True`) | 7.7s, 7.66s, 7.66s |
Here’s the script to reproduce those results.
```python
import numpy as np
from keras.utils import np_utils
from keras.layers.core import Layer, Dense, Dropout
from keras.layers.normalization import BatchNormalization
from keras.models import Sequential

# Random data: 10000 samples, 500 features, 64 classes (one-hot encoded).
X = np.random.randn(10000, 500)
y = np.random.randint(0, 64, size=(10000, 1))
y = np_utils.to_categorical(y)

# Three hidden layers of 1024 ReLU units, each followed by dropout and batch norm.
test = Sequential()
test.add(Layer(input_shape=(500,)))
test.add(Dense(1024, activation='relu'))
test.add(Dropout(0.5))
test.add(BatchNormalization())
test.add(Dense(1024, activation='relu'))
test.add(Dropout(0.5))
test.add(BatchNormalization())
test.add(Dense(1024, activation='relu'))
test.add(Dropout(0.5))
test.add(BatchNormalization())
test.add(Dense(64, activation='softmax'))
test.compile('adam', 'categorical_crossentropy')
```

The fit is timed in its own IPython cell (`%%time` must be the first line of a cell):

```python
%%time
test.fit(X, y, nb_epoch=25, verbose=0)
```
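If you run this as a plain script rather than in an IPython/Jupyter session, `%%time` is not available; a simple wall-clock timer (not part of the original script) gives the same kind of measurement:

```python
import time

# Time the full 25-epoch fit with a plain wall-clock measurement.
start = time.time()
test.fit(X, y, nb_epoch=25, verbose=0)
print('fit took %.1f s' % (time.time() - start))
```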
About this issue
- State: closed
- Created 8 years ago
- Comments: 23 (9 by maintainers)
TL;DR: set image_data_format to "channels_first" if you are using the Theano backend; otherwise you may see slowdowns (this definitely happens with batch normalization).
I tried training a network on Windows and Ubuntu with Keras+Theano using the default channel order. According to keras.json, the default was "channels_last". I could train as usual and didn't notice anything odd. At some point I added batch normalization and observed a massive slowdown (training epochs taking 2-4 times longer). I tried many things without success until I saw @the-moliver's comment. Matching the channel order to the backend is not optional but necessary (I don't think this is clear in the documentation). I'd advise changing the documentation, or even the code, to enforce the channel order that matches the backend (tensorflow: channels_last, theano: channels_first).
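For example, the effective setting can be checked and fixed programmatically (a sketch using the Keras 2 backend API; persistently, the same option lives under the "image_data_format" key in ~/.keras/keras.json):

```python
from keras import backend as K

# If the Theano backend is active but the data format is still
# 'channels_last', switch to 'channels_first' for this session.
if K.backend() == 'theano' and K.image_data_format() != 'channels_first':
    K.set_image_data_format('channels_first')

print(K.backend(), K.image_data_format())
```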
Thanks @the-moliver!! 😃
Make sure you use 'channels_first' as the dim ordering for the Theano backend; it will be much faster. Otherwise Keras has to do a lot of dim shuffling, which slows things down. I can see from your post that you are using 'channels_last'; with 'channels_first', x_train's shape should be (60000, 1, 28, 28).
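To make the reshape concrete, here is a small sketch (the zero array is only a stand-in for the real MNIST x_train):

```python
import numpy as np
from keras import backend as K

x_train = np.zeros((60000, 28, 28), dtype='float32')  # placeholder for the real data

# Add the channel axis in the position the configured data format expects.
if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(60000, 1, 28, 28)   # (samples, channels, rows, cols)
else:
    x_train = x_train.reshape(60000, 28, 28, 1)   # (samples, rows, cols, channels)
```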