keras: mnist_cnn.py does not give reproducible results
If I run python mnist_cnn.py twice, I get different results. Conceptually, I have no clue what might be going on here. Here are two example outputs. First run:
Using Theano backend.
Using gpu device 0: GeForce GT 750M (CNMeM is enabled with initial size: 75.0% of memory, CuDNN 4007)
X_train shape: (60000, 1, 28, 28)
60000 train samples
10000 test samples
Train on 60000 samples, validate on 10000 samples
Epoch 1/12
60000/60000 [==============================] - 25s - loss: 0.2538 - acc: 0.9227 - val_loss: 0.0534 - val_acc: 0.9832
Epoch 2/12
60000/60000 [==============================] - 24s - loss: 0.0945 - acc: 0.9714 - val_loss: 0.0378 - val_acc: 0.9876
Epoch 3/12
60000/60000 [==============================] - 24s - loss: 0.0704 - acc: 0.9787 - val_loss: 0.0355 - val_acc: 0.9883
Epoch 4/12
60000/60000 [==============================] - 24s - loss: 0.0584 - acc: 0.9830 - val_loss: 0.0331 - val_acc: 0.9893
Epoch 5/12
60000/60000 [==============================] - 24s - loss: 0.0489 - acc: 0.9848 - val_loss: 0.0305 - val_acc: 0.9897
Epoch 6/12
60000/60000 [==============================] - 24s - loss: 0.0428 - acc: 0.9870 - val_loss: 0.0315 - val_acc: 0.9901
Epoch 7/12
60000/60000 [==============================] - 24s - loss: 0.0383 - acc: 0.9880 - val_loss: 0.0305 - val_acc: 0.9910
Epoch 8/12
60000/60000 [==============================] - 24s - loss: 0.0373 - acc: 0.9881 - val_loss: 0.0298 - val_acc: 0.9903
Epoch 9/12
60000/60000 [==============================] - 24s - loss: 0.0320 - acc: 0.9901 - val_loss: 0.0286 - val_acc: 0.9911
Epoch 10/12
60000/60000 [==============================] - 24s - loss: 0.0311 - acc: 0.9902 - val_loss: 0.0284 - val_acc: 0.9913
Epoch 11/12
60000/60000 [==============================] - 24s - loss: 0.0282 - acc: 0.9910 - val_loss: 0.0290 - val_acc: 0.9910
Epoch 12/12
60000/60000 [==============================] - 24s - loss: 0.0264 - acc: 0.9916 - val_loss: 0.0296 - val_acc: 0.9910
Test score: 0.0296402487505
Test accuracy: 0.991
Second run:

Using Theano backend.
Using gpu device 0: GeForce GT 750M (CNMeM is enabled with initial size: 75.0% of memory, CuDNN 4007)
X_train shape: (60000, 1, 28, 28)
60000 train samples
10000 test samples
Train on 60000 samples, validate on 10000 samples
Epoch 1/12
60000/60000 [==============================] - 25s - loss: 0.2543 - acc: 0.9227 - val_loss: 0.0574 - val_acc: 0.9819
Epoch 2/12
60000/60000 [==============================] - 24s - loss: 0.0939 - acc: 0.9719 - val_loss: 0.0403 - val_acc: 0.9869
Epoch 3/12
60000/60000 [==============================] - 24s - loss: 0.0709 - acc: 0.9789 - val_loss: 0.0371 - val_acc: 0.9870
Epoch 4/12
60000/60000 [==============================] - 24s - loss: 0.0584 - acc: 0.9828 - val_loss: 0.0318 - val_acc: 0.9888
Epoch 5/12
60000/60000 [==============================] - 24s - loss: 0.0492 - acc: 0.9850 - val_loss: 0.0292 - val_acc: 0.9900
Epoch 6/12
60000/60000 [==============================] - 24s - loss: 0.0420 - acc: 0.9867 - val_loss: 0.0313 - val_acc: 0.9897
Epoch 7/12
60000/60000 [==============================] - 24s - loss: 0.0393 - acc: 0.9875 - val_loss: 0.0303 - val_acc: 0.9905
Epoch 8/12
60000/60000 [==============================] - 24s - loss: 0.0372 - acc: 0.9883 - val_loss: 0.0293 - val_acc: 0.9914
Epoch 9/12
60000/60000 [==============================] - 24s - loss: 0.0311 - acc: 0.9907 - val_loss: 0.0279 - val_acc: 0.9909
Epoch 10/12
60000/60000 [==============================] - 24s - loss: 0.0319 - acc: 0.9900 - val_loss: 0.0269 - val_acc: 0.9920
Epoch 11/12
60000/60000 [==============================] - 24s - loss: 0.0282 - acc: 0.9914 - val_loss: 0.0283 - val_acc: 0.9913
Epoch 12/12
60000/60000 [==============================] - 24s - loss: 0.0270 - acc: 0.9916 - val_loss: 0.0312 - val_acc: 0.9907
Test score: 0.0312398415336
Test accuracy: 0.9907
I’m on terrible internet right now and can’t double-check that Keras and Theano are 100% up to date, but pip says I’m at Keras 1.0.1, Theano 0.8.1, numpy 1.11.0.
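As an aside, the same versions can also be read from inside Python; this is a generic check, not part of the original report:

import numpy, theano, keras
# Each package exposes its version string as __version__.
print(numpy.__version__, theano.__version__, keras.__version__)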
Meanwhile, running this script twice prints identical numbers, as expected:

import numpy
numpy.random.seed(1337)  # same seed mnist_cnn.py sets at the top
print(numpy.random.random(3))
@NasenSpray thanks! That was the issue. Back on good internet, I double-checked that numpy, Theano, and Keras are 100% up to date, and verified two options that fix this:
THEANO_FLAGS="optimizer_excluding=conv_dnn" python mnist_cnn.py
This takes about twice as long, at around 60s/epoch instead of 25s/epoch.

THEANO_FLAGS="dnn.conv.algo_bwd_filter=deterministic,dnn.conv.algo_bwd_data=deterministic" python mnist_cnn.py

(The flag mentioned in the quoted post has been deprecated in favor of these two flags.) This option runs in the same time or slightly faster on my machine, at a consistent 24s/epoch.

Going to close this, since it’s not really a “bug” in Keras itself so much as different behavior between the underlying libraries. Though it might make sense to remove the
np.random.seed(1337)
at the top of many of the examples, now that it’s clear this line alone doesn’t guarantee reproducibility.

It seems that I solved this problem in this way (http://blog.csdn.net/qq_33039859/article/details/75452813): step 1, fix the numpy random seed at the top of the code; step 2, be sure to call model.fit() with shuffle=False. A minimal sketch of this recipe follows.
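For illustration only, a sketch of that two-step recipe against the Keras 1.x API; the tiny Dense model and the random placeholder data below are stand-ins, not the actual mnist_cnn.py network:

import numpy as np
np.random.seed(1337)  # step 1: seed numpy at the very top, before Keras is imported

from keras.models import Sequential
from keras.layers import Dense

# Placeholder data: 100 random "images" flattened to 784 floats, 10 one-hot classes.
X_train = np.random.random((100, 784)).astype('float32')
Y_train = np.eye(10)[np.random.randint(0, 10, size=100)].astype('float32')

model = Sequential()
model.add(Dense(10, input_dim=784, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy'])

# step 2: disable the per-epoch shuffle so the batch order is identical across runs
model.fit(X_train, Y_train, batch_size=32, nb_epoch=2, shuffle=False)

Note that on a GPU this alone may still not be enough: the non-deterministic cuDNN convolution kernels discussed above are unaffected by the numpy seed.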
@giorgiop to be clear:
np.random.seed(1337)
near the top of the file. Is that correct?
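For what it’s worth, a sketch of one ordering that satisfies both fixes discussed in this thread; it assumes the Theano backend and the deterministic-cuDNN flags from above. The key point is that both the THEANO_FLAGS setting and the numpy seed must run before Keras (and therefore Theano) is first imported:

import os
# Theano reads THEANO_FLAGS once, at import time, so this must come
# before anything imports theano (importing keras does).
os.environ['THEANO_FLAGS'] = ('dnn.conv.algo_bwd_filter=deterministic,'
                              'dnn.conv.algo_bwd_data=deterministic')

import numpy as np
np.random.seed(1337)  # seed numpy before Keras touches it

from keras.models import Sequential  # safe to import Keras from here on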