keras: mnist_cnn.py does not give reproducible results
If I run python mnist_cnn.py twice, I get different results. Conceptually, I have no clue what might be going on here. Here are two example outputs. First run:
Using Theano backend.
Using gpu device 0: GeForce GT 750M (CNMeM is enabled with initial size: 75.0% of memory, CuDNN 4007)
X_train shape: (60000, 1, 28, 28)
60000 train samples
10000 test samples
Train on 60000 samples, validate on 10000 samples
Epoch 1/12
60000/60000 [==============================] - 25s - loss: 0.2538 - acc: 0.9227 - val_loss: 0.0534 - val_acc: 0.9832
Epoch 2/12
60000/60000 [==============================] - 24s - loss: 0.0945 - acc: 0.9714 - val_loss: 0.0378 - val_acc: 0.9876
Epoch 3/12
60000/60000 [==============================] - 24s - loss: 0.0704 - acc: 0.9787 - val_loss: 0.0355 - val_acc: 0.9883
Epoch 4/12
60000/60000 [==============================] - 24s - loss: 0.0584 - acc: 0.9830 - val_loss: 0.0331 - val_acc: 0.9893
Epoch 5/12
60000/60000 [==============================] - 24s - loss: 0.0489 - acc: 0.9848 - val_loss: 0.0305 - val_acc: 0.9897
Epoch 6/12
60000/60000 [==============================] - 24s - loss: 0.0428 - acc: 0.9870 - val_loss: 0.0315 - val_acc: 0.9901
Epoch 7/12
60000/60000 [==============================] - 24s - loss: 0.0383 - acc: 0.9880 - val_loss: 0.0305 - val_acc: 0.9910
Epoch 8/12
60000/60000 [==============================] - 24s - loss: 0.0373 - acc: 0.9881 - val_loss: 0.0298 - val_acc: 0.9903
Epoch 9/12
60000/60000 [==============================] - 24s - loss: 0.0320 - acc: 0.9901 - val_loss: 0.0286 - val_acc: 0.9911
Epoch 10/12
60000/60000 [==============================] - 24s - loss: 0.0311 - acc: 0.9902 - val_loss: 0.0284 - val_acc: 0.9913
Epoch 11/12
60000/60000 [==============================] - 24s - loss: 0.0282 - acc: 0.9910 - val_loss: 0.0290 - val_acc: 0.9910
Epoch 12/12
60000/60000 [==============================] - 24s - loss: 0.0264 - acc: 0.9916 - val_loss: 0.0296 - val_acc: 0.9910
Test score: 0.0296402487505
Test accuracy: 0.991
Second run:

Using Theano backend.
Using gpu device 0: GeForce GT 750M (CNMeM is enabled with initial size: 75.0% of memory, CuDNN 4007)
X_train shape: (60000, 1, 28, 28)
60000 train samples
10000 test samples
Train on 60000 samples, validate on 10000 samples
Epoch 1/12
60000/60000 [==============================] - 25s - loss: 0.2543 - acc: 0.9227 - val_loss: 0.0574 - val_acc: 0.9819
Epoch 2/12
60000/60000 [==============================] - 24s - loss: 0.0939 - acc: 0.9719 - val_loss: 0.0403 - val_acc: 0.9869
Epoch 3/12
60000/60000 [==============================] - 24s - loss: 0.0709 - acc: 0.9789 - val_loss: 0.0371 - val_acc: 0.9870
Epoch 4/12
60000/60000 [==============================] - 24s - loss: 0.0584 - acc: 0.9828 - val_loss: 0.0318 - val_acc: 0.9888
Epoch 5/12
60000/60000 [==============================] - 24s - loss: 0.0492 - acc: 0.9850 - val_loss: 0.0292 - val_acc: 0.9900
Epoch 6/12
60000/60000 [==============================] - 24s - loss: 0.0420 - acc: 0.9867 - val_loss: 0.0313 - val_acc: 0.9897
Epoch 7/12
60000/60000 [==============================] - 24s - loss: 0.0393 - acc: 0.9875 - val_loss: 0.0303 - val_acc: 0.9905
Epoch 8/12
60000/60000 [==============================] - 24s - loss: 0.0372 - acc: 0.9883 - val_loss: 0.0293 - val_acc: 0.9914
Epoch 9/12
60000/60000 [==============================] - 24s - loss: 0.0311 - acc: 0.9907 - val_loss: 0.0279 - val_acc: 0.9909
Epoch 10/12
60000/60000 [==============================] - 24s - loss: 0.0319 - acc: 0.9900 - val_loss: 0.0269 - val_acc: 0.9920
Epoch 11/12
60000/60000 [==============================] - 24s - loss: 0.0282 - acc: 0.9914 - val_loss: 0.0283 - val_acc: 0.9913
Epoch 12/12
60000/60000 [==============================] - 24s - loss: 0.0270 - acc: 0.9916 - val_loss: 0.0312 - val_acc: 0.9907
Test score: 0.0312398415336
Test accuracy: 0.9907
I’m on terrible internet right now and can’t double-check that Keras and Theano are 100% up to date, but pip says I’m at Keras 1.0.1, Theano 0.8.1, numpy 1.11.0.
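As an aside, the same versions can also be read from inside Python; this is a generic check, not part of the original report:

import numpy, theano, keras
# Each package exposes its version string as __version__.
print(numpy.__version__, theano.__version__, keras.__version__)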
Meanwhile, running this script twice prints identical numbers, as expected:

import numpy
numpy.random.seed(1337)  # same seed mnist_cnn.py sets at the top
print(numpy.random.random(3))
@NasenSpray thanks! That was the issue. Back on good internet, I double-checked that numpy, Theano, and Keras are 100% up to date, and verified two options that fix this:
THEANO_FLAGS="optimizer_excluding=conv_dnn" python mnist_cnn.py
This takes about twice as long, at around 60s/epoch instead of 25s/epoch.

THEANO_FLAGS="dnn.conv.algo_bwd_filter=deterministic,dnn.conv.algo_bwd_data=deterministic" python mnist_cnn.py

(The flag mentioned in the quoted post has been deprecated in favor of these two flags.) This option runs in the same time or slightly faster on my machine, at a consistent 24s/epoch.

Going to close this, since it’s not really a “bug” in Keras itself so much as different behavior between the underlying libraries. Though it might make sense to remove the
np.random.seed(1337)
at the top of many of the examples, now that it’s clear this line alone doesn’t guarantee reproducibility.

It seems that I solved this problem in this way (http://blog.csdn.net/qq_33039859/article/details/75452813): step 1, fix the numpy random seed at the top of the code; step 2, be sure to call model.fit() with shuffle=False. A minimal sketch of this recipe follows.
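For illustration only, a sketch of that two-step recipe against the Keras 1.x API; the tiny Dense model and the random placeholder data below are stand-ins, not the actual mnist_cnn.py network:

import numpy as np
np.random.seed(1337)  # step 1: seed numpy at the very top, before Keras is imported

from keras.models import Sequential
from keras.layers import Dense

# Placeholder data: 100 random "images" flattened to 784 floats, 10 one-hot classes.
X_train = np.random.random((100, 784)).astype('float32')
Y_train = np.eye(10)[np.random.randint(0, 10, size=100)].astype('float32')

model = Sequential()
model.add(Dense(10, input_dim=784, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy'])

# step 2: disable the per-epoch shuffle so the batch order is identical across runs
model.fit(X_train, Y_train, batch_size=32, nb_epoch=2, shuffle=False)

Note that on a GPU this alone may still not be enough: the non-deterministic cuDNN convolution kernels discussed above are unaffected by the numpy seed.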
@giorgiop to be clear:
np.random.seed(1337)
near the top of the file. Is that correct?
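For what it’s worth, a sketch of one ordering that satisfies both fixes discussed in this thread; it assumes the Theano backend and the deterministic-cuDNN flags from above. The key point is that both the THEANO_FLAGS setting and the numpy seed must run before Keras (and therefore Theano) is first imported:

import os
# Theano reads THEANO_FLAGS once, at import time, so this must come
# before anything imports theano (importing keras does).
os.environ['THEANO_FLAGS'] = ('dnn.conv.algo_bwd_filter=deterministic,'
                              'dnn.conv.algo_bwd_data=deterministic')

import numpy as np
np.random.seed(1337)  # seed numpy before Keras touches it

from keras.models import Sequential  # safe to import Keras from here on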