keras: Why is there a problem when loading saved weights into a model?

Hi everybody,

I’m experimenting with a classifier model using various techniques (dropout, an autoencoder, etc.) to analyse which gets the best results. To compare them fairly, I am using the save_weights and load_weights methods.

The first time I train my model, it works fine. However, after loading the saved weights, fit doesn’t learn anything: the loss stagnates for the entire training.

At first I thought it was a vanishing-gradient issue, since I first encountered the problem with the autoencoded dataset. But after many tweaks and tries, I believe the issue lies in the weight loading. See for yourself (this is, of course, done after a runtime restart):

# Classifier

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
model.add(Dense(50, activation='relu', input_dim=x.shape[1]))
model.add(Dense(50, activation='relu'))
model.add(Dense(50, activation='relu'))
model.add(Dense(50, activation='relu'))
model.add(Dense(10, activation='softmax'))

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['acc'])

model.save_weights('/content/drive/My Drive/Colab Notebooks/Weights/KagTPOneStart')

First time training

Loading the initial weights, then fitting. (Yes, I know the initial weights are already in place at this point, but I left the line in deliberately to show that loading the weights the first time doesn’t cause a problem.)

model.load_weights('/content/drive/My Drive/Colab Notebooks/Weights/KagTPOneStart')

model.fit(x, y_train, epochs=10, batch_size=20, validation_split=0.15)

model.save_weights('/content/drive/My Drive/Colab Notebooks/Weights/KagTPOneNormal')

Results:

Train on 35700 samples, validate on 6300 samples
Epoch 1/10
35700/35700 [==============================] - 5s 128us/step - loss: 1.0875 - acc: 0.8036 - val_loss: 0.3275 - val_acc: 0.9067
Epoch 2/10
35700/35700 [==============================] - 4s 120us/step - loss: 0.2792 - acc: 0.9201 - val_loss: 0.3186 - val_acc: 0.9079
Epoch 3/10
35700/35700 [==============================] - 4s 122us/step - loss: 0.2255 - acc: 0.9357 - val_loss: 0.1918 - val_acc: 0.9444
Epoch 4/10
35700/35700 [==============================] - 4s 121us/step - loss: 0.1777 - acc: 0.9499 - val_loss: 0.1977 - val_acc: 0.9465
Epoch 5/10
35700/35700 [==============================] - 4s 121us/step - loss: 0.1530 - acc: 0.9549 - val_loss: 0.1718 - val_acc: 0.9478
Epoch 6/10
35700/35700 [==============================] - 4s 121us/step - loss: 0.1402 - acc: 0.9595 - val_loss: 0.1847 - val_acc: 0.9510
Epoch 7/10
35700/35700 [==============================] - 4s 122us/step - loss: 0.1236 - acc: 0.9637 - val_loss: 0.1675 - val_acc: 0.9546
Epoch 8/10
35700/35700 [==============================] - 4s 121us/step - loss: 0.1160 - acc: 0.9660 - val_loss: 0.1776 - val_acc: 0.9586
Epoch 9/10
35700/35700 [==============================] - 4s 120us/step - loss: 0.1109 - acc: 0.9683 - val_loss: 0.1928 - val_acc: 0.9492
Epoch 10/10
35700/35700 [==============================] - 4s 120us/step - loss: 0.1040 - acc: 0.9701 - val_loss: 0.1749 - val_acc: 0.9570
WARNING:tensorflow:This model was compiled with a Keras optimizer (<tensorflow.python.keras.optimizers.Adam object at 0x7fb76ca35080>) but is being saved in TensorFlow format with `save_weights`. The model's weights will be saved, but unlike with TensorFlow optimizers in the TensorFlow format the optimizer's state will not be saved.

Consider using a TensorFlow optimizer from `tf.train`.

Second time training

Loading the initial weights, then fitting:

model.load_weights('/content/drive/My Drive/Colab Notebooks/Weights/KagTPOneStart')

model.fit(x, y_train, epochs=10, batch_size=20, validation_split=0.15)

model.save_weights('/content/drive/My Drive/Colab Notebooks/Weights/KagTPOneNormal')

Results:

Train on 35700 samples, validate on 6300 samples
Epoch 1/10
35700/35700 [==============================] - 4s 121us/step - loss: 14.4847 - acc: 0.1011 - val_loss: 14.5907 - val_acc: 0.0948
Epoch 2/10
35700/35700 [==============================] - 4s 122us/step - loss: 14.5018 - acc: 0.1003 - val_loss: 14.5907 - val_acc: 0.0948
Epoch 3/10
35700/35700 [==============================] - 4s 120us/step - loss: 14.5018 - acc: 0.1003 - val_loss: 14.5907 - val_acc: 0.0948
Epoch 4/10
35700/35700 [==============================] - 4s 121us/step - loss: 14.5018 - acc: 0.1003 - val_loss: 14.5907 - val_acc: 0.0948
Epoch 5/10
35700/35700 [==============================] - 4s 121us/step - loss: 14.5018 - acc: 0.1003 - val_loss: 14.5907 - val_acc: 0.0948
Epoch 6/10
35700/35700 [==============================] - 4s 121us/step - loss: 14.5018 - acc: 0.1003 - val_loss: 14.5907 - val_acc: 0.0948
Epoch 7/10
35700/35700 [==============================] - 4s 122us/step - loss: 14.5018 - acc: 0.1003 - val_loss: 14.5907 - val_acc: 0.0948
Epoch 8/10
35700/35700 [==============================] - 4s 121us/step - loss: 14.5018 - acc: 0.1003 - val_loss: 14.5907 - val_acc: 0.0948
Epoch 9/10
35700/35700 [==============================] - 4s 122us/step - loss: 14.5018 - acc: 0.1003 - val_loss: 14.5907 - val_acc: 0.0948
Epoch 10/10
35700/35700 [==============================] - 5s 130us/step - loss: 14.5018 - acc: 0.1003 - val_loss: 14.5907 - val_acc: 0.0948
WARNING:tensorflow:This model was compiled with a Keras optimizer (<tensorflow.python.keras.optimizers.Adam object at 0x7fb76ca35080>) but is being saved in TensorFlow format with `save_weights`. The model's weights will be saved, but unlike with TensorFlow optimizers in the TensorFlow format the optimizer's state will not be saved.

Consider using a TensorFlow optimizer from `tf.train`.

Thanks in advance for your help 😃

PS: Here’s the data for reference, though I really don’t think it is the problem. This is an MNIST-like dataset provided by Google on Kaggle. (I believe it is exactly MNIST, just not all of the samples.)

import pandas as pd
from keras.utils import np_utils  # provides to_categorical

df = pd.read_csv('/content/drive/My Drive/Colab Notebooks/IA/Kaggle TP1/train.csv')
data = df.values
data.shape
y = data[:, 0]                            # first column holds the digit labels
y_train = np_utils.to_categorical(y, 10)  # one-hot encode the 10 classes
x = data[:, 1:]                           # remaining columns are the pixel values
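
PS2: A quick sanity check like the one below should show whether the weight values themselves survive the save/load round trip within a session (a minimal sketch, reusing the model and path from above):

import numpy as np

before = model.get_weights()  # snapshot the in-memory weights
model.save_weights('/content/drive/My Drive/Colab Notebooks/Weights/KagTPOneStart')
model.load_weights('/content/drive/My Drive/Colab Notebooks/Weights/KagTPOneStart')
after = model.get_weights()

# every array should be identical if the round trip is lossless
print(all(np.array_equal(b, a) for b, a in zip(before, after)))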

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 4
  • Comments: 22 (4 by maintainers)

Most upvoted comments

I think I’m having the same issue. I had some old code and models from last June, using TF1. I updated the code to work in TF2 (mostly just changing tf.python.keras to tf.keras). It can load OLD models written with TF1 (and they work perfectly well), and it seems to train fine in TF2 (the loss goes down sensibly), BUT if you save the model now and try to load it later, it no longer works properly (for models written with TF2). All the predictions are the same no matter what input you provide, as if the weights were all zeros (but not the biases). The result changes every time I train, save, load, and test, but it’s the same for all input images.
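
(For reference, the kind of check that exposes this, as a minimal sketch; the model file and the 784-pixel input size are placeholders for your own:)

import numpy as np
import tensorflow as tf

loaded = tf.keras.models.load_model('my_tf2_model.h5')  # hypothetical path
a = loaded.predict(np.random.rand(1, 784))  # two unrelated random inputs
b = loaded.predict(np.random.rand(1, 784))
print(np.allclose(a, b))  # True here, which should be vanishingly unlikely for a healthy model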

This is not solely a Google Colaboratory issue. I am facing the same problem outside of Google Colab: the loaded model seems to have re-initialized weights after saving.

I am actually surprised that this has remained open for so long. Has none of you here deployed a model in production? If yes, how did you save the model/weights so that you get consistent predictions (the same as during training)?

P.S. I am using TF version 2.0.0

When I ran my experiments, I was forced to use Google Colab due to a broken computer. As I continued training, the results got quite funky in plenty of cases, and not only in the negative way: right now I also have a “too good to be true” result. I have since rerun plenty of those experiments on my own computer and cannot reproduce the weirdness at all; everything works exactly as expected!

It seems very likely that the issue is caused by something on Google Colab and is not a general problem in TensorFlow (2.0) or Keras.

@Atralb, did you consider getting rid of the WARNING about the format mismatch (the model was compiled with a Keras optimizer but is being saved in TensorFlow format)? Also, let us know if @colt18’s suggestion was useful. Thanks.
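
(For reference, tf.keras infers the checkpoint format from the file extension, and save_weights also accepts an explicit save_format argument, so saving in HDF5 avoids the TensorFlow-format path that the warning complains about. A minimal sketch, reusing the path from above with a .h5 suffix added:)

model.save_weights('/content/drive/My Drive/Colab Notebooks/Weights/KagTPOneStart.h5')  # .h5 selects HDF5
# or, equivalently, force the format explicitly:
model.save_weights('/content/drive/My Drive/Colab Notebooks/Weights/KagTPOneStart', save_format='h5')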

I’m not sure, but it may be related to this: https://stackoverflow.com/questions/47266383/save-and-load-weights-in-keras. Saving the weights and saving the whole training state at the current epoch are different things. Can you try model.save(filepath) instead of model.save_weights('my_model_weights.h5')? If you are resuming training, you need to save all of that information.
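
A minimal sketch of that suggestion (the filename here is just an example):

from tensorflow.keras.models import load_model

model.save('full_model.h5')  # architecture + weights + optimizer state

# later, e.g. after a runtime restart: no need to rebuild or recompile
model = load_model('full_model.h5')
model.fit(x, y_train, epochs=10, batch_size=20, validation_split=0.15)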