keras: EarlyStopping callback won't restore best weights unless training stops early

When the restore_best_weights option is True, the EarlyStopping callback restores the best weights only if it requests the stop itself, not if stopping is requested by another callback or if the training loop simply runs for its given number of epochs. The call to Model.set_weights sits inside the on_epoch_end method, guarded by a comparison of wait against patience [1].
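
For reference, the relevant logic looks roughly like this (a paraphrased sketch of the linked code, not a verbatim excerpt; details vary between versions):

```python
# Paraphrased sketch of EarlyStopping.on_epoch_end (simplified).
def on_epoch_end(self, epoch, logs=None):
    current = self.get_monitor_value(logs)
    if current is None:
        return
    if self.monitor_op(current - self.min_delta, self.best):
        # New best value: remember it (and, optionally, the weights).
        self.best = current
        self.wait = 0
        if self.restore_best_weights:
            self.best_weights = self.model.get_weights()
    else:
        self.wait += 1
        if self.wait >= self.patience:
            # Only on this path, where EarlyStopping itself stops training,
            # are the best weights restored.
            self.stopped_epoch = epoch
            self.model.stop_training = True
            if self.restore_best_weights:
                self.model.set_weights(self.best_weights)
```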

If I read the code correctly, this means that when the best epoch occurs fewer than patience epochs before the final epoch, the model keeps the weights from the final epoch rather than the best ones.

If this is intentional, I think it should be documented. If not, I think it would be an easy fix to move the weight-restoring logic to the on_train_end method (a sketch of that idea follows the footnote below).

[1] https://github.com/keras-team/keras/blob/f0eb8d538c82798944346b4b2df917a06bf5e9d4/keras/callbacks.py#L823-L830
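
As a rough sketch of that fix (a hypothetical subclass, not part of Keras; it assumes the callback keeps its best weights in a best_weights attribute, which is an implementation detail that may differ between versions):

```python
import tensorflow as tf

class EarlyStoppingAlwaysRestore(tf.keras.callbacks.EarlyStopping):
    """Hypothetical variant that restores the best weights in on_train_end,
    so they are applied even when training ends without early stopping."""

    def on_train_end(self, logs=None):
        super().on_train_end(logs)
        if self.restore_best_weights and self.best_weights is not None:
            # Put the best weights back regardless of why training stopped.
            self.model.set_weights(self.best_weights)
```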

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 29
  • Comments: 17

Most upvoted comments

Guys, this is super important. It happened to us today for the first time, and we only found it because the model had been evaluated with a good loss (compared to other experiments) while the final results were terrible. The model reached the maximum number of epochs, and thus the callback did not restore the best weights from an earlier epoch, as it was meant to do.

In my opinion this issue should be tagged as a bug, as it is still present in the latest release and on the master branch. It needs to be resolved as suggested by @Stigjb, by moving the restoration of the weights to on_train_end; otherwise many people may report wrong numbers or stop believing in their fresh ideas!

Sorry to come back to this, but I see this issue was closed without a milestone. Is this fixed? If so, in which version? Thanks.

So what about supporting the ability to resume training? I say we add another flag, maybe weights_restored_without_early_stop, that defaults to True but if explicitly set to False will result in the current behavior.

This would indeed be a very nice addition. I was a little confused about why the weights weren’t being restored until I read this thread. It made me think there was a bug in either TF or my code.

I’ve been doing this as a workaround… It’s a little inelegant, but it seems to work for my use cases (a sketch follows the steps below).

  1. Use the ModelCheckpoint callback with save_best_only=True.
  2. After training, load the best model using tf.keras.models.load_model.
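
A minimal sketch of that workaround, assuming a model that monitors val_loss; the data, model, and checkpoint path are placeholders:

```python
import numpy as np
import tensorflow as tf

# Placeholder data and model purely for illustration.
x = np.random.rand(512, 16).astype("float32")
y = np.random.rand(512, 1).astype("float32")

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
model.compile(optimizer="adam", loss="mse")

# 1. Save only the best model seen so far, judged by val_loss.
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    "best_model.h5",  # placeholder path
    monitor="val_loss",
    save_best_only=True,
)
early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5)

model.fit(
    x, y,
    validation_split=0.2,
    epochs=50,
    callbacks=[checkpoint, early_stop],
    verbose=0,
)

# 2. Reload the best checkpoint, regardless of why training stopped.
best_model = tf.keras.models.load_model("best_model.h5")
```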

I think the best solution to this is to also add a restore_best_weights option to Model.fit, so that you can have this behavior on both early termination and regular termination.

As pointed out in https://github.com/tensorflow/tensorflow/issues/35634#issuecomment-612644409, the user may want to continue training after normal termination, so EarlyStoppingCallback always overriding your weights would present a challenge. That’s why I think Model.fit also needs this option.

@meanmikeyk If EarlyStopping doesn’t trigger, it simply means that training didn’t proceed for patience epochs after the best-scoring epoch.

There are many reasons why this could happen: Perhaps you are training the network for the first time and don’t know how many epochs it really needs; perhaps some random aspect of the training causes the best epoch to arrive later than expected; or perhaps you simply have a finite amount of patience and/or computing resources and cannot train for more than some fixed time.

The fact that EarlyStopping does not restore the best weights in every case, nor offer any option to do so, catches many people by surprise. This ruins long and potentially expensive training runs.
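
For example, here is a minimal illustration of the surprise (toy data and values are assumptions; exact losses will vary from run to run):

```python
import numpy as np
import tensorflow as tf

# Toy data purely for illustration.
x = np.random.rand(256, 8).astype("float32")
y = np.random.rand(256, 1).astype("float32")

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
model.compile(optimizer="adam", loss="mse")

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=10, restore_best_weights=True
)

# patience (10) >= epochs (10), so EarlyStopping can never trigger.
model.fit(
    x, y,
    validation_split=0.25,
    epochs=10,
    callbacks=[early_stop],
    verbose=0,
)

# Training ran to completion, stop_training was never set, and the restore
# branch in on_epoch_end was never reached: the model keeps the weights from
# the last epoch even if an earlier epoch had a lower val_loss.
```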

I don’t believe it belongs in Model.fit, for the same reasons the EarlyStopping class exists in the first place. The Keras team could have added a patience option to Model.fit but chose not to, in order to avoid further muddying that method. Let’s avoid adding a restore_best_weights option for the same reason.

I’m strongly of the opinion that EarlyStopping should restore the best weights if restore_best_weights is True, regardless of why training stopped. Failing to restore the weights is not the behavior of least surprise, which by some definitions makes it a bug. At best, it is ill-advised.

So what about supporting the ability to resume training? I say we add another flag, maybe weights_restored_without_early_stop, that defaults to True but if explicitly set to False will result in the current behavior.

As I already said, @jjbuchanan, in the worst-case scenario people will not review the training process and will just evaluate and log the results. There is a great chance that people will report wrong and misleading results, which may mislead individuals or a whole community; even worse, good ideas may get withdrawn because they showed terrible performance (based on the last, terrible parameters)…