tensorflow: Saver: Don't fail on restoring variables not present in a checkpoint.
This is a feature request.
Let’s consider a scenario where one trains multiple models and uses them in combination (as is done in GANs, for example).
To simplify saving and restoring variables that are partly shared across the models (Pretraining Model, Training Model, Evaluation Model, Infer Model), one could instantiate the whole graph, containing all operations and variables, and save it.
Then, for pre-training, only the subset of graph elements required for pre-training is used.
This results in the overhead of having to build the whole model (which might consist of multiple sub-models) the first time the model is run, even though only a smaller part (e.g. the Pretraining Model) is required.
Another issue arises when using different optimizers in different training stages (e.g. SGD first, then Adam). Since Adam creates additional variables, the Adam optimizer has to be instantiated already during the first training stage so that restoring from a checkpoint does not fail later when training continues with Adam instead of SGD.
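To make the optimizer problem concrete, here is a minimal sketch of the failure mode, assuming TF 1.x; the toy model and the checkpoint path are made up for illustration. The checkpoint written in the SGD stage contains no Adam variables, so restoring it into a graph that also builds Adam fails.

```python
import tensorflow as tf  # TF 1.x

CKPT_DIR = "/tmp/stage1"            # illustrative path
CKPT_PATH = CKPT_DIR + "/model.ckpt"

# Stage 1: train with plain SGD and write a checkpoint.
with tf.Graph().as_default():
    w = tf.get_variable("w", shape=[10, 10])
    loss = tf.reduce_sum(tf.square(w))
    train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)
    saver = tf.train.Saver()
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        sess.run(train_op)
        tf.gfile.MakeDirs(CKPT_DIR)
        saver.save(sess, CKPT_PATH)

# Stage 2: the same model, but now with Adam. Adam adds variables
# (w/Adam, w/Adam_1, beta1_power, beta2_power) that are not in the
# checkpoint, so a plain Saver.restore raises a NotFoundError.
with tf.Graph().as_default():
    w = tf.get_variable("w", shape=[10, 10])
    loss = tf.reduce_sum(tf.square(w))
    train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)
    saver = tf.train.Saver()
    with tf.Session() as sess:
        saver.restore(sess, CKPT_PATH)  # fails: key beta1_power not found in checkpoint
```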
This restriction of having to build everything, despite parts of it not being required, results in more complicated code. If the Saver could silently skip variables that are not found in a checkpoint, so that they can be initialized with tf.global_variables_initializer() instead, code could be structured much more cleanly.
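This is roughly the behaviour the requested option would provide, expressed as a workaround one can already write today: read the checkpoint, restore only the variables that actually exist in it, and run the initializers for the rest. A sketch, assuming TF 1.x; the helper name restore_available is made up.

```python
import tensorflow as tf  # TF 1.x

def restore_available(sess, ckpt_path):
    """Restore variables present in the checkpoint, initialize the rest.

    Sketch of what a 'quietly failing' Saver could do internally.
    """
    reader = tf.train.NewCheckpointReader(ckpt_path)
    ckpt_shapes = reader.get_variable_to_shape_map()

    restorable, missing = [], []
    for var in tf.global_variables():
        name = var.op.name  # checkpoint keys carry no ":0" suffix
        if name in ckpt_shapes and var.shape.as_list() == ckpt_shapes[name]:
            restorable.append(var)
        else:
            missing.append(var)

    if restorable:
        tf.train.Saver(var_list=restorable).restore(sess, ckpt_path)
    sess.run(tf.variables_initializer(missing))
    return restorable, missing
```

With something like this in place, the Adam graph from the sketch above would get `w` restored from the SGD checkpoint while the Adam variables are simply initialized, instead of the whole restore failing.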
I have looked through the current issues regarding this and found a couple that face a similar problem and where a QuietlyFailRestoringSaver could solve it:
https://github.com/tensorflow/tensorflow/issues/12032
https://github.com/tensorflow/tensorflow/issues/16781
I might consider building this if there is enough support for it. I am open to feedback.
Small example:
I would expect that the second graph (Finished Model with Adam) works. But it throws `Attempting to use uninitialized value beta2_power`. I expected `.initialize_or_restore` to initialize the Adam variables, but that does not happen. They are printed by `print(checkpointable_utils._serialize_object_graph(checkpoint))`, though. If you think that’s a bug, I can open up a new issue to track it. I hope I am not using the API incorrectly.
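The original snippet is not reproduced above, so the following is only a rough reconstruction of the described setup, assuming the TF 1.x `tf.train.Checkpoint` / `initialize_or_restore` API; variable names, shapes, and paths are made up.

```python
import tensorflow as tf  # TF 1.x

# Graph 1: model variables only, no Adam. Write an object-based checkpoint.
with tf.Graph().as_default():
    w = tf.get_variable("w", shape=[4])
    checkpoint = tf.train.Checkpoint(w=w)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        tf.gfile.MakeDirs("/tmp/ckpt")          # illustrative directory
        path = checkpoint.save("/tmp/ckpt/model")

# Graph 2 ("Finished Model with Adam"): the checkpoint contains w but
# none of Adam's variables (beta1_power, beta2_power, slot variables).
with tf.Graph().as_default():
    w = tf.get_variable("w", shape=[4])
    loss = tf.reduce_sum(tf.square(w))
    optimizer = tf.train.AdamOptimizer(1e-3)
    train_op = optimizer.minimize(loss)
    checkpoint = tf.train.Checkpoint(w=w, optimizer=optimizer)
    status = checkpoint.restore(path)
    with tf.Session() as sess:
        # Expectation: restore w, initialize the Adam variables.
        status.initialize_or_restore(sess)
        # Reported behaviour in the issue:
        # "Attempting to use uninitialized value beta2_power"
        sess.run(train_op)
```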