keras: Trainable = False isn't freezing weights
I have a script that previously would freeze pre-trained weights from the ResNet50 model and train the new layers I placed on top of the base model. Now, the model summary is reporting that all weights are trainable (counted in the total params).
Keras at 4c1353c188b3412b22d9f65042973e56a05433fe; Theano at ae36be011c98b1a2f30753162db01f6588ff8be3
# Example code:
from keras.applications.resnet50 import ResNet50
from keras.layers import Dense, Dropout, GlobalAveragePooling2D, BatchNormalization
from keras.layers.advanced_activations import LeakyReLU
from keras.models import Model
from keras.constraints import maxnorm
from keras.optimizers import Nadam

# input_tensor, bn_axis, leaky_relu_slope and nb_classes are defined earlier in the script
base_model = ResNet50(weights='imagenet', include_top=False, input_tensor=input_tensor)
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dropout(0.5)(x)
# random projection idea
x = Dense(256, trainable=False)(x)
x = BatchNormalization(axis=bn_axis)(x)
x = LeakyReLU(leaky_relu_slope)(x)
x = Dropout(0.5)(x)
# regular dense layer
x = Dense(128, W_constraint=maxnorm(4))(x)
x = LeakyReLU(leaky_relu_slope)(x)
x = Dropout(0.5)(x)
x = Dense(nb_classes, activation='softmax')(x)
model = Model(input=input_tensor, output=x)

# first: train only the top layers (which were randomly initialized),
# i.e. freeze all convolutional layers
for layer in base_model.layers:
    layer.trainable = False

nadam_custom = Nadam(lr=0.0001, clipnorm=1., clipvalue=0.25)
model.compile(loss='categorical_crossentropy',
              optimizer=nadam_custom,
              metrics=['accuracy', 'top_k_categorical_accuracy'])
model.summary()
# reports all layers as having trainable parameters, whereas previously the ResNet layers showed zeros:
# Total params: 24170041
About this issue
- State: closed
- Created 8 years ago
- Comments: 38 (12 by maintainers)
@pavanramkumar Maybe this will clarify things: Let D,G and GAN be the discriminator the generator and the stacked model in your example. Also, let us assume the following pseudo-code for constructing and compiling the models:
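(The pseudo-code referenced here isn't preserved in this extract; a minimal sketch of the kind of construction being described, with illustrative layer sizes and Keras 2-style names, might be:)

from keras.models import Sequential
from keras.layers import Dense

# D: the discriminator, compiled on its own for the real/fake training step
D = Sequential([Dense(16, activation='relu', input_dim=8),
                Dense(1, activation='sigmoid')])
D.compile(optimizer='adam', loss='binary_crossentropy')

# G: the generator, mapping noise to samples
G = Sequential([Dense(16, activation='relu', input_dim=4),
                Dense(8)])

# GAN: G stacked on D; D is flagged non-trainable only AFTER its own compile,
# but BEFORE the stacked model is compiled
D.trainable = False
GAN = Sequential([G, D])
GAN.compile(optimizer='adam', loss='binary_crossentropy')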
If you set D.trainable = False before compiling the model D and then try to fit D, you shall observe that D is indeed “frozen”. If you set D.trainable = False after compiling D and then try to fit D, it will actually start learning things. However, it will remain frozen during the training process of the GAN. And this is the behaviour you might be after. In both cases the summary() function will always tell you that you do not have non-trainable parameters, at least in my case, under Keras version 1.2.1.
This is, of course, easy to disprove:
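(The snippet is missing from this extract; the sort of check being described, in rough form and with Keras 2-style argument names:)

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Freeze the whole model BEFORE compiling, then fit it anyway.
model = Sequential([Dense(1, input_dim=4)])
model.trainable = False
model.compile(optimizer='sgd', loss='mse')

x = np.random.rand(32, 4)
y = np.random.rand(32, 1)
history = model.fit(x, y, epochs=3, verbose=0)
print(history.history['loss'])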
-> loss does not change, because the model is not trainable.
@fchollet right. Here is the easiest example to reproduce the bug (running on keras 1.2):
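(The original reproduction isn't preserved here; a sketch consistent with the description below, using the model1/model2 names from the thread and Keras 2-style argument names:)

import numpy as np
from keras.layers import Input, Dense
from keras.models import Model

inp = Input(shape=(3,))
y = Dense(5)(inp)
model1 = Model(inp, y)
model1.trainable = False           # intended to freeze model1

z = Dense(1)(model1.output)        # build model2 on top of model1's output tensor
model2 = Model(inp, z)
model2.compile(optimizer='sgd', loss='mse')

data = np.random.rand(64, 3)
labels = np.random.rand(64, 1)

out_before = model1.predict(data)
model2.fit(data, labels, epochs=5, verbose=0)
out_after = model1.predict(data)
print(np.allclose(out_before, out_after))   # False on the Keras versions discussed here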
The output from model1 should be equal before and after training model2, because model1.trainable = False was set before building model2. You can clearly see the output has changed: model1 is learning.
That's a nasty, NASTY bug that I encountered too. Like you, I've implemented a GAN in Keras, and I found the problem easily reproducible with @gibipara92's example: the trainable flag does not work at all with our functional API models! To the best of my knowledge, EVERY implementation of a GAN in Keras is bugged: https://github.com/tdeboissiere/DeepLearningImplementations/ https://github.com/phreeza/keras-GAN https://github.com/osh/KerasGAN and many more; all of them rely on freezing one piece of the network with trainable = False.
In this example your model1.trainable = False has no effect on the rest of the code, because you never use model1 past that point. Instead, you use its underlying layers, which are still trainable. Essentially: when calling model1.output, you are retrieving the y tensor from y = keras.layers.Dense(5)(x), and adding stuff on top. Your code is equivalent to the snippet below. Hope that clears things up.
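(A reconstruction of the "equivalent" code, assuming model1 wraps a single Dense(5) as in the reproduction above; not the comment's original snippet:)

from keras.layers import Input, Dense
from keras.models import Model

x = Input(shape=(3,))
y = Dense(5)(x)        # this layer object is still trainable
z = Dense(1)(y)        # the "stuff on top"
model2 = Model(x, z)   # model2's trainable weights include the Dense(5) kernel and bias
model2.compile(optimizer='sgd', loss='mse')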
@fchollet we tested this again and made sure that the error is in reporting alone, i.e. in model.summary(). As @farizrahman4u said earlier, it would be useful to modify this function to say which parameters are trainable and which are not.
Expanding on what @Js-Mim said as a general principle, for model1 and model2, see the sketch below. Hope this is useful.
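(The illustrative snippet isn't preserved; one way to sketch the general principle, that each model captures its set of trainable weights at the moment it is compiled, with illustrative names:)

from keras.layers import Input, Dense
from keras.models import Model

x = Input(shape=(3,))
shared = Dense(5)
y = shared(x)

model1 = Model(x, y)
model1.compile(optimizer='sgd', loss='mse')   # `shared` is trainable as far as model1 is concerned

shared.trainable = False                      # flip the flag on the layer itself
z = Dense(1)(y)
model2 = Model(x, z)
model2.compile(optimizer='sgd', loss='mse')   # `shared` is frozen as far as model2 is concerned

# On the Keras versions discussed in this thread, model1.fit(...) still updates `shared`,
# while model2.fit(...) leaves it untouched.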
Was having trouble because weights were not freezing. This does not freeze weights:
But this does:
Is this expected behavior?
The batchnorm layer updates its mean and variance statistics at training time. It is not learning, as it contains no trainable parameters: no gradients get backpropagated to its weights (gamma and beta).
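(As an illustration of the point above, with a hypothetical layer not taken from the thread: a frozen BatchNormalization layer reports no trainable weights, yet still carries moving-mean/variance weights that its update ops modify at training time on the Keras versions discussed here:)

from keras.layers import Input, BatchNormalization

inp = Input(shape=(4,))
bn = BatchNormalization()
out = bn(inp)                          # builds the layer and creates its weights

bn.trainable = False
print(len(bn.trainable_weights))       # 0: no gradients reach gamma or beta
print(len(bn.non_trainable_weights))   # 4: gamma, beta, moving_mean, moving_variance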
I see. The obvious solution would be to set trainable = False on each model1 layer. And in fact it works on our example: setting every model1 layer's trainable flag to False does freeze model1's layers when training model2. So, every GAN implementation doing something like the following:
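(The snippet being referred to isn't preserved; the pattern in question is presumably along these lines, with illustrative names: only the model-level flag is set, the individual layers are left untouched.)

from keras.models import Sequential
from keras.layers import Dense

generator = Sequential([Dense(16, activation='relu', input_dim=4), Dense(8)])
discriminator = Sequential([Dense(16, activation='relu', input_dim=8),
                            Dense(1, activation='sigmoid')])
discriminator.compile(optimizer='adam', loss='binary_crossentropy')

discriminator.trainable = False                 # model-level flag only
gan = Sequential([generator, discriminator])
gan.compile(optimizer='adam', loss='binary_crossentropy')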
is actually wrong. All the ones I reported use this method. Anyway, the problem is still not fixed. Setting trainable = False on network.layers in my GAN did not solve the problem. The point is that our GAN model, with its GEN, DISC and DCGAN models, is more complicated than this example, and I can't seem to find a configuration where I can:
I'll build a simple snippet of code reproducing the GAN behaviour, because, @fchollet, as you can imagine, we cannot run three model.compile calls at every batch iteration.
Just a follow up with a quick ‘n’ dirty solution for adversarial settings. The trick is to define a non-trainable clone of the discriminator and copy the weights from the discriminator to the clone before training the generator to fool the discriminator.
A solution sketch is as follows:
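(The sketch itself is missing from this extract; a rough rendering of the trick described above, with illustrative architectures and names:)

from keras.models import Sequential
from keras.layers import Dense

def build_discriminator(trainable):
    m = Sequential([Dense(16, activation='relu', input_dim=8),
                    Dense(1, activation='sigmoid')])
    for layer in m.layers:
        layer.trainable = trainable
    return m

D = build_discriminator(trainable=True)            # trained on real vs. generated samples
D.compile(optimizer='adam', loss='binary_crossentropy')

D_frozen = build_discriminator(trainable=False)    # non-trainable clone, same architecture
G = Sequential([Dense(16, activation='relu', input_dim=4), Dense(8)])
GAN = Sequential([G, D_frozen])
GAN.compile(optimizer='adam', loss='binary_crossentropy')

# In the training loop, before each generator update, sync the clone:
# D_frozen.set_weights(D.get_weights())
# GAN.train_on_batch(noise_batch, ones_batch)      # D_frozen's weights stay fixed during this step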
Hope this helps.
Cheers, DG
I think this is the same problem as calling model.trainable = False directly. The flag will only be applied to the model (on the highest level of abstraction). If you look into the graph, the individual layers (and the weights inside them) will still show trainable = True, so you really have to call it on everything in your whole graph that has a trainable flag "manually" in order to set every flag to False. If you encapsulate layers somehow in complex architectures, this behavior is really annoying. Imho there should be something like the --recursive option that many terminal commands have. This way one would be able to call model.trainable = False with the expected behavior: the model and all of its child layers would recursively have their trainable flags set to False.
We need to have a 'setTrainable' method per layer (and not per compiled model) that is cheap to call.
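(A small illustrative helper in the spirit of the --recursive idea above; this is not part of the Keras API:)

def freeze_recursively(layer):
    """Set trainable = False on a layer or model and on everything nested inside it."""
    layer.trainable = False
    for sub in getattr(layer, 'layers', []):   # models and other containers expose .layers
        freeze_recursively(sub)

# freeze_recursively(model)
# model.compile(...)   # recompile afterwards so the change actually takes effect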
Hi all, this topic might be out of date, but I recently experienced chaos with GANs, especially when freezing layers.
Thanks to @Js-Mim, you made my life easier; however, I found another problem, about which I am posting my endeavors.
The problem is when combining two models that are not trainable:
I would expect it to work, but it doesn’t (Keras 2.0.6). Only if setting
then it works.
A less elegant solution to this problem is adding non-trainable layers (though you must specify them separately with l.trainable = False; that was also a bit unexpected for me).
Something is definitely going on here… though my ignorance might be the main reason for my adventures. Could someone explain to me who is who here? Bug, misconception, or ignorance?
@fchollet I finally managed to easily reproduce the bug that was driving me crazy over the last few days. The trainable property works like a charm on simple GAN models, but is buggy in one case. Task: load a pretrained network and use it as a feature extractor (freezing its layers), build a model that adds an FC layer on top of the pretrained network, train the model, and test whether the pretrained network changed:
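(Neither snippet survives in this extract; the test procedure described reads roughly as follows, with illustrative sizes; swap VGG16 for ResNet50 or InceptionV3 to get the second case:)

import numpy as np
from keras.applications import VGG16              # swap for ResNet50 / InceptionV3
from keras.layers import Dense, Flatten
from keras.models import Model

net = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
for l in net.layers:
    l.trainable = False                           # freeze the feature extractor

x = Flatten()(net.output)
x = Dense(10, activation='softmax')(x)            # the FC layer added on top
model = Model(net.input, x)
model.compile(optimizer='sgd', loss='categorical_crossentropy')

data = np.random.rand(8, 224, 224, 3)
labels = np.eye(10)[np.random.randint(0, 10, 8)]

out_before = net.predict(data)
model.fit(data, labels, epochs=1, verbose=0)
out_after = net.predict(data)
print(np.allclose(out_before, out_after))         # True for VGG16/VGG19; False for ResNet50/InceptionV3
                                                  # on the Keras versions discussed here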
the two outputs, before and after training the model, are equal, as they should be. Let's repeat the same with ResNet50 or InceptionV3.
The outputs are not equal anymore: here is the bug. The bug happens with ResNet50 and InceptionV3 and does not happen with VGG16 and VGG19. I imagine it is something related to the batchnorm layers; maybe they are not counted as layers and so are not caught by for l in net.layers:? Edit: I see BatchNormalization layers in net.layers, so it really seems to be a bug.
Are you sure that some nasty connections or dropout(s) are not messing with the output of the loss function?