keras: Trainable = False isn't freezing weights

I have a script that previously would freeze pre-trained weights from the ResNet50 model and train the new layers I placed on top of the base model. Now, the model summary is reporting that all weights are trainable (counted in the total params).

Keras at 4c1353c188b3412b22d9f65042973e56a05433fe; Theano at ae36be011c98b1a2f30753162db01f6588ff8be3

# Example code:

from keras.applications.resnet50 import ResNet50
from keras.layers import Dense, Dropout, GlobalAveragePooling2D, BatchNormalization
from keras.layers.advanced_activations import LeakyReLU
from keras.constraints import maxnorm
from keras.models import Model
from keras.optimizers import Nadam

# input_tensor, bn_axis, leaky_relu_slope and nb_classes are defined elsewhere in the full script
base_model = ResNet50(weights='imagenet', include_top=False, input_tensor=input_tensor)

x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dropout(0.5)(x)

# random projection idea
x = Dense(256, trainable=False)(x)
x = BatchNormalization(axis=bn_axis)(x)
x = LeakyReLU(leaky_relu_slope)(x)
x = Dropout(0.5)(x)

# regular dense layer
x = Dense(128, W_constraint=maxnorm(4))(x)
x = LeakyReLU(leaky_relu_slope)(x)
x = Dropout(0.5)(x)

x = Dense(nb_classes, activation='softmax')(x)

model = Model(input=input_tensor, output=x)

# first: train only the top layers (which were randomly initialized)
# i.e. freeze all convolutional layers
for layer in base_model.layers:
    layer.trainable = False

nadam_custom = Nadam(lr=0.0001, clipnorm=1., clipvalue=0.25)

model.compile(loss='categorical_crossentropy',
              optimizer=nadam_custom,
              metrics=['accuracy', 'top_k_categorical_accuracy'])

model.summary()

# reports parameters for all layers, whereas previously the frozen ResNet layers counted as zero:
# Total params: 24170041
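One way to inspect which weights currently carry the trainable flag, independently of summary()'s single total (a sketch; note that the set of weights the optimizer actually updates is fixed at compile time, so this should be checked before compiling):

import numpy as np
from keras import backend as K

trainable_count = int(np.sum([K.count_params(w) for w in model.trainable_weights]))
non_trainable_count = int(np.sum([K.count_params(w) for w in model.non_trainable_weights]))
print('Trainable params:', trainable_count)
print('Non-trainable params:', non_trainable_count)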

About this issue

  • State: closed
  • Created 8 years ago
  • Comments: 38 (12 by maintainers)

Most upvoted comments

@pavanramkumar Maybe this will clarify things: let D, G and GAN be the discriminator, the generator and the stacked model in your example. Also, let us assume the following pseudo-code for constructing and compiling the models:

  1. Construct D
     1a) Compile D
  2. Construct G
  3. Set D.trainable = False
  4. Stack G and D, to construct GAN
     4a) Compile GAN

If you set D.trainable = False before compiling the model D and then try to fit D, you shall observe that D is indeed “frozen”. If you set D.trainable = False after compiling D and then try to fit D, it will actually start learning things. However, it will remain frozen during the training process of the GAN. And this is the behaviour you might be after. In both cases the summary() function will always tell you that you do not have non-trainable parameters, at least in my case under keras version 1.2.1.

This is, of course, easy to disprove:

import numpy as np
import keras

x = keras.layers.Input(shape=(3,))
y = keras.layers.Dense(5)(x)

model = keras.models.Model(x, y)
model.trainable = False
model.compile(optimizer='rmsprop', loss='mse')

x = np.random.random((10, 3))
y = np.random.random((10, 5))
model.fit(x, y, epochs=10)

-> loss does not change, because the model is not trainable.

@fchollet right. Here is the easiest example to reproduce the bug (running on keras 1.2):

import numpy as np
import keras

x = keras.layers.Input(shape=(3,))
y = keras.layers.Dense(5)(x)

model1 = keras.models.Model(x, y)
model1.trainable = True
model1.compile(optimizer='rmsprop', loss='mse')

data_x = np.random.random((10, 3))
data_y = np.random.random((10, 5))

model1.fit(data_x, data_y, nb_epoch=2)
out = model1.predict(data_x)
print(out)

model1.trainable = False 
z = keras.layers.Dense(5)(model1.output) 
model2 = keras.models.Model(x, z) 
model2.compile(optimizer='rmsprop', loss='mse') 

data_z = np.ones((10, 3)) 
data_w = np.ones((10, 5)) 

model2.fit(data_z, data_w, nb_epoch=2) 

out = model1.predict(data_x)
print(out)

The output from model1 should be equal before and after training model2, because model1.trainable = False was set before building model2. You can clearly see that the output has changed: model1 is learning.

That’s a nasty, NASTY bug that I encounter too. Like you, I’ve implemented a GAN in Keras, and I found the problem easily reproducible with @gibipara92’s example: the trainable flag does not work at all with our functional API models! To the best of my knowledge, EVERY implementation of a GAN in Keras is bugged: https://github.com/tdeboissiere/DeepLearningImplementations/ https://github.com/phreeza/keras-GAN https://github.com/osh/KerasGAN and many more; all of them rely on freezing one piece of the network with trainable=False.

In this example your model1.trainable = False has no effect on the rest of the code, because you never use model1 past that point. Instead, you use its underlying layers, which are still trainable.

Essentially: when calling model1.output, you are retrieving the y tensor from y = keras.layers.Dense(5)(x), and adding stuff on top. Your code is equivalent to:

import numpy as np
import keras

x = keras.layers.Input(shape=(3,))
y = keras.layers.Dense(5)(x)

z = keras.layers.Dense(5)(y)
model2 = keras.models.Model(x, z)
model2.compile(optimizer='rmsprop', loss='mse')

Hope that clears things up.
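Following that explanation, the fix confirmed later in this thread is to freeze the underlying layers rather than the Model wrapper; a minimal sketch in the same toy setting (model2's compile then snapshots the frozen flags):

import numpy as np
import keras

x = keras.layers.Input(shape=(3,))
y = keras.layers.Dense(5)(x)

model1 = keras.models.Model(x, y)
model1.compile(optimizer='rmsprop', loss='mse')

# Freeze the layers themselves, not just the Model wrapper,
# before building and compiling the stacked model.
for layer in model1.layers:
    layer.trainable = False

z = keras.layers.Dense(5)(model1.output)
model2 = keras.models.Model(x, z)
model2.compile(optimizer='rmsprop', loss='mse')

data_x = np.random.random((10, 3))
data_w = np.random.random((10, 5))
model2.fit(data_x, data_w)   # only the new Dense(5) on top gets updated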

@fchollet we tested this again and made sure that the error is in reporting alone, i.e. model.summary(). As @farizrahman4u said earlier, it would be useful to modify this function to say which parameters are trainable and which are not.

Expanding on what @Js-Mim said, as a general principle:

  • compile a model as soon as you define it:

    model1 = Model(inputs, outputs)
    model1.compile()

  • if you want to freeze selected layers in model1 within a training loop, just make a copy (see the sketch after this list for an alternative that avoids deepcopy):

    from copy import deepcopy
    model2 = deepcopy(model1)
    layers_to_freeze = [0, 2, 4]
    for i, layer in enumerate(model2.layers):
        if i in layers_to_freeze:
            layer.trainable = False
    model2.compile()

  • within a training loop, iterate between fitting model1 and model2:

    while training:
        ...
        model1.fit(x, y)
        ...
        model2.fit(x, y)

  • this way, @engharat, you don’t need to compile within a training loop.

hope this is useful
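As a side note, deepcopy of a compiled model can be fragile; the same effect can be had by compiling two Model objects that share the same layers, flipping the trainable flags between the two compile calls (a sketch with an illustrative toy network, not code from this thread):

import numpy as np
from keras.layers import Input, Dense
from keras.models import Model

inputs = Input(shape=(3,))
hidden = Dense(4, name='hidden')(inputs)
outputs = Dense(1, name='out')(hidden)

# First view: everything trainable.
model1 = Model(inputs, outputs)
model1.compile(optimizer='rmsprop', loss='mse')

# Second view over the very same layers, with one of them frozen.
# The freeze is captured when model2 is compiled; model1 keeps the
# all-trainable update list it got from its own earlier compile.
model1.get_layer('hidden').trainable = False
model2 = Model(inputs, outputs)
model2.compile(optimizer='rmsprop', loss='mse')

x = np.random.random((16, 3))
y = np.random.random((16, 1))
model1.fit(x, y)   # updates 'hidden' and 'out'
model2.fit(x, y)   # updates only 'out'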

Was having trouble because weights were not freezing. This does not freeze weights:

   x = Dense(units=8, activation='tanh')(inputs)
   x.trainable = False
   x = Dense(units=1, activation='sigmoid')(x)
   x.trainable = False
   model = Model(input=inputs, output=x)

But this does:

   x = Dense(units=8, activation='tanh')(inputs)
   x = Dense(units=1, activation='sigmoid')(x)
   model = Model(input=inputs, output=x)
   for l in model.layers:
       l.trainable = False

Is this expected behavior?
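(For what it's worth, trainable is a property of Layer objects, not of the output tensors the functional API returns, so setting it on the tensor x above is a no-op. A sketch of a per-layer variant that keeps a handle on each layer; the names are illustrative:)

dense_1 = Dense(units=8, activation='tanh')
dense_2 = Dense(units=1, activation='sigmoid')
x = dense_1(inputs)
x = dense_2(x)
# Freeze via the layer objects, before compiling the model.
dense_1.trainable = False
dense_2.trainable = False
model = Model(input=inputs, output=x)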

The batchnorm layer updates its mean and variance statistics at training time. It is not learning, as it contains no trainable parameters: no gradients get backpropagated to its weights (gamma and beta).
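A quick way to see that split directly (a sketch assuming a Keras 2-style BatchNormalization whose weights are ordered gamma, beta, moving_mean, moving_variance):

import numpy as np
from keras.layers import Input, BatchNormalization
from keras.models import Model

inp = Input(shape=(4,))
out = BatchNormalization()(inp)
m = Model(inp, out)
m.layers[1].trainable = False
m.compile(optimizer='sgd', loss='mse')

bn = m.layers[1]
before = [w.copy() for w in bn.get_weights()]
m.fit(np.random.normal(size=(32, 4)), np.random.normal(size=(32, 4)), verbose=0)
after = bn.get_weights()

# gamma and beta stay fixed (no gradients reach them) ...
print([np.array_equal(b, a) for b, a in zip(before[:2], after[:2])])
# ... while the moving mean/variance statistics still get updated during fit.
print([np.array_equal(b, a) for b, a in zip(before[2:], after[2:])])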

On 19 April 2017 at 06:41, Diogo Luvizon wrote:

@fchollet @engharat It seems that the BatchNormalization layer is always learning. Please try the example below:

import numpy as np
import keras

inp = keras.layers.Input(shape=(4,))
sample = np.random.normal(size=(10, 4))
labels = np.random.normal(size=(10, 1))
test = np.random.normal(size=(1, 4))

x = keras.layers.Dense(1)(inp)
model = keras.models.Model(inp, x)
model.layers[1].trainable = False
model.compile(loss='mse', optimizer=keras.optimizers.SGD(lr=1.))

out1 = model.predict(test)
model.fit(sample, labels)
out2 = model.predict(test)
print(out1, out2)  # Until here, everything OK

Now just add a BN layer

x = model(inp)
x = keras.layers.BatchNormalization()(x)
model = keras.models.Model(inp, x)
model.layers[1].trainable = False
model.layers[2].trainable = False
model.compile(loss='mse', optimizer=keras.optimizers.SGD(lr=1.))
model.summary()  # Trainable params: 0

out3 = model.predict(test)
w1 = model.layers[2].get_weights()

model.fit(sample, labels)

out4 = model.predict(test)
w2 = model.layers[2].get_weights()

print(out3, out4)
print(w1)
print(w2)

The inconsistency is: there are 0 trainable parameters in the model, but the BN layer changed. I hope that gives you some insights to better understand the problem.


I see. The obvious solution should be to set trainable = False on each model1 layer. And in fact it works in our example: setting every model1 layer to trainable = False does freeze model1's layers when training model2. So, every GAN implementation doing:

1. Construct D
    1a) Compile D
2. Construct G
3. Set D.trainable = False
4. Stack G and D, to construct GAN 
     4a) Compile GAN

is actually wrong. All the ones I reported use this method. Anyway, the problem is still not fixed: setting trainable = False on every layer in network.layers in my GAN did not solve the problem. The point is that our GAN model, with GEN, DISC and DCGAN models, is more complicated than this example, and I don’t seem to find a config where I can:

1. train DISC by feeding forward GEN --> DISC; in this step the GEN layers need to be frozen while DISC is training
2. train GEN by feeding forward GEN --> DISC, freezing the DISC layers and leaving GEN trainable.

I’ll build a simple snippet of code reproducing the GAN behaviour, because, @fchollet, as you can imagine, we cannot run three model.compile calls at every batch iteration.
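For reference, the ordering described earlier in the thread avoids recompiling at every batch, because the set of weights an optimizer updates is fixed when each model is compiled; a minimal sketch with toy stand-ins for GEN and DISC (not the actual models from this issue):

import numpy as np
from keras.layers import Input, Dense
from keras.models import Model

z = Input(shape=(10,))
G = Model(z, Dense(4)(z))                                    # toy generator

d_in = Input(shape=(4,))
D = Model(d_in, Dense(1, activation='sigmoid')(d_in))        # toy discriminator
D.compile(optimizer='rmsprop', loss='binary_crossentropy')   # compiled while fully trainable

# Freeze DISC's layers *after* its own compile and *before* compiling the stack.
for layer in D.layers:
    layer.trainable = False
GAN = Model(z, D(G(z)))
GAN.compile(optimizer='rmsprop', loss='binary_crossentropy')

# Per batch, no further compiles are needed:
real = np.random.random((8, 4))
noise = np.random.random((8, 10))
D.train_on_batch(real, np.ones((8, 1)))        # updates DISC (its compile preceded the freeze)
GAN.train_on_batch(noise, np.ones((8, 1)))     # updates only GEN; DISC was frozen at GAN's compile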

Just a follow up with a quick ‘n’ dirty solution for adversarial settings. The trick is to define a non-trainable clone of the discriminator and copy the weights from the discriminator to the clone before training the generator to fool the discriminator.
A solution sketch is as follows:

# Frozen discriminator
frozen_disc = Sequential(name='frozen_discriminator')
# Same exact definition as discriminator (with same layer names)
# ...
# ...
frozen_disc.add(Dense(1, activation='sigmoid', name='discriminator_layer_n'))
# Freeze layers
for layer in frozen_disc.layers:
    layer.trainable = False
frozen_disc.trainable = False

# Frozen discriminator + Generator
discriminator_gen = Model(inputs=generator.input, outputs=frozen_disc(generator.output))

# Compile models
# ...

# Fit models
generator.fit(...)
# ...
discriminator.fit(...)
# ...
# Copy weights
for l in ['discriminator_layer_1', '...', 'discriminator_layer_n']:
    to_set = discriminator.get_layer(l).get_weights()
    discriminator_gen.get_layer('frozen_discriminator').get_layer(l).set_weights(to_set)
# ...
discriminator_gen.fit(...)

Hope this helps.

Cheers, DG

I think this is the same problem as calling model.trainable = False directly. The flag is only applied to the model itself (the highest level of abstraction). If you look into the graph, the individual layers, and the weights inside them, will still show trainable = True (so you really have to set the flag “manually” on everything in your whole graph that has one in order to set every flag to False). If you encapsulate layers somehow in complex architectures, this behavior is really annoying.

Imho there should be something like the --recursive option that many terminal commands have. This way one would be able to call model.trainable = False with the expected behavior: the model and all of its child layers would recursively have their trainable flags set to False.
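A small helper along those lines is easy to write in user code (this set_trainable function is an illustration, not an existing Keras API):

def set_trainable(model, value):
    """Recursively flip the trainable flag on a model and everything nested in it."""
    model.trainable = value
    for layer in model.layers:
        if hasattr(layer, 'layers'):      # a nested Model/Sequential used as a layer
            set_trainable(layer, value)
        else:
            layer.trainable = value

# Usage (before compiling whichever model should see the frozen weights):
#     set_trainable(discriminator, False)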

We need to have a 'setTrainable' method per layer (and not per compiled model) that is cheap to call.

Hi all, this topic might be out of date, but I recently experienced chaos with GANs, especially when freezing layers.

Thanks to @Js-Mim, you made my life easier; however, I found another problem, about which I am posting my endeavors.

The problem is when combining two models that are not trainable:

    g.trainable = False
    d.trainable = False

    model = Sequential()
    model.add(g)
    model.add(d)

    model.compile....

I would expect it to work, but it doesn’t (Keras 2.0.6). Only if I set

model.trainable = False
model.compile....

then it works.

A less elegant solution to this problem is adding the non-trainable layers one by one (though you must mark each of them separately with l.trainable = False, which was also a bit unexpected for me):

    model = Sequential()
    for l in g.layers + d.layers:
        l.trainable = False
        model.add(l)

Something is definitely going on here… though my ignorance might be the main reason for my adventures. Could someone explain to me what is going on here? Bug/misconception/ignorance?
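One way to narrow down bug vs. misconception is to look, before compiling, at which weight tensors the outer model would hand to the optimizer (a sketch with toy stand-ins for g and d; what it prints depends on the Keras version, which is exactly what is in question here):

from keras.layers import Dense
from keras.models import Sequential

g = Sequential([Dense(4, input_shape=(3,))])
d = Sequential([Dense(1, input_shape=(4,))])

g.trainable = False
d.trainable = False

model = Sequential()
model.add(g)
model.add(d)

# Trainable flags are snapshotted at compile time, so inspect before compiling.
print(len(model.trainable_weights), 'trainable weight tensors')
print(len(model.non_trainable_weights), 'non-trainable weight tensors')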

@fchollet I finally managed to easily reproduce the bug that was driving me crazy over the last few days. The trainable property works like a charm on simple GAN models, but is buggy in one case. Task: load a pretrained network to use as a feature extractor (freezing its layers), build a model that uses the pretrained network and adds an FC layer, train the model, and test whether the pretrained network changed:

from keras.applications.inception_v3 import InceptionV3
from keras.applications.resnet50 import ResNet50
from keras.applications.vgg19 import VGG19
from keras.layers import  Input
from keras.layers import Dense, Flatten
from keras.models import Model
from keras.preprocessing import image
from keras.applications.resnet50 import preprocess_input, decode_predictions
import numpy as np

input = Input(shape=(3,224,224), name="image_input")
net = VGG19(include_top=False, weights='imagenet')
net.trainable = False
for l in net.layers:
    l.trainable = False

out = net(input)
x = Flatten()(out)
x = Dense(1000, activation='softmax', name='fc1000')(x)
model = Model(input, x)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')

img_path = 'mug.jpg'
img = image.load_img(img_path, target_size=(224, 224))
image = image.img_to_array(img)
image = np.expand_dims(image, axis=0)
image = preprocess_input(image)

out1=net.predict(image)
model.train_on_batch(image,np.ones([1,1000]))
out2=net.predict(image)
#testing if the resnet outputs before and after model train are equals:
print( np.array_equal(out1,out2) )

The two outputs, before and after model training, are equal, as they should be. Let's repeat the same with ResNet50 or InceptionV3:

from keras.applications.inception_v3 import InceptionV3
from keras.applications.resnet50 import ResNet50
from keras.applications.vgg19 import VGG19
from keras.layers import  Input
from keras.layers import Dense, Flatten
from keras.models import Model
from keras.preprocessing import image
from keras.applications.resnet50 import preprocess_input, decode_predictions
import numpy as np

input = Input(shape=(3,224,224), name="image_input")
net = ResNet50(include_top=False, weights='imagenet')
net.trainable = False
for l in net.layers:
    l.trainable = False

out = net(input)
x = Flatten()(out)
x = Dense(1000, activation='softmax', name='fc1000')(x)
model = Model(input, x)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')

img_path = 'mug.jpg'
img = image.load_img(img_path, target_size=(224, 224))
image = image.img_to_array(img)
image = np.expand_dims(image, axis=0)
image = preprocess_input(image)

out1=net.predict(image)
model.train_on_batch(image,np.ones([1,1000]))
out2=net.predict(image)
#testing if the resnet outputs before and after model train are equals:
print( np.array_equal(out1,out2) )

The outputs are not equal anymore: here is the bug. The bug happens with ResNet50 and InceptionV3 and does not happen with VGG16 and VGG19. I imagine it is something related to the batchnorm layer; maybe it is not counted as a layer and so is not caught by for l in net.layers:? Edit: I see BatchNormalization layers in net.layers, so it really seems a bug.
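One way to narrow down which frozen layers actually moved is to diff each layer's weights around a training step (a diagnostic sketch meant to be appended to the script above, reusing its net, model and preprocessed image):

before = [layer.get_weights() for layer in net.layers]
model.train_on_batch(image, np.ones([1, 1000]))
after = [layer.get_weights() for layer in net.layers]

# Print the layers whose weights changed despite trainable = False.
for layer, w_before, w_after in zip(net.layers, before, after):
    if any(not np.array_equal(b, a) for b, a in zip(w_before, w_after)):
        print(layer.name, layer.__class__.__name__)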

Are you sure that some nasty connections or dropout(s) are not messing up the output of the loss function?