keras: "Fine-tune InceptionV3/ResNet50 on a new set of classes" doesn't work, while VGG16 works (suspect BN)

The following code works as expected with vgg16 (no BN) but not with resnet50 or inception_v3 (which use BN). My hypothesis is that this is due to BN. The code follows “Fine-tune InceptionV3 on a new set of classes” from https://keras.io/applications/#usage-examples-for-image-classification-models

from keras.applications import resnet50, inception_v3, vgg16
from keras.models import Model
from keras.layers import Dense, GlobalAveragePooling2D
from keras.optimizers import Adam
import numpy as np

batch_size = 50
num_classes = 2

# Switch between the three architectures here:
#base_model = resnet50.ResNet50
#base_model = inception_v3.InceptionV3
base_model = vgg16.VGG16

base_model = base_model(weights='imagenet', include_top=False)
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation='relu')(x)
predictions = Dense(num_classes, activation='softmax')(x)
model = Model(inputs=base_model.input, outputs=predictions)
# Freeze the pre-trained base; only the new classifier head is trained
for layer in base_model.layers:
    layer.trainable = False

model.compile(loss='sparse_categorical_crossentropy',
              optimizer=Adam(lr=0.0001),
              metrics=['acc'])

# Synthetic data: 50 random "images" with alternating labels 0/1
x_train = np.random.normal(loc=127, scale=127, size=(50, 224, 224, 3))
y_train = np.array([0, 1] * 25)
x_train = resnet50.preprocess_input(x_train)

# Evaluate before training, then fit while validating on the same data
print(model.evaluate(x_train, y_train, batch_size=batch_size, verbose=0))
model.fit(x_train, y_train,
          epochs=100,
          batch_size=batch_size,
          shuffle=False,
          validation_data=(x_train, y_train))

About this issue

  • State: closed
  • Created 6 years ago
  • Reactions: 25
  • Comments: 40 (11 by maintainers)

Most upvoted comments

Same issue when I try ResNet50 in Keras.

@jksmither Sorry for the late response. I just synced my branch with the latest master and provided a patched fork of 2.1.6. Honestly I would like to see this fixed on master, as maintaining a separate fork with the patch is not a viable solution in the long term. I’ll probably keep syncing it for as long as we use Keras at work, but I can’t make any promises.

I am having the exact same issue. Inception does not work but VGG does fine: InceptionV3 picks the same class every time, no matter what the test set is.

Thanks for the input. I would advise using the 2.1.6 fork instead of the trainable_bn branch. This is because the fork is synced with the latest stable version of Keras, while the trainable_bn branch, even though it is more recent, is not based on a finalized release.

Unfortunately there are no plans for a permanent fix at the moment. My PR #9965 was rejected (you can read the rationale at the link) because it modifies the semantics of trainable. It’s not the first time that the BatchNormalization layer has forced us to update the semantics of trainable (see version 2.1.3), but it can take a while until such a change gets enough momentum. So maybe in the future, if enough people complain about it, it will be reopened. Until then I’ll do my best to maintain the fork for those of you who are brave enough to mess with custom implementations.

I am having the same problem with Inception-v3, while VGG19 works. I can also confirm that when I remove

layer.trainable = False

validation accuracy starts improving. I posted a question for a workaround on Stack Overflow: https://stackoverflow.com/questions/49689122/keras-inception-v3-fine-tuning-workaraound

Maybe somebody has a suggestion

I have tried different optimizers (Adam, SGD, SGD with momentum, and so on) when training ResNet50, fine-tuning with all layers frozen except the FC layer. My training loss is decreasing, but the validation accuracy increases and then, just as you say, gets stuck at 50%–60%…

I have tried randomly sampling the data and used some data augmentation tricks, but those didn’t work.

For all of you who are affected by this, please have a look at PR #9965. This problem is caused by the way the Batch Normalization layer is implemented in Keras.

To understand why this happens, we need to understand how BN works. When the network is in training mode, the mini-batch statistics of BN are used for training the network; when the network is in inference mode, we use the moving mean/var learned during training. That’s all good. The problem is how the layer behaves when it is frozen. Its side effects are most pronounced when we use fine-tuning and transfer learning.

You see, when frozen and while in training mode, BN continues to use the mini-batch statistics for scaling the training data. This causes the unfrozen/trainable layers to adapt to the scale of that data. Unfortunately, during inference mode (predictions) the network switches to the moving mean/var. If the moving mean/var differs from the mini-batch statistics, the data are scaled differently, causing massive discrepancies in accuracy. If you want more info, have a look at the PR.
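
The mismatch is easy to see in isolation. Below is a minimal sketch (my own illustration, not code from the thread) of a single frozen BN layer whose stored moving statistics were learned on very different data: in training mode it normalizes with the batch statistics, in inference mode with the moving statistics, so the same input comes out at completely different scales.

import numpy as np
from keras import backend as K
from keras.layers import BatchNormalization, Input
from keras.models import Model

inp = Input(shape=(3,))
bn = BatchNormalization()
out = bn(inp)
model = Model(inp, out)
bn.trainable = False  # "frozen", yet training mode still uses batch statistics

# Pretend the moving statistics were learned on ImageNet-like activations
K.set_value(bn.moving_mean, np.array([100.0, 100.0, 100.0]))
K.set_value(bn.moving_variance, np.array([25.0, 25.0, 25.0]))

# New-domain data with very different statistics
x = np.random.normal(loc=0.0, scale=1.0, size=(32, 3)).astype('float32')

f = K.function([inp, K.learning_phase()], [out])
y_train_mode = f([x, 1])[0]  # training mode: uses the mini-batch mean/var
y_infer_mode = f([x, 0])[0]  # inference mode: uses the moving mean/var

print(y_train_mode.mean(), y_train_mode.std())  # roughly 0 and 1
print(y_infer_mode.mean(), y_infer_mode.std())  # roughly -20 and 0.2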

I have the same problem with ResNet50.

This seems to work.

- Set the learning phase to 1.
- In every batch normalization layer, set training=False.

After that I get the correct accuracy.

from keras import backend as K
from keras.applications.resnet50 import ResNet50
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model

K.set_learning_phase(1)

base_model = ResNet50(weights='imagenet', include_top=False)

x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation='relu')(x)
predictions = Dense(2, activation='softmax')(x)

model = Model(inputs=base_model.input, outputs=predictions)

for layer in base_model.layers:
    layer.trainable = False
    # ResNet50's batch norm layers are the ones named 'bn...'
    if layer.name.startswith('bn'):
        layer.call(layer.input, training=False)
# Alternative: reset the BN moving averages so they re-adapt to the new data
for layer in base_model.layers:
    if hasattr(layer, 'moving_mean') and hasattr(layer, 'moving_variance'):
        layer.trainable = True
        K.eval(K.update(layer.moving_mean, K.zeros_like(layer.moving_mean)))
        K.eval(K.update(layer.moving_variance, K.zeros_like(layer.moving_variance)))
    else:
        layer.trainable = False

Reset the batch norm moving averages and allow them to update to the new dataset - you’ll see it transfer.

I’m writing a longer update on this matter and will open issues (easy PRs) so people can help contribute to fixing the documentation and the like.

The problem does not happen without

for layer in base_model.layers:
    layer.trainable = False

Note that this is repo master, i.e. with https://github.com/keras-team/keras/commit/24246ea53157f7e9faef74db3e4c06642a50509d already merged; see https://github.com/keras-team/keras/pull/8616#issuecomment-357029549

I also noticed this while training an EfficientNet model, which includes BatchNormalization. It seems that freezing the BN layers leads to bad accuracy (wrong predictions) in the validation phase, while in the training phase everything looks fine. When un-freezing the BN layers, the test accuracy recovers.

Here is an example notebook and here is some additional information.
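
In case it helps, here is a minimal sketch of that observation (my own illustration, not code from the linked notebook, and using ResNet50 from this thread since EfficientNet is not bundled with this Keras version): freeze everything except the BatchNormalization layers, so their moving mean/variance can keep adapting to the new data.

from keras.applications.resnet50 import ResNet50
from keras.layers import BatchNormalization

base_model = ResNet50(weights='imagenet', include_top=False)
for layer in base_model.layers:
    # Only the BN layers stay trainable; all convolutions are frozen
    layer.trainable = isinstance(layer, BatchNormalization)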

@cesarorosco This means that your network always runs in training mode. Even when you make predictions, you use the mini-batch statistics. This is not great, as your predictions will change depending on which images you pass in the batch.
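
To make the batch dependence concrete, here is a minimal sketch (my own illustration, not code from the thread): with the learning phase forced to 1, a BN layer normalizes each sample with the statistics of whatever batch it sits in, so the prediction for the same input changes with its batch companions.

import numpy as np
from keras import backend as K
from keras.layers import BatchNormalization, Input
from keras.models import Model

K.set_learning_phase(1)  # force training mode everywhere, as in the workaround above

inp = Input(shape=(4,))
out = BatchNormalization()(inp)
model = Model(inp, out)

x = np.array([[1.0, 2.0, 3.0, 4.0]])
batch_a = np.concatenate([x, np.zeros((7, 4))])      # same sample plus zeros
batch_b = np.concatenate([x, 10 * np.ones((7, 4))])  # same sample, different companions

print(model.predict(batch_a)[0])  # differs from the next line because the
print(model.predict(batch_b)[0])  # mini-batch statistics differ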

VGG16 output (works as expected):

  1. both validation loss and acc before training (from evaluate()) are exactly equal to those in the first training iteration
  2. validation loss/acc correspond to training loss/acc
  3. training loss->0, acc->1, validation loss->0, acc->1.
[1.1005926132202148, 0.5]
Train on 50 samples, validate on 50 samples
Epoch 1/100
50/50 [==============================] - 1s 11ms/step - loss: 1.1006 - acc: 0.5000 - val_loss: 0.7771 - val_acc: 0.5800
Epoch 2/100
50/50 [==============================] - 0s 8ms/step - loss: 0.7771 - acc: 0.5800 - val_loss: 0.8947 - val_acc: 0.4600
Epoch 3/100
50/50 [==============================] - 0s 9ms/step - loss: 0.8947 - acc: 0.4600 - val_loss: 0.9511 - val_acc: 0.4800
Epoch 4/100
50/50 [==============================] - 0s 9ms/step - loss: 0.9511 - acc: 0.4800 - val_loss: 0.8385 - val_acc: 0.4600
Epoch 5/100
50/50 [==============================] - 0s 9ms/step - loss: 0.8385 - acc: 0.4600 - val_loss: 0.7341 - val_acc: 0.5400
Epoch 6/100
50/50 [==============================] - 0s 9ms/step - loss: 0.7341 - acc: 0.5400 - val_loss: 0.7455 - val_acc: 0.5600
Epoch 7/100
50/50 [==============================] - 0s 8ms/step - loss: 0.7455 - acc: 0.5600 - val_loss: 0.7991 - val_acc: 0.6000
Epoch 8/100
50/50 [==============================] - 0s 9ms/step - loss: 0.7991 - acc: 0.6000 - val_loss: 0.7902 - val_acc: 0.6000
Epoch 9/100
50/50 [==============================] - 0s 9ms/step - loss: 0.7902 - acc: 0.6000 - val_loss: 0.7258 - val_acc: 0.5800
Epoch 10/100
50/50 [==============================] - 0s 9ms/step - loss: 0.7258 - acc: 0.5800 - val_loss: 0.6727 - val_acc: 0.6400
[...]
Epoch 98/100
50/50 [==============================] - 0s 9ms/step - loss: 0.2272 - acc: 1.0000 - val_loss: 0.2246 - val_acc: 1.0000
Epoch 99/100
50/50 [==============================] - 0s 9ms/step - loss: 0.2246 - acc: 1.0000 - val_loss: 0.2221 - val_acc: 1.0000
Epoch 100/100
50/50 [==============================] - 0s 9ms/step - loss: 0.2221 - acc: 1.0000 - val_loss: 0.2196 - val_acc: 1.0000

resnet50 output (does not work as expected):

  1. validation loss before training (from evaluate()) is nowhere near that in the first training iteration (BN?)
  2. validation loss/acc does not correspond to training loss/acc at all (BN?)
  3. training loss->0, acc->1 very quickly (acc=1.0 starting from epoch 5), validation loss stays huge forever, acc=0.5 (random) forever.
[2.3405368328094482, 0.5]
Train on 50 samples, validate on 50 samples
Epoch 1/100
50/50 [==============================] - 1s 21ms/step - loss: 0.6806 - acc: 0.5400 - val_loss: 1.6767 - val_acc: 0.5000
Epoch 2/100
50/50 [==============================] - 0s 8ms/step - loss: 0.6061 - acc: 0.6400 - val_loss: 1.8632 - val_acc: 0.5000
Epoch 3/100
50/50 [==============================] - 0s 9ms/step - loss: 0.5088 - acc: 0.9000 - val_loss: 2.0533 - val_acc: 0.5000
Epoch 4/100
50/50 [==============================] - 0s 9ms/step - loss: 0.4437 - acc: 0.9200 - val_loss: 1.9083 - val_acc: 0.5000
Epoch 5/100
50/50 [==============================] - 0s 9ms/step - loss: 0.3799 - acc: 1.0000 - val_loss: 1.5847 - val_acc: 0.5000
Epoch 6/100
50/50 [==============================] - 0s 9ms/step - loss: 0.3222 - acc: 1.0000 - val_loss: 1.3209 - val_acc: 0.5000
Epoch 7/100
50/50 [==============================] - 0s 8ms/step - loss: 0.2816 - acc: 1.0000 - val_loss: 1.2207 - val_acc: 0.5000
Epoch 8/100
50/50 [==============================] - 0s 8ms/step - loss: 0.2439 - acc: 1.0000 - val_loss: 1.2348 - val_acc: 0.5000
Epoch 9/100
50/50 [==============================] - 0s 9ms/step - loss: 0.2089 - acc: 1.0000 - val_loss: 1.2679 - val_acc: 0.5000
Epoch 10/100
50/50 [==============================] - 0s 9ms/step - loss: 0.1824 - acc: 1.0000 - val_loss: 1.2359 - val_acc: 0.5000
[...]
Epoch 98/100
50/50 [==============================] - 0s 8ms/step - loss: 0.0032 - acc: 1.0000 - val_loss: 2.2686 - val_acc: 0.5000
Epoch 99/100
50/50 [==============================] - 0s 8ms/step - loss: 0.0032 - acc: 1.0000 - val_loss: 2.2791 - val_acc: 0.5000
Epoch 100/100
50/50 [==============================] - 0s 8ms/step - loss: 0.0031 - acc: 1.0000 - val_loss: 2.2894 - val_acc: 0.5000

Unfreezing the BNs while keeping the subsequent convolutions frozen can have negative effects on accuracy. I describe this in more detail here.
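
One way to respect that constraint (a minimal sketch with a hypothetical cutoff index, not code from the linked write-up) is to unfreeze whole trailing blocks, so any BN layer that is unfrozen is followed by trainable convolutions rather than frozen ones:

from keras.applications.resnet50 import ResNet50

base_model = ResNet50(weights='imagenet', include_top=False)
cutoff = 140  # hypothetical index: fine-tune everything from this layer onward
for i, layer in enumerate(base_model.layers):
    layer.trainable = (i >= cutoff)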

Did anyone find a concrete solution to this problem? I am also affected by it, working with Keras 2.1.6 and TensorFlow 1.7 to train and test my data using InceptionV3 and ResNet50.

I am very new to deep learning and any help will be appreciated.