keras: Setting dropout rate via layer.rate doesn't work

Hello there,

Suppose you’ve defined a Keras Model with the functional API and you want to change the dropout rate of its Dropout layers after you’ve instantiated the Model. How do you do this?


I’ve tried to do the following:

from keras.layers import Dropout

# walk the model and set each Dropout layer's rate attribute to 0
for layer in model.layers:
    if isinstance(layer, Dropout):
        layer.rate = 0.0
        print(layer.get_config())

Based on the updated config of the Dropout layers, this should work:

{'noise_shape': None, 'rate': 0.2, 'trainable': True, 'seed': None, 'name': 'dropout_1'} -> {'noise_shape': None, 'rate': 0.0, 'trainable': True, 'seed': None, 'name': 'dropout_1'}

However, I can tell you that this does not work: during training, the old dropout rates are still used. I’ve also tried compiling the model again after the layer loop (model.compile()) and even making a new model (model = Model(inputs=model.input, outputs=model.output)), but the problem persists.
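As an aside: Model(inputs=model.input, outputs=model.output) reuses the already-built graph tensors, so the old dropout ops survive that kind of rebuild. A workaround that should force fresh ops, sketched here under the assumption of a functional-API model (the compile settings are placeholders, not from the original), is to rebuild from a patched config:

from keras.models import Model

config = model.get_config()
for layer_conf in config['layers']:
    if layer_conf['class_name'] == 'Dropout':
        layer_conf['config']['rate'] = 0.0

new_model = Model.from_config(config)       # builds a fresh graph
new_model.set_weights(model.get_weights())  # carry over the trained weights
new_model.compile(optimizer='adam', loss='binary_crossentropy')  # placeholder settings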


This issue can easily be tested with a VGG-like CNN with dropout layers and a small data sample (e.g. 100 images): just try to overfit the data. If you instantiate the net with a dropout rate of e.g. 0.2, the model will have a hard time overfitting the small data sample. Using the above code snippet, which should set the dropout rate to 0, does not change anything. However, if you directly instantiate the net with a dropout rate of 0.0, it will immediately overfit the data sample.

So setting layer.rate changes the dropout rate in the layer config, but somehow the old dropout rate is still used during training.


I’ve also taken a look at the Dropout layer source. The only thing I can think of is that maybe __init__ is not called again after changing the rate, so the old dropout rate is still used in call:

    def __init__(self, rate, noise_shape=None, seed=None, **kwargs):
        super(Dropout, self).__init__(**kwargs)
        self.rate = min(1., max(0., rate))
        self.noise_shape = noise_shape
        self.seed = seed
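For reference, call in the same source (paraphrased from the Keras 2.1.x code) reads self.rate, a plain Python float, at the moment the op is built, which would explain why a later assignment to the attribute never reaches the graph:

    def call(self, inputs, training=None):
        # self.rate is a plain Python float here; its value is baked into
        # the graph as a constant the first time this op is built
        if 0. < self.rate < 1.:
            noise_shape = self._get_noise_shape(inputs)

            def dropped_inputs():
                return K.dropout(inputs, self.rate, noise_shape,
                                 seed=self.seed)
            return K.in_train_phase(dropped_inputs, inputs,
                                    training=training)
        return inputs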

But this is just a guess. I’m using Keras 2.1.2 with the TensorFlow backend.

Does anyone have an idea? Thanks a lot!

Most upvoted comments

Here is some sample code which checks whether the rate is changed:

import numpy as np
from keras import backend as K
from keras.layers import Dropout

dummy_input = np.ones((5, 5))

K.set_learning_phase(1)  # force training mode so dropout is actually applied
dropout_test = Dropout(0.3)
out_1 = dropout_test.call(dummy_input)
print(K.eval(out_1))

dropout_test.rate = 0.5  # plain attribute assignment
out_2 = dropout_test.call(dummy_input)  # a fresh op is built here
print(K.eval(out_2))

You can see from the outputs that the dropout rate is different.
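Presumably this works because each direct invocation of call() builds a fresh op from the current value of the Python attribute, whereas in a compiled model the dropout op was built once at graph-construction time and is never rebuilt.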

Thanks for your experiments, they were very useful. I believe the issue is that the attribute you are trying to change in the Dropout layer is not a TensorFlow variable, so it never gets updated in the backend. I did some similar experiments with a slightly modified Dropout layer and an associated callback, and it seems to work:

from keras import backend as K
from keras.callbacks import Callback
from keras.engine.topology import Layer
from keras.legacy import interfaces
from keras.models import Model

class MyDropout(Layer):
    @interfaces.legacy_dropout_support
    def __init__(self, rate, noise_shape=None, seed=None, **kwargs):
        super(MyDropout, self).__init__(**kwargs)
        # store the rate as a backend variable so it can be updated in place
        self.rate = K.variable(min(1., max(0., rate)))
        self.noise_shape = noise_shape
        self.seed = seed
        self.supports_masking = True

    def _get_noise_shape(self, inputs):
        if self.noise_shape is None:
            return self.noise_shape

        symbolic_shape = K.shape(inputs)
        noise_shape = [symbolic_shape[axis] if shape is None else shape
                       for axis, shape in enumerate(self.noise_shape)]
        return tuple(noise_shape)

    def call(self, inputs, training=None):
        # read the variable's current value when deciding whether to drop;
        # K.dropout below receives the variable itself, so updates propagate
        if 0. < K.get_value(self.rate) < 1.:
            noise_shape = self._get_noise_shape(inputs)

            def dropped_inputs():
                return K.dropout(inputs, self.rate, noise_shape,
                                 seed=self.seed)
            return K.in_train_phase(dropped_inputs, inputs,
                                    training=training)
        return inputs

    def get_config(self):
        config = {'rate': K.get_value(self.rate),
                  'noise_shape': self.noise_shape,
                  'seed': self.seed}
        base_config = super(MyDropout, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))

    def compute_output_shape(self, input_shape):
        return input_shape

class DropoutReducer(Callback):
    def __init__(self, patience=0, reduce_rate=0.5, verbose=1, 
                monitor='val_loss', **kwargs):
        super(DropoutReducer, self).__init__(**kwargs)
        self.patience = patience
        self.wait = 0
        self.best_score = -1.
        self.reduce_rate = reduce_rate
        self.verbose = verbose
        self.monitor = monitor
        self.TAG = "DROPOUT REDUCER: "
        self.callno = -1
        self.dropout_rate = -1

    def on_epoch_end(self, epoch, logs={}):
        current_score = logs.get(self.monitor)
        if self.verbose == 2:
            print(self.TAG + "---Current score: {:.4f} vs best score is: 
                     {:.4f}".format(current_score,self.best_score))
        self.callno += 1
        if self.callno == 0:
            self.best_score = current_score
        elif current_score < self.best_score:
            self.best_score = current_score
            self.wait = 0
        else:
            if self.wait >= self.patience:
                if self.verbose:
                    print(self.TAG + '---Reducing Dropout Rate...')
                found_layers = 0
                for layer in self.model.layers:
                    if isinstance(layer,Model):
                        for lay in layer.layers:
                            if self.verbose == 2:
                                print(lay)
                            if isinstance(lay, MyDropout):
                                self.dropout_rate = self.reduce_rate * K.get_value(lay.rate)
                                K.set_value(lay.rate, self.dropout_rate )
                                found_layers = found_layers + 1 
                if self.verbose:
                    print(self.TAG + 'Found {} Dropout layers and reduced dropout rate to {}.'.format(
                        found_layers, self.dropout_rate))
                self.wait = 0
            else:
                self.wait += 1
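A minimal wiring sketch, in case it helps (build_model, x_train, y_train, x_val and y_val are placeholders, not from the original thread). Note that the callback as written only descends into nested Model instances; for a flat model you can loop over self.model.layers directly, as in the last lines below:

model = build_model()  # hypothetical builder that uses MyDropout instead of Dropout
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])
model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          epochs=50,
          callbacks=[DropoutReducer(patience=2, reduce_rate=0.5, verbose=1)])

# Because the rate is now a backend variable, it can also be changed by hand:
for layer in model.layers:
    if isinstance(layer, MyDropout):
        K.set_value(layer.rate, 0.0)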

I tried it with epsilon and it works fine for K.eval, but still not while training. So it looks like the dropout rate simply remains unchanged during training, no matter what rate you set.


About the overfitting: suppose you have a VGG-like CNN and a small data sample of e.g. 1000 images, with batch size 32. Dropout is applied after every convolutional block, and it is a binary classification problem (two classes; the loss for random guessing is ~0.693).

Now you can try out three different things:

  1. Instantiate the model with a dropout rate of e.g. 0.2. It should then be at least somewhat difficult for the network to overfit the small data sample. This can be seen in the training log for the first 4 epochs:
    31/31 [==============================] - 17s 540ms/step - loss: 0.8700 - acc: 0.5192
    Test sample results: [0.69226435115260465, 0.52721774193548387] (['loss', 'acc'])
    31/31 [==============================] - 10s 314ms/step - loss: 0.7775 - acc: 0.4960
    Test sample results: [0.69207690031297742, 0.52923387096774188] (['loss', 'acc'])
    31/31 [==============================] - 10s 315ms/step - loss: 0.7352 - acc: 0.5121
    Test sample results: [0.6918414677343061, 0.52923387096774188] (['loss', 'acc'])
    31/31 [==============================] - 10s 315ms/step - loss: 0.7161 - acc: 0.4950
    Test sample results: [0.69223312024147277, 0.52923387096774188] (['loss', 'acc'])
    
  2. Instantiate the model with Dropout(0) layers. The network is now able to overfit easily:

    31/31 [==============================] - 15s 496ms/step - loss: 0.7272 - acc: 0.5111
    Test sample results: [0.69330322742462158, 0.501008064516129] (['loss', 'acc'])
    31/31 [==============================] - 9s 282ms/step - loss: 0.6709 - acc: 0.5746
    Test sample results: [0.69320519508854039, 0.50201612903225812] (['loss', 'acc'])
    31/31 [==============================] - 9s 285ms/step - loss: 0.6246 - acc: 0.6754
    Test sample results: [0.69458910149912678, 0.48185483870967744] (['loss', 'acc'])
    31/31 [==============================] - 9s 284ms/step - loss: 0.5719 - acc: 0.7651
    Test sample results: [0.6960476886841559, 0.4848790322580645] (['loss', 'acc'])
    
  3. Instantiate the model with a dropout rate of 0.2. After that, change the rate of each dropout layer to 0 via layer.rate = 0. The config of each dropout layer suggests this was successful, e.g. {'noise_shape': None, 'rate': 0.0, 'trainable': True, 'seed': None, 'name': 'dropout_1'}

    Now, the network should be able to overfit again, but in practice it can’t:

    31/31 [==============================] - 17s 542ms/step - loss: 0.8120 - acc: 0.4859
    Test sample results: [0.69336761390009238, 0.48790322580645162] (['loss', 'acc'])
    31/31 [==============================] - 10s 315ms/step - loss: 0.7656 - acc: 0.5010
    Test sample results: [0.69374309432122017, 0.47681451612903225] (['loss', 'acc'])
    31/31 [==============================] - 10s 315ms/step - loss: 0.7337 - acc: 0.5282
    Test sample results: [0.69283238341731412, 0.51209677419354838] (['loss', 'acc'])
    31/31 [==============================] - 10s 316ms/step - loss: 0.7366 - acc: 0.5060
    Test sample results: [0.69180465898206156, 0.52923387096774188] (['loss', 'acc'])
    
    

Actually, if you look at the loss in the first epoch, you can also see that 3. does not work: the initial loss of a network without dropout should be lower than that of a network with dropout. Run 3. starts at 0.8120, close to the with-dropout run 1. (0.8700), rather than the no-dropout run 2. (0.7272).


Anyway, I would have thought that I was just using the wrong syntax or something similar, but it looks like that isn’t the case.
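A more direct way to check whether dropout is active, rather than inferring it from overfitting behaviour (a sketch; x_batch is a hypothetical input batch with the model's input shape):

import numpy as np
from keras import backend as K

predict_fn = K.function([model.input, K.learning_phase()], [model.output])

out_train = predict_fn([x_batch, 1])[0]  # learning phase 1 = training
out_test = predict_fn([x_batch, 0])[0]   # learning phase 0 = inference

# If the dropout rate really were 0, both outputs should match (up to
# other phase-dependent layers such as BatchNormalization).
print(np.allclose(out_train, out_test))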