keras: Setting dropout rate via layer.rate doesn't work
Hello there,
suppose you’ve defined a Keras Model with the functional API, and you want to change the dropout rate of the Dropout layers after you’ve instantiated the Model. How do you do this?
I’ve tried to do the following:
```python
from keras.layers import Dropout

for layer in model.layers:
    if isinstance(layer, Dropout):
        layer.rate = 0.0
        print(layer.get_config())
```
Based on the updated config of the Dropout layers, this should work:

```
{'noise_shape': None, 'rate': 0.2, 'trainable': True, 'seed': None, 'name': 'dropout_1'}
-> {'noise_shape': None, 'rate': 0.0, 'trainable': True, 'seed': None, 'name': 'dropout_1'}
```
However, I can tell you that this does not work: during training, the old dropout values are still used.
I’ve also tried to compile the model again after the layer loop (`model.compile()`) or even to make a new model (`model = Model(inputs=model.input, outputs=model.output)`), but the problem persists.
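A condensed sketch of these attempts (the compile arguments are placeholders, not the actual ones):

```python
from keras.layers import Dropout
from keras.models import Model

# set every Dropout layer's rate attribute to 0
for layer in model.layers:
    if isinstance(layer, Dropout):
        layer.rate = 0.0

# attempt 1: recompile the existing model (optimizer/loss are placeholders)
model.compile(optimizer='adam', loss='binary_crossentropy')

# attempt 2: build a fresh Model on the same input/output tensors and compile it
model = Model(inputs=model.input, outputs=model.output)
model.compile(optimizer='adam', loss='binary_crossentropy')

# in both cases, training still behaves as if the original dropout rate were active
```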
This issue can easily be tested with a VGG-like CNN with dropout layers and a small data sample (e.g. 100 images): just try to overfit the data. If you instantiate the net with a dropout rate of e.g. 0.2, the model will have a hard time overfitting the small sample. Using the above code snippet, which should set the dropout rate to 0, does not change anything. However, if you directly instantiate the net with a dropout rate of 0.0, it immediately overfits the data sample.
Thus it seems that `layer.rate` changes the dropout rate in the layer config, but somehow the old dropout rate is still used during training.
I’ve also taken a look into the Dropout layer sources. The only thing I can think of is that maybe the `__init__` of the Dropout layer is not called again after changing the rate, so the old dropout rate is still used in `call`:
```python
def __init__(self, rate, noise_shape=None, seed=None, **kwargs):
    super(Dropout, self).__init__(**kwargs)
    self.rate = min(1., max(0., rate))
    self.noise_shape = noise_shape
    self.seed = seed
```
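For reference, the `call` method in the same file looks roughly like this (quoted from memory, so it may not match the 2.1.2 sources exactly):

```python
def call(self, inputs, training=None):
    if 0. < self.rate < 1.:
        noise_shape = self._get_noise_shape(inputs)

        def dropped_inputs():
            # self.rate is a plain Python float, read when the graph is
            # built, not a backend variable read at run time
            return K.dropout(inputs, self.rate, noise_shape, seed=self.seed)
        return K.in_train_phase(dropped_inputs, inputs, training=training)
    return inputs
```

If that is right, `call` only runs when the layer is applied to its input tensor, i.e. when the graph is built, so the rate would be baked into the graph at that point and changing the attribute afterwards would not affect the already-built dropout op.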
But this is just a guess. I’m using Keras 2.1.2 with the TensorFlow backend.
Does anyone have an idea? Thanks a lot!
About this issue
- Original URL
- State: closed
- Created 7 years ago
- Reactions: 2
- Comments: 16
Here is some sample code which checks whether the rate is actually changed; you can see the difference in the dropout rate from the outputs.
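A minimal sketch of such a check (not the original snippet; it assumes the TensorFlow backend and a toy model with a single Dropout layer):

```python
# Sketch: compare the fraction of dropped units before and after
# changing layer.rate, with dropout active.
import numpy as np
from keras.models import Model
from keras.layers import Input, Dropout
from keras import backend as K

inp = Input(shape=(1000,))
out = Dropout(0.5)(inp)
model = Model(inp, out)

# run the model with learning_phase=1 so that dropout is applied
f = K.function([model.input, K.learning_phase()], [model.output])
x = np.ones((1, 1000), dtype='float32')

print((f([x, 1])[0] == 0).mean())             # fraction of zeroed units, ~0.5

model.layers[-1].rate = 0.0                   # change the rate attribute
print(model.layers[-1].get_config()['rate'])  # the config now reports 0.0
print((f([x, 1])[0] == 0).mean())             # ~0.0 only if the change took effect
```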
Thanks for your experiments. They were very useful. I believe the issue is that the value you are trying to change in the Dropout layer is not a TensorFlow variable, so it never gets updated in the backend. I did some similar experiments with a slightly modified Dropout layer and an associated callback, and it seems to work:
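A sketch of what such a modified layer and callback could look like (not the commenter's actual code; the names are made up, and it assumes `K.dropout` accepts a tensor for the drop level):

```python
from keras.engine.topology import Layer
from keras.callbacks import Callback
from keras import backend as K


class VariableDropout(Layer):
    """Dropout whose rate is stored in a backend variable instead of a float."""

    def __init__(self, rate, **kwargs):
        super(VariableDropout, self).__init__(**kwargs)
        self.rate = K.variable(min(1., max(0., rate)), name='dropout_rate')

    def call(self, inputs, training=None):
        def dropped_inputs():
            # the graph reads the variable's current value at run time
            return K.dropout(inputs, self.rate)
        return K.in_train_phase(dropped_inputs, inputs, training=training)


class DropoutRateScheduler(Callback):
    """Sets the rate of every VariableDropout layer at the start of each epoch."""

    def __init__(self, schedule):
        super(DropoutRateScheduler, self).__init__()
        self.schedule = schedule  # function mapping epoch -> rate

    def on_epoch_begin(self, epoch, logs=None):
        new_rate = self.schedule(epoch)
        for layer in self.model.layers:
            if isinstance(layer, VariableDropout):
                K.set_value(layer.rate, new_rate)
```

Because the rate now lives in a backend variable that the graph reads at run time, `K.set_value` should actually change the dropout behaviour during training.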
I tried it with epsilon and it works fine for `K.eval`, but still not while training. So it looks like the dropout rate just remains unchanged during training, no matter what rate you set.

About the overfitting: suppose you have a VGG-like CNN and a small data sample of e.g. 1000 images with batch size 32. Dropout is applied after every convolutional block, and it is a binary classification problem (two classes, so the loss for random guessing is ~ ln(2) ≈ 0.693).
Now you can try out three different things:

1. Instantiate the model with `Dropout(0)` layers. The network is now able to overfit easily.
2. Instantiate the model with a dropout rate of 0.2.
3. After that, change the rate of each dropout layer to 0 via `layer.rate = 0`. The config of each dropout layer says that it was successful, e.g. `{'noise_shape': None, 'rate': 0.0, 'trainable': True, 'seed': None, 'name': 'dropout_1'}`. Now the network should be able to overfit again, but in practice it can't.
Actually, if you look at the loss in the first epoch, you can also see that case 3 does not work: the initial loss of a network without Dropout should be lower than that of a network with Dropout, but comparing case 3 with case 2 shows that it isn't.
Anyway, I would have thought that I was just using the wrong syntax or something similar, but it looks like that isn't the case.
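A rough sketch of the three cases for anyone who wants to reproduce the comparison (the model, data and hyperparameters below are placeholders, not the actual setup from this thread):

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout


def build_model(dropout_rate):
    # small stand-in for the VGG-like CNN described above
    model = Sequential([
        Conv2D(16, 3, activation='relu', input_shape=(32, 32, 3)),
        MaxPooling2D(),
        Dropout(dropout_rate),
        Conv2D(32, 3, activation='relu'),
        MaxPooling2D(),
        Dropout(dropout_rate),
        Flatten(),
        Dense(1, activation='sigmoid'),
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy')
    return model


# tiny random "dataset" standing in for the 1000-image sample
x_train = np.random.rand(1000, 32, 32, 3).astype('float32')
y_train = np.random.randint(0, 2, size=(1000, 1))

# case 1: no dropout at all (reported above to overfit easily)
h1 = build_model(0.0).fit(x_train, y_train, batch_size=32, epochs=50, verbose=0)

# case 2: dropout 0.2 (reported above to have a hard time overfitting)
h2 = build_model(0.2).fit(x_train, y_train, batch_size=32, epochs=50, verbose=0)

# case 3: dropout 0.2, then layer.rate = 0 before training
m3 = build_model(0.2)
for layer in m3.layers:
    if isinstance(layer, Dropout):
        layer.rate = 0.0
h3 = m3.fit(x_train, y_train, batch_size=32, epochs=50, verbose=0)

# compare h1/h2/h3 .history['loss']: according to the observations above,
# case 3 tracks case 2 rather than case 1, despite its config reporting rate 0.0
```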