tensorflow: Both 'mean' and 'variance' must be None when is_training is True and exponential_avg_factor == 1.0

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: No
  • TensorFlow installed from (source or binary):
  • TensorFlow version (use command below): 2.2.0-dev20200411
  • Python version: 3.6.3
  • Bazel version (if compiling from source):
  • GCC/Compiler version (if compiling from source):
  • CUDA/cuDNN version: 10.1
  • GPU model and memory:


Describe the current behavior

When instantiating a batch norm layer like this: tf.keras.layers.BatchNormalization(momentum=0.0, center=True, scale=False, name='bn1') I get the error: Both 'mean' and 'variance' must be None when is_training is True and exponential_avg_factor == 1.0

Describe the expected behavior

This error is not always the desired behavior. Consider meta-learning, for example: we see just one batch of training data and want to adapt all the means and variances to that batch, which means the momentum should be zero. Then, after applying a few training iterations, we evaluate the same batch norm layer with training=False, and that should also work fine.

Standalone code to reproduce the issue

import tensorflow as tf
import numpy as np

# Small conv model with a batch norm layer that uses momentum=0.0
inp = tf.keras.layers.Input(shape=(84, 84, 3))
dense = tf.keras.layers.Conv2D(10, 3, activation=None)(inp)
bn = tf.keras.layers.BatchNormalization(momentum=0.0, center=True, scale=False, name='bn1')(dense)
rel = tf.keras.layers.ReLU()(bn)
flat = tf.keras.layers.Flatten()(rel)
out = tf.keras.layers.Dense(1)(flat)
model = tf.keras.models.Model(inputs=inp, outputs=out)

model.compile(loss=tf.keras.losses.MeanSquaredError(), optimizer=tf.keras.optimizers.Adam())
# Training on a single random batch raises the error described above.
model.fit(x=np.random.uniform(size=(4, 84, 84, 3)), y=np.random.uniform(size=(4, 1)), epochs=1)
# Evaluation and prediction run with training=False and should use the adapted statistics.
model.evaluate(x=np.random.uniform(size=(3, 84, 84, 3)), y=np.random.uniform(size=(3, 1)))
model.predict(x=np.random.uniform(size=(1, 84, 84, 3)))
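For context, the exponential_avg_factor in the error message appears to correspond to 1 - momentum in the Keras layer, so momentum=0.0 is exactly the case where the moving statistics are fully replaced by the current batch statistics. A minimal sketch of the documented moving-average update rule (an illustration of the relationship, not the internal fused-kernel code; the numbers are made up):

import numpy as np

momentum = 0.0
exponential_avg_factor = 1.0 - momentum  # == 1.0, the case the error message refers to

batch_mean, batch_var = 0.3, 1.2    # statistics of the current batch (example values)
moving_mean, moving_var = 0.0, 1.0  # running statistics kept by the layer

# Moving-statistics update performed during training:
moving_mean = momentum * moving_mean + exponential_avg_factor * batch_mean
moving_var = momentum * moving_var + exponential_avg_factor * batch_var
# With momentum == 0.0 the old moving statistics are discarded entirely,
# which is the behavior wanted in the meta-learning setting described above.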

Other info / logs

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Comments: 15 (6 by maintainers)

Most upvoted comments

Hi @siavash-khodadadeh, by whitening I mean normalizing to mean 0 and standard deviation/variance 1 before passing an input vector x_i, i in {1, 2, ..., m}, through an activation unit. According to the implementation, the batch-normalized value is X = (x - E[x]) / sqrt(Var(x) + epsilon), where E[x] is the expectation and epsilon is added only to prevent division by zero. There are two other parameters, gamma and beta, which govern the scale-and-shift rule applied in the gradient step. When training=False, a moving average over the statistics of previous batches is used to "whiten" the input, which is then passed into the activation units (ReLU, sigmoid, etc.) of the current layer. Yes, you are correct that this should not occur when is_training is true, because in that case there is no need for such a moving statistic over previous batches; the current input batch is whitened with its own statistics and used. If mean and variance are None in this case (training=True), then it is not possible to determine the normalized value to pass into the activation unit.
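For readers skimming the thread, here is a small NumPy sketch of the normalization the comment above describes: batch statistics are used when training, the moving averages are used at inference, and gamma/beta apply the scale-and-shift step. This is only an illustration of the formula, not TensorFlow's internal implementation; the function and variable names are my own.

import numpy as np

def batch_norm(x, gamma, beta, moving_mean, moving_var, training, eps=1e-3):
    if training:
        # Whiten using the statistics of the current batch.
        mean, var = x.mean(axis=0), x.var(axis=0)
    else:
        # Whiten using the moving averages accumulated during training.
        mean, var = moving_mean, moving_var
    x_hat = (x - mean) / np.sqrt(var + eps)  # X = (x - E[x]) / sqrt(Var(x) + epsilon)
    return gamma * x_hat + beta              # scale and shift (learned gamma, beta)

x = np.random.uniform(size=(4, 10))
out = batch_norm(x, gamma=np.ones(10), beta=np.zeros(10),
                 moving_mean=np.zeros(10), moving_var=np.ones(10), training=True)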