tensorflow: Following instructions in batch_normalization docs produces an exception

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 17.10
  • TensorFlow installed from (source or binary): Binary (pip install tensorflow)
  • TensorFlow version (use command below): v1.4.0-rc1-11-g130a514 1.4.0
  • Python version: 3.6.3
  • CUDA/cuDNN version: N/A
  • GPU model and memory: N/A
  • Exact command to reproduce: (see the “Source code / logs” section)

Describe the problem

I was following the instructions for updating moving_mean and moving_variance, using the code snippet provided in the batch_normalization documentation, and it resulted in a “tensorflow.python.framework.errors_impl.InvalidArgumentError: Retval[0] does not have value” exception.
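
For reference, the update pattern from the documentation looks roughly like this (a paraphrased sketch with a stand-in model; x, loss, and the optimizer below are illustrative names, not the code from the repository):

import tensorflow as tf

# Stand-in model: a dense layer followed by batch normalization.
x = tf.placeholder(tf.float32, [None, 8])
net = tf.layers.batch_normalization(tf.layers.dense(x, 4), training=True)
loss = tf.reduce_mean(tf.square(net))

# Per the documentation, the moving mean/variance update ops are collected in
# UPDATE_OPS and must be made control dependencies of the train op.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)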

Source code / logs

Full instructions to reproduce:

git clone -b control-dependencies-exc https://github.com/naktinis/language-id.git ctrl-dep-exc
cd ctrl-dep-exc/
python3 -m venv venv
. venv/bin/activate
pip install tensorflow==1.4.0 Pillow
python3 main.py --image-dir test_data/ --label-file test_data/labels.csv --model rnn

The specific code change that was enough to produce the exception (it appears to match the snippet in the official documentation): https://github.com/naktinis/language-id/commit/50740f

Full exception:

Traceback (most recent call last):
  File "main.py", line 173, in <module>
    tf.app.run(main=run_experiment)
  File "/home/naktinis/code/ctrl-dep-exc/venv/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "main.py", line 165, in run_experiment
    hparams=params,
  File "/home/naktinis/code/ctrl-dep-exc/venv/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/learn_runner.py", line 218, in run
    return _execute_schedule(experiment, schedule)
  File "/home/naktinis/code/ctrl-dep-exc/venv/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/learn_runner.py", line 46, in _execute_schedule
    return task()
  File "/home/naktinis/code/ctrl-dep-exc/venv/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 625, in train_and_evaluate
    self.train(delay_secs=0)
  File "/home/naktinis/code/ctrl-dep-exc/venv/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 367, in train
    hooks=self._train_monitors + extra_hooks)
  File "/home/naktinis/code/ctrl-dep-exc/venv/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 807, in _call_train
    hooks=hooks)
  File "/home/naktinis/code/ctrl-dep-exc/venv/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 302, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/home/naktinis/code/ctrl-dep-exc/venv/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 783, in _train_model
    _, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss])
  File "/home/naktinis/code/ctrl-dep-exc/venv/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 521, in run
    run_metadata=run_metadata)
  File "/home/naktinis/code/ctrl-dep-exc/venv/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 892, in run
    run_metadata=run_metadata)
  File "/home/naktinis/code/ctrl-dep-exc/venv/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 967, in run
    raise six.reraise(*original_exc_info)
  File "/home/naktinis/code/ctrl-dep-exc/venv/lib/python3.6/site-packages/six.py", line 693, in reraise
    raise value
  File "/home/naktinis/code/ctrl-dep-exc/venv/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 952, in run
    return self._sess.run(*args, **kwargs)
  File "/home/naktinis/code/ctrl-dep-exc/venv/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1024, in run
    run_metadata=run_metadata)
  File "/home/naktinis/code/ctrl-dep-exc/venv/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 827, in run
    return self._sess.run(*args, **kwargs)
  File "/home/naktinis/code/ctrl-dep-exc/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 889, in run
    run_metadata_ptr)
  File "/home/naktinis/code/ctrl-dep-exc/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1120, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/naktinis/code/ctrl-dep-exc/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1317, in _do_run
    options, run_metadata)
  File "/home/naktinis/code/ctrl-dep-exc/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1336, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Retval[0] does not have value

About this issue

  • Original URL
  • State: closed
  • Created 7 years ago
  • Comments: 25 (18 by maintainers)

Most upvoted comments

I met the same problem:

inputs = ....
inputs = RNN op on inputs
inputs = tf.layers.batch_normalization(inputs, training=self.is_train, trainable=True)
.....
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    self.optimizer = ....

When the RNN ops are removed, or when tf.layers.batch_normalization is replaced with tf.contrib.layers.batch_norm(inputs=inputs, updates_collections=None), everything works. Otherwise, “Retval[0] does not have value” occurs.
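
For reference, a rough sketch of the tf.contrib.layers.batch_norm(updates_collections=None) workaround mentioned above; the GRU model and every name in it are illustrative stand-ins, not code from this issue:

import numpy as np
import tensorflow as tf

# Toy sequence model: GRU followed by contrib batch norm on the last output.
x = tf.placeholder(tf.float32, [None, 10, 8])
is_train = tf.placeholder(tf.bool, [])

cell = tf.nn.rnn_cell.GRUCell(16)
outputs, _ = tf.nn.dynamic_rnn(cell, x, dtype=tf.float32)
last = outputs[:, -1, :]

# updates_collections=None applies the moving-average updates in place, so no
# UPDATE_OPS / control_dependencies wiring is needed.
normed = tf.contrib.layers.batch_norm(
    inputs=last, is_training=is_train, updates_collections=None)

loss = tf.reduce_mean(tf.square(normed))
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    batch = np.random.rand(4, 10, 8).astype(np.float32)
    sess.run(train_op, feed_dict={x: batch, is_train: True})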

@ebrevdo

Does the tf.contrib.layers.batch_norm workaround work if I’m not doing the usual optimizer.minimize() training, but instead using optimizer.compute_gradients() on 2 small batches, adding up the gradients, and then calling optimizer.apply_gradients() on the summed gradients? Will the BN moving mean and variance get updated once per small batch, every other small batch, once per apply_gradients(), or never?

This is important for cases where the full batch is too big for GPU memory and the gradients have to be computed separately for smaller sub-batches.
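
One way to make the update timing explicit (not an authoritative answer to the question above, and using tf.layers.batch_normalization rather than the contrib workaround) is to accumulate gradients manually and fetch the UPDATE_OPS with every sub-batch. A rough sketch with a stand-in dense + batch-norm model:

import numpy as np
import tensorflow as tf

# Stand-in model.
x = tf.placeholder(tf.float32, [None, 8])
is_training = tf.placeholder(tf.bool, [])
net = tf.layers.batch_normalization(tf.layers.dense(x, 4), training=is_training)
loss = tf.reduce_mean(tf.square(net))

optimizer = tf.train.GradientDescentOptimizer(0.1)
tvars = tf.trainable_variables()

# One non-trainable accumulator per trainable variable.
accum = [tf.Variable(tf.zeros_like(v.initialized_value()), trainable=False)
         for v in tvars]
zero_accum = [a.assign(tf.zeros_like(a)) for a in accum]

grads = [g for g, _ in optimizer.compute_gradients(loss, tvars)]
accum_ops = [a.assign_add(g) for a, g in zip(accum, grads)]
apply_op = optimizer.apply_gradients(zip(accum, tvars))
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(zero_accum)
    for _ in range(2):  # two small sub-batches
        sub_batch = np.random.rand(16, 8).astype(np.float32)
        # Fetching update_ops here means the moving mean/variance are
        # updated once per sub-batch.
        sess.run([accum_ops, update_ops],
                 feed_dict={x: sub_batch, is_training: True})
    # The summed gradients are applied once per accumulated batch.
    sess.run(apply_op)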

I don’t think I have authorization to re-open this issue.

I met the same problem using:

  • TF 1.5.1 with CUDA 9.0: the computation deadlocks instead of throwing an error (you have to kill the process; not even an interrupt works)
  • TF 1.6 CPU: throws the error described above

I’ve also managed to find a workaround: saving the update_ops and forcing their evaluation explicitly in the session.run() call, instead of using with tf.control_dependencies(update_ops), seems to mitigate the issue.
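
A rough sketch of that workaround with a stand-in dense + batch-norm model (the real models in this thread involve an RNN; this only shows the session.run() wiring):

import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 8])
is_training = tf.placeholder(tf.bool, [])
net = tf.layers.batch_normalization(tf.layers.dense(x, 4), training=is_training)
loss = tf.reduce_mean(tf.square(net))

update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
# Note: no "with tf.control_dependencies(update_ops):" around minimize().
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    batch = np.random.rand(16, 8).astype(np.float32)
    # The moving mean/variance updates are fetched explicitly alongside
    # the train op instead of being attached as control dependencies.
    sess.run([train_op] + update_ops,
             feed_dict={x: batch, is_training: True})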

@jpienaar had some plans to raise a better error when an op inside a while loop is used as a control input to an op outside the loop. I’m not sure what the status or timeline of that is at the moment.
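
For illustration, a self-contained toy construction (my own, not code from this issue) of that invalid pattern, i.e. an op created inside a tf.while_loop body used as a control input to an op outside the loop:

import tensorflow as tf

moving_stat = tf.Variable(0.0, trainable=False)

def body(i):
    # This update op is created inside the while-loop body, analogous to how
    # batch-norm update ops can end up inside an RNN's tf.while_loop.
    update = tf.assign_add(moving_stat, 1.0)
    tf.add_to_collection(tf.GraphKeys.UPDATE_OPS, update)
    with tf.control_dependencies([update]):
        return tf.identity(i) + 1

_ = tf.while_loop(lambda i: i < 3, body, [tf.constant(0)])

# Pulling the in-loop op out of the collection and using it as a control
# input to an op outside the loop is the pattern described above.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = tf.no_op(name="train_op")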