tensorflow: sess.run([train_step]) freezes when using batch_normalization and Collection Update

Hi,

I have run into a strange situation with TensorFlow. I searched extensively for this problem but only found one other, unsolved thread on StackOverflow (https://stackoverflow.com/questions/47047124/tf-layers-batch-normalization-freezes-during-sess-run-1-5-0-dev20171031), so I decided to ask here.

Basically, when I call sess.run(), the process freezes. By "freeze" I mean that GPU utilization drops to zero, no errors are raised, and the process is still alive on the GPU (GPU memory stays allocated). I have the following code segment:

update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_step1 = optimizer1.minimize(loss=loss_fill + lossL2 + loss_detection, var_list=vars)

When I change this part to:

update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    pass
train_step1 = optimizer1.minimize(loss=loss_fill + lossL2 + loss_detection, var_list=vars)

which simply skips the necessary batch-norm moving-average update ops, the freeze no longer occurs. I use tf.layers.batch_normalization() in many places in my code, and this is the first time I am facing this issue. A stripped-down sketch of the pattern is shown below.
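
For context, here is a minimal sketch of the pattern I am describing (a toy model, not my actual network; the placeholder shapes, layer sizes, and names are purely illustrative):

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 64])
is_training = tf.placeholder(tf.bool, [])

h = tf.layers.dense(x, 32)
# Each call to tf.layers.batch_normalization adds moving-mean/variance
# assign ops to the tf.GraphKeys.UPDATE_OPS collection.
h = tf.layers.batch_normalization(h, training=is_training)
loss = tf.reduce_mean(tf.square(h))

# These update ops have to run alongside every training step; otherwise the
# moving statistics used at inference time are never updated.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_step = tf.train.AdamOptimizer(1e-3).minimize(loss)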

Thanks


Most upvoted comments

@ppwwyyxx you’re right, that issue looks suspiciously similar and probably has the same root cause. On another note, I’ve come up with a workaround for the sample code I posted above.

The following code:

with tf.control_dependencies(tf.get_collection(tf.GraphKeys.UPDATE_OPS)):
  grads_and_vars = optimizer.compute_gradients(res)
  train_op = optimizer.apply_gradients(grads_and_vars, global_step)

can be transformed into:

res = tf.tuple([res], control_inputs=tf.get_collection(tf.GraphKeys.UPDATE_OPS))[0]
grads_and_vars = optimizer.compute_gradients(res)
train_op = optimizer.apply_gradients(grads_and_vars, global_step)

which forces the batch-norm update ops to run before the gradients of res are computed, as desired.
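
For anyone who wants to try this end to end, here is a self-contained sketch of the workaround (the model, optimizer, and shapes below are placeholders for illustration, not the code from the original report):

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 64])
h = tf.layers.batch_normalization(tf.layers.dense(x, 32), training=True)
res = tf.reduce_mean(tf.square(h))  # stand-in for the real loss

global_step = tf.train.get_or_create_global_step()
optimizer = tf.train.GradientDescentOptimizer(0.1)

# Make the loss tensor itself depend on UPDATE_OPS instead of wrapping the
# optimizer calls in tf.control_dependencies.
res = tf.tuple([res], control_inputs=tf.get_collection(tf.GraphKeys.UPDATE_OPS))[0]
grads_and_vars = optimizer.compute_gradients(res)
train_op = optimizer.apply_gradients(grads_and_vars, global_step=global_step)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(train_op, feed_dict={x: [[0.0] * 64]})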