tensorflow: A bug with slim.create_global_step

When I use slim.create_global_step in distributed training, the slim.create_global_step can make the workers freeze just after session has been created.

This can be reproduced by just replacing the global_step = tf.Variable(0, name="global_step", trainable=False) with global_step = slim.create_global_step() in the mnist_replica.py in the dist_test

About this issue

  • Original URL
  • State: closed
  • Created 7 years ago
  • Comments: 21 (10 by maintainers)

Most upvoted comments

@wangyum According to you log, Invalid size in bundle entry: key global_step; stored size 8; expected size 4 I think there is a dtype mismatch for the global_step variable