tensorflow: Upgrade from r11 to r12 produces "Variables not defined" when using any optimizer but GradientDescentOptimizer
After a recent upgrade to the latest version of TensorFlow on GitHub, several things stopped working. I found that all the optimizers, such as Adam or Adagrad, now produce an error related to variable scope that I have not managed to solve yet. GradientDescentOptimizer, however, works fine.
It may be related to the issue: https://github.com/tensorflow/tensorflow/issues/5652
The error looks like this:
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 651, in _get_single_variable
"VarScope?" % name)
ValueError: Variable filter/Adadelta/ does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=None in VarScope?
It works fine with TensorFlow r11.
Operating System: Ubuntu 16 and Ubuntu 14
Installed versions of CUDA and cuDNN: CUDA 8.0, cuDNN 5.1
Commit hash: 6dc8deaed8d8bd9cc6d52a03474d0b82891c8b86
Build time: Wed Nov 2 17:54:14 2016 (1478109254)
Find below a minimal version that causes the error:
import tensorflow as tf
import pdb

def main():
    ## !!! change this to test the different behaviors !!!
    #optimizer = tf.train.GradientDescentOptimizer(1e-3)  # This one is working
    optimizer = tf.train.AdamOptimizer(1e-3, beta1=0.9, beta2=0.999999)  # This one is not working
    #optimizer = tf.train.AdagradOptimizer(1e-3)  # This one is not working
    #optimizer = tf.train.AdadeltaOptimizer(1e-3)  # This one is not working

    list_grads = []
    for i in xrange(2):
        with tf.device('/gpu:%d' % i):
            with tf.name_scope('%d' % i) as scope:
                W = tf.get_variable(name="filter", initializer=tf.random_uniform_initializer(dtype=tf.float32), shape=[5, 1])
                X = tf.get_variable(name="data", initializer=tf.random_uniform_initializer(dtype=tf.float32), shape=[5, 1])
                Y_ = tf.get_variable(name="out", initializer=tf.random_uniform_initializer(dtype=tf.float32), shape=[5, 1])
                Y = W + X
                loss = tf.reduce_mean(Y - Y_)
                grad = optimizer.compute_gradients(loss)
                list_grads.append(grad)
                tf.get_variable_scope().reuse_variables()

    grads = list_grads[0] + list_grads[1]
    #pdb.set_trace()
    op_train = optimizer.apply_gradients(grads)

    init_global = tf.global_variables_initializer()
    init_local = tf.local_variables_initializer()
    sess = tf.Session()
    sess.run([init_global, init_local])
    _, sol = sess.run([op_train, loss])
    print(str(sol))

if (__name__ == '__main__'):
    main()
About this issue
- Original URL
- State: closed
- Created 8 years ago
- Comments: 21 (7 by maintainers)
Commits related to this issue
- Wrap the cifar10 multigpu model construction part with a variable_scope Without the new variable_scope, creating apply_gradient_op raises an error that additional moving average or slot variables cou... — committed to wookayin/tensorflow-models by wookayin 7 years ago
- Update from origin (#1) * Fix bug in relative path of shell scripts built with bazel. * Add Bazel workspace name to fix bug in relative path of shell scripts. * Update citation in README.md ... — committed to Peratham/models by Peratham 7 years ago
- Wrap the cifar10 multigpu model construction part with a variable_scope Without the new variable_scope, creating apply_gradient_op raises an error that additional moving average or slot variables cou... — committed to taylorpaul/cifar10_tf by wookayin 7 years ago
- Wrap the cifar10 multigpu model construction part with a variable_scope Without the new variable_scope, creating apply_gradient_op raises an error that additional moving average or slot variables cou... — committed to tensorflow/examples by wookayin 7 years ago
To clarify, we just need to put a scope around the model-construction part.
Hope that helps!
Sure, let me try to clarify.
When you do tf.get_variable_scope().reuse_variables() you set the current scope to reuse variables. If you call the optimizer in such a scope, it tries to reuse its slot variables, which it cannot find, so it throws an error. If you put a scope around it, the tf.get_variable_scope().reuse_variables() call only affects that scope, so when you exit it you are back in the non-reusing mode, which is the one you want. Hope that helps, let me know if I should clarify more.
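Applied to the minimal repro above, that suggestion would look roughly like the following sketch (it mirrors the original code; the only structural change is the explicit variable_scope around the per-GPU loop, the optimizer arguments are simplified, and allow_soft_placement is added so the sketch also runs on a machine without two GPUs):

import tensorflow as tf

def main():
    optimizer = tf.train.AdamOptimizer(1e-3)

    list_grads = []
    # Explicit scope around model construction: reuse_variables() below now only
    # flips reuse on this inner scope, not on the root scope.
    with tf.variable_scope(tf.get_variable_scope()):
        for i in xrange(2):
            with tf.device('/gpu:%d' % i):
                with tf.name_scope('%d' % i):
                    W = tf.get_variable("filter", shape=[5, 1],
                                        initializer=tf.random_uniform_initializer(dtype=tf.float32))
                    X = tf.get_variable("data", shape=[5, 1],
                                        initializer=tf.random_uniform_initializer(dtype=tf.float32))
                    Y_ = tf.get_variable("out", shape=[5, 1],
                                         initializer=tf.random_uniform_initializer(dtype=tf.float32))
                    loss = tf.reduce_mean(W + X - Y_)
                    list_grads.append(optimizer.compute_gradients(loss))
                    tf.get_variable_scope().reuse_variables()

    # Outside the scope, reuse is False again, so the Adam slot variables
    # can be created here without the "Variable filter/Adam/ does not exist" error.
    grads = list_grads[0] + list_grads[1]
    op_train = optimizer.apply_gradients(grads)

    sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True))
    sess.run([tf.global_variables_initializer(), tf.local_variables_initializer()])
    _, sol = sess.run([op_train, loss])
    print(str(sol))

if (__name__ == '__main__'):
    main()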
Sorry sherry – the current behaviour is correct. Your code is leaking reuse – it just wasn't checked before. It could cause all sorts of other trouble, and I think we should correct the leaky reuse cases rather than revert the slot change. I'll write more on the test cases; closing this.
@lukaszkaiser Hello, I found that your workaround of putting a variable_scope around the outermost num_gpus loop works, but I am still confused about why it eliminates the error.
Is it just because the tf.get_variable_scope() (which is identical to vscope) is explicitly created rather than being the implicit default? Then, in what do these two VariableScope objects differ? What do you mean by "leaky reuse"? Could you please clarify? /cc @cesc-park
I am a student and not very familiar with TensorFlow. I just followed @lukaszkaiser, used 'with tf.variable_scope(tf.get_variable_scope(), reuse=tf.AUTO_REUSE) as scope:' and deleted the 'tf.get_variable_scope().reuse_variables()', and my code works. I am running the code of ROLO.
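For reference, the tf.AUTO_REUSE variant described in that comment would look roughly like this sketch (not the actual ROLO code; tf.AUTO_REUSE is only available in TensorFlow 1.4 and later, and the variable names here are illustrative):

import tensorflow as tf

# With reuse=tf.AUTO_REUSE, get_variable creates a variable on the first call
# and reuses it on subsequent calls, so no explicit reuse_variables() is needed.
with tf.variable_scope(tf.get_variable_scope(), reuse=tf.AUTO_REUSE) as scope:
    for i in range(2):
        with tf.device('/gpu:%d' % i):
            W = tf.get_variable("filter", shape=[5, 1])  # created on i == 0, reused on i == 1
            # ... build the rest of the tower and collect gradients here ...

# apply_gradients() is still called outside the scope, where reuse is off,
# so the optimizer can create its slot variables.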
@Huayra007 if you remove that snippet you should be able to run it. You are calling your generator twice, so either you remove the snippet or you add reuse to your generator code, as such:
and when you call it for TensorBoard, as such:
I hope this helps. Good luck with your GANs 😉
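The snippets referenced in that comment are not shown above, but a hypothetical sketch of "add reuse to your generator" typically looks like this (the generator function, its layers, and the tensor names are made up for illustration):

import tensorflow as tf

def generator(z, reuse=False):
    # Forward the reuse flag to the variable scope so that a second call
    # shares the variables created by the first call instead of re-creating them.
    with tf.variable_scope("generator", reuse=reuse):
        h = tf.layers.dense(z, 128, activation=tf.nn.relu)
        return tf.layers.dense(h, 784, activation=tf.nn.tanh)

z = tf.placeholder(tf.float32, [None, 100])
fake = generator(z)                     # first call: creates the variables
fake_again = generator(z, reuse=True)   # second call (e.g. for TensorBoard): reuses them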
Ah, great. Your explanation is clear and helpful. Thanks!
To sum up, the thing to remember is that the place where the (Adam-like) optimizer acts, i.e. opt.apply_gradients(...) (which is where the error is thrown from), should lie in a scope with reuse=False so that the slot variables can be created properly.