tensorflow: Upgrade from r11 to r12 produces "Variables not defined" when using any optimizer but GradientDescentOptimizer

After a recent upgrade to the latest version of TensorFlow on GitHub, several things stopped working. I found that all of the optimizers, such as Adam or Adagrad, now produce an error related to variable scope that I have not managed to solve yet. However, GradientDescentOptimizer works fine.

It may be related to the issue: https://github.com/tensorflow/tensorflow/issues/5652

The error looks like this:

File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 651, in _get_single_variable
    "VarScope?" % name)
ValueError: Variable filter/Adadelta/ does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=None in VarScope?

It works fine with TensorFlow r11.

Operating system: Ubuntu 16 and Ubuntu 14
Installed version of CUDA and cuDNN: CUDA 8.0, cuDNN 5.1
Commit hash: 6dc8deaed8d8bd9cc6d52a03474d0b82891c8b86
Build time: Wed Nov 2 17:54:14 2016 (timestamp 1478109254)

Below is a minimal example that reproduces the error:

import tensorflow as tf
import pdb

def main():

    ## !!! change this to test the different behaviors !!!
    #optimizer = tf.train.GradientDescentOptimizer(1e-3)                 # This one is working
    optimizer = tf.train.AdamOptimizer(1e-3, beta1=0.9, beta2=0.999999) # This one is not working
    #optimizer = tf.train.AdagradOptimizer(1e-3)                         # This one is not working
    #optimizer = tf.train.AdadeltaOptimizer(1e-3)                        # This one is not working
	
    list_grads = []
    for i in xrange(2):
        with tf.device('/gpu:%d' % i):
            with tf.name_scope('%d' % i) as scope:
                W = tf.get_variable(name="filter", initializer=tf.random_uniform_initializer(dtype=tf.float32), shape=[5, 1])
                X = tf.get_variable(name="data", initializer=tf.random_uniform_initializer(dtype=tf.float32), shape=[5, 1])
                Y_ = tf.get_variable(name="out", initializer=tf.random_uniform_initializer(dtype=tf.float32), shape=[5, 1])
                Y = W+X
                loss = tf.reduce_mean(Y - Y_)
                grad = optimizer.compute_gradients(loss)
                list_grads.append(grad)

                # After this call the (top-level) variable scope stays in reuse
                # mode for everything created afterwards, including the slot
                # variables that apply_gradients() tries to create further down.
                tf.get_variable_scope().reuse_variables()
    
    grads = list_grads[0] + list_grads[1]
    #pdb.set_trace()

    op_train = optimizer.apply_gradients(grads)

    init_global = tf.global_variables_initializer()
    init_local = tf.local_variables_initializer()

    sess = tf.Session()
    sess.run([init_global, init_local])

    _, sol = sess.run([op_train, loss])
    print(str(sol))

if __name__ == '__main__':
    main()

About this issue

  • State: closed
  • Created 8 years ago
  • Comments: 21 (7 by maintainers)

Most upvoted comments

To clarify, we just need to put a scope around the model-construction part.

with tf.variable_scope(tf.get_variable_scope()) as scope:
  for i in xrange(2):
    ... code as before until ... reuse_variables() ...

grads = list_grads[0] + list_grads[1]
... rest of code as before ...

Hope that helps!
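
For reference, here is roughly what that fix looks like when applied to the minimal example at the top of this issue (a sketch against the r0.12-era API, reusing the variable definitions from the reproduction above; not a verified diff):

import tensorflow as tf

optimizer = tf.train.AdamOptimizer(1e-3)

list_grads = []
# Wrapping the tower loop in an explicit variable scope means that
# reuse_variables() only affects this scope; once the with-block ends,
# graph construction is back in non-reusing mode.
with tf.variable_scope(tf.get_variable_scope()):
    for i in xrange(2):
        with tf.device('/gpu:%d' % i):
            with tf.name_scope('%d' % i):
                W = tf.get_variable("filter", shape=[5, 1],
                                    initializer=tf.random_uniform_initializer(dtype=tf.float32))
                X = tf.get_variable("data", shape=[5, 1],
                                    initializer=tf.random_uniform_initializer(dtype=tf.float32))
                Y_ = tf.get_variable("out", shape=[5, 1],
                                     initializer=tf.random_uniform_initializer(dtype=tf.float32))
                loss = tf.reduce_mean(W + X - Y_)
                list_grads.append(optimizer.compute_gradients(loss))
                # Share filter/data/out with the second tower.
                tf.get_variable_scope().reuse_variables()

# Outside the scope, reuse is off again, so the Adam slot variables
# can be created here without the ValueError.
grads = list_grads[0] + list_grads[1]
op_train = optimizer.apply_gradients(grads)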

Sure, let me try to clarify.

When you call tf.get_variable_scope().reuse_variables(), you set the current scope to reuse variables. If you then call the optimizer in such a scope, it tries to reuse its slot variables, which it cannot find, so it throws an error. If you put a scope around the loop, the tf.get_variable_scope().reuse_variables() call only affects that scope, so when you exit it you are back in non-reusing mode, which is the one you want.

Hope that helps, let me know if I should clarify more.
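
A tiny illustration of the mechanism (the variable names here are made up for the demonstration):

import tensorflow as tf

with tf.variable_scope("demo") as scope:
    v = tf.get_variable("w", shape=[1])        # created normally, reuse is off
    scope.reuse_variables()                    # from here on only lookups are allowed
    v2 = tf.get_variable("w", shape=[1])       # fine: "demo/w" already exists
    try:
        tf.get_variable("some_slot", shape=[1])  # a brand-new name under reuse
    except ValueError as e:
        print(e)  # "Variable demo/some_slot does not exist, or was not created with tf.get_variable()..."

The optimizer hits exactly the last case: apply_gradients() calls tf.get_variable() to create its slot variables while the surrounding scope is still in reuse mode.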

Sorry sherry – the current behaviour is correct. Your code is leaking reuse – it just wasn’t checked before. It could cause all sorts of other trouble, and I think we should correct the leaky reuse cases, not revert the slot change. I’ll write more on the test cases; closing this.

@lukaszkaiser Hello, I found that your workaround of putting a variable_scope around the outermost num_gpus loop works, but I am still confused about why it eliminates the error.

with tf.variable_scope(tf.get_variable_scope()) as vscope:
  for i in xrange(FLAGS.num_gpus):
    with tf.device('/gpu:%d' % i):
      with tf.name_scope('%s_%d' % (cifar10.TOWER_NAME, i)) as scope:
        loss = tower_loss(scope)
        tf.get_variable_scope().reuse_variables()     # HERE

Is it just because the tf.get_variable_scope() (which is identical to vscope) is entered explicitly rather than being the implicit default? And how do these two VariableScope objects differ?

What do you mean by “leaky reuse”? Could you please clarify? /cc @cesc-park

I am a student and not very familiar with TensorFlow. I just followed @lukaszkaiser: I used ‘with tf.variable_scope(tf.get_variable_scope(), reuse=tf.AUTO_REUSE) as scope:’ and deleted the ‘tf.get_variable_scope().reuse_variables()’ call, and my code works. I am running the ROLO code.
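
For anyone on a newer TensorFlow (1.4+, where tf.AUTO_REUSE exists), that variant looks roughly like this when written against the CIFAR-10 multi-GPU snippet quoted earlier (a sketch, not tested against ROLO itself):

# AUTO_REUSE creates a variable on first use and reuses it afterwards,
# so the explicit reuse_variables() call is no longer needed.
with tf.variable_scope(tf.get_variable_scope(), reuse=tf.AUTO_REUSE):
  for i in xrange(FLAGS.num_gpus):
    with tf.device('/gpu:%d' % i):
      with tf.name_scope('%s_%d' % (cifar10.TOWER_NAME, i)) as scope:
        loss = tower_loss(scope)
        # no tf.get_variable_scope().reuse_variables() here any more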

@Huayra007 if you remove the

images_for_tensorboard = generator(z_placeholder,batch_size,z_dimension)
tf.summary.image('Generated_images',images_for_tensorboard,5)

you should be able to run it. You are calling your generator twice. So either remove that snippet, or add a reuse flag to your generator like this:


def generator(z,batch_size,z_dim,reuse=False):
    if (reuse):
        tf.get_variable_scope().reuse_variables()

    g_w1 = tf.get_variable('g_w1',[z_dim,56*56],dtype=tf.float32,initializer=tf.truncated_normal_initializer(stddev=0.02))
    g_b1 = tf.get_variable('g_b1',[56*56],dtype=tf.float32,initializ



and when you call it for the tensorboard as such:

with tf.variable_scope(tf.get_variable_scope()) as scope:
    images_for_tensorboard = generator(z_placeholder, batch_size, z_dimensions,reuse=True)
    tf.summary.image('Generated_images', images_for_tensorboard, 5)

I hope this helps. Good luck with your GANs 😉

Ah, great. Your explanation is clear and helpful. Thanks!

To sum up, the thing to remember is that the point where the (Adam-like) optimizer acts, i.e. opt.apply_gradients(...) (which is where the error is thrown), must lie in a scope with reuse=False so that the slot variables can be created properly.
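
A quick way to check this in your own code (just a debugging aid; grads is assumed to have been built inside the reuse scope as in the examples above):

# The current variable scope must not be in reuse mode at this point,
# otherwise the Adam/Adagrad/Adadelta slot variables cannot be created.
assert not tf.get_variable_scope().reuse
op_train = optimizer.apply_gradients(grads)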