tensorflow: Upgrade from r11 to r12 produces "Variables not defined" when using any optimizer but GradientDescentOptimizer

After a recent upgrade to the latest version of TensorFlow on GitHub, several things stopped working. I found that all of the optimizers, such as Adam or Adagrad, now produce an error related to variable scope that I have not managed to solve yet. However, GradientDescentOptimizer works fine.

It may be related to the issue: https://github.com/tensorflow/tensorflow/issues/5652

The error looks like this:

File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 651, in _get_single_variable
    "VarScope?" % name)
ValueError: Variable filter/Adadelta/ does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=None in VarScope?

It works fine with TensorFlow r11.

Operating system: Ubuntu 16 and Ubuntu 14
Installed version of CUDA and cuDNN: CUDA 8.0, cuDNN 5.1
Commit hash: 6dc8deaed8d8bd9cc6d52a03474d0b82891c8b86
Build time: Wed Nov 2 17:54:14 2016 (timestamp 1478109254)

Below is a minimal example that reproduces the error:

import tensorflow as tf
import pdb

def main():

    ## !!! change this to test the different behaviors !!!
    #optimizer = tf.train.GradientDescentOptimizer(1e-3)                 # This one is working
    optimizer = tf.train.AdamOptimizer(1e-3, beta1=0.9, beta2=0.999999) # This one is not working
    #optimizer = tf.train.AdagradOptimizer(1e-3)                         # This one is not working
    #optimizer = tf.train.AdadeltaOptimizer(1e-3)                        # This one is not working
	
    list_grads = []
    for i in xrange(2):
        with tf.device('/gpu:%d' % i):
            with tf.name_scope('%d' % i) as scope:
                W = tf.get_variable(name="filter", initializer=tf.random_uniform_initializer(dtype=tf.float32), shape=[5, 1])
                X = tf.get_variable(name="data", initializer=tf.random_uniform_initializer(dtype=tf.float32), shape=[5, 1])
                Y_ = tf.get_variable(name="out", initializer=tf.random_uniform_initializer(dtype=tf.float32), shape=[5, 1])
                Y = W+X
                loss = tf.reduce_mean(Y - Y_)
                grad = optimizer.compute_gradients(loss)
                list_grads.append(grad)

                # After this call the (top-level) variable scope stays in reuse
                # mode for everything created afterwards, including the slot
                # variables that apply_gradients() tries to create further down.
                tf.get_variable_scope().reuse_variables()
    
    grads = list_grads[0] + list_grads[1]
    #pdb.set_trace()

    op_train = optimizer.apply_gradients(grads)

    init_global = tf.global_variables_initializer()
    init_local = tf.local_variables_initializer()

    sess = tf.Session()
    sess.run([init_global, init_local])

    _, sol = sess.run([op_train, loss])
    print(str(sol))

if __name__ == '__main__':
    main()

About this issue

  • State: closed
  • Created 8 years ago
  • Comments: 21 (7 by maintainers)

Most upvoted comments

To clarify, we just need to put a scope around the model-construction part.

with tf.variable_scope(tf.get_variable_scope()) as scope:
  for i in xrange(2):
    ... code as before until ... reuse_variables() ...

grads = list_grads[0] + list_grads[1]
... rest of code as before ...

Hope that helps!
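
For reference, here is roughly what that fix looks like when applied to the minimal example at the top of this issue (a sketch against the r0.12-era API, reusing the variable definitions from the reproduction above; not a verified diff):

import tensorflow as tf

optimizer = tf.train.AdamOptimizer(1e-3)

list_grads = []
# Wrapping the tower loop in an explicit variable scope means that
# reuse_variables() only affects this scope; once the with-block ends,
# graph construction is back in non-reusing mode.
with tf.variable_scope(tf.get_variable_scope()):
    for i in xrange(2):
        with tf.device('/gpu:%d' % i):
            with tf.name_scope('%d' % i):
                W = tf.get_variable("filter", shape=[5, 1],
                                    initializer=tf.random_uniform_initializer(dtype=tf.float32))
                X = tf.get_variable("data", shape=[5, 1],
                                    initializer=tf.random_uniform_initializer(dtype=tf.float32))
                Y_ = tf.get_variable("out", shape=[5, 1],
                                     initializer=tf.random_uniform_initializer(dtype=tf.float32))
                loss = tf.reduce_mean(W + X - Y_)
                list_grads.append(optimizer.compute_gradients(loss))
                # Share filter/data/out with the second tower.
                tf.get_variable_scope().reuse_variables()

# Outside the scope, reuse is off again, so the Adam slot variables
# can be created here without the ValueError.
grads = list_grads[0] + list_grads[1]
op_train = optimizer.apply_gradients(grads)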

Sure, let me try to clarify.

When you call tf.get_variable_scope().reuse_variables(), you set the current scope to reuse variables. If you then call the optimizer in such a scope, it tries to reuse its slot variables, which it cannot find, so it throws an error. If you put a scope around the loop, the tf.get_variable_scope().reuse_variables() call only affects that scope, so when you exit it you are back in non-reusing mode, which is the one you want.

Hope that helps, let me know if I should clarify more.
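
A tiny illustration of the mechanism (the variable names here are made up for the demonstration):

import tensorflow as tf

with tf.variable_scope("demo") as scope:
    v = tf.get_variable("w", shape=[1])        # created normally, reuse is off
    scope.reuse_variables()                    # from here on only lookups are allowed
    v2 = tf.get_variable("w", shape=[1])       # fine: "demo/w" already exists
    try:
        tf.get_variable("some_slot", shape=[1])  # a brand-new name under reuse
    except ValueError as e:
        print(e)  # "Variable demo/some_slot does not exist, or was not created with tf.get_variable()..."

The optimizer hits exactly the last case: apply_gradients() calls tf.get_variable() to create its slot variables while the surrounding scope is still in reuse mode.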

Sorry sherry – the current behaviour is correct. Your code is leaking reuse – it just wasn’t checked before. It could cause all sorts of other trouble, and I think we should correct the leaky reuse cases, not revert the slot change. I’ll write more on the test cases; closing this.

@lukaszkaiser Hello, I found that your workaround of putting a variable_scope around the outermost num_gpus loop works, but I am still confused about why it eliminates the error.

with tf.variable_scope(tf.get_variable_scope()) as vscope:
  for i in xrange(FLAGS.num_gpus):
    with tf.device('/gpu:%d' % i):
      with tf.name_scope('%s_%d' % (cifar10.TOWER_NAME, i)) as scope:
        loss = tower_loss(scope)
        tf.get_variable_scope().reuse_variables()     # HERE

Is it just because the tf.get_variable_scope() (which is identical to vscope) is entered explicitly rather than being the implicit default? And how do these two VariableScope objects differ?

What do you mean by “leaky reuse”? Could you please clarify? /cc @cesc-park

I am a student and not very familiar with TensorFlow. I just followed @lukaszkaiser: I used ‘with tf.variable_scope(tf.get_variable_scope(), reuse=tf.AUTO_REUSE) as scope:’ and deleted the ‘tf.get_variable_scope().reuse_variables()’ call, and my code works. I am running the ROLO code.
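
For anyone on a newer TensorFlow (1.4+, where tf.AUTO_REUSE exists), that variant looks roughly like this when written against the CIFAR-10 multi-GPU snippet quoted earlier (a sketch, not tested against ROLO itself):

# AUTO_REUSE creates a variable on first use and reuses it afterwards,
# so the explicit reuse_variables() call is no longer needed.
with tf.variable_scope(tf.get_variable_scope(), reuse=tf.AUTO_REUSE):
  for i in xrange(FLAGS.num_gpus):
    with tf.device('/gpu:%d' % i):
      with tf.name_scope('%s_%d' % (cifar10.TOWER_NAME, i)) as scope:
        loss = tower_loss(scope)
        # no tf.get_variable_scope().reuse_variables() here any more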

@Huayra007 if you remove the

images_for_tensorboard = generator(z_placeholder,batch_size,z_dimension)
tf.summary.image('Generated_images',images_for_tensorboard,5)

you should be able to run it. You are calling your generator twice. So either remove that snippet, or add a reuse flag to your generator like this:


def generator(z,batch_size,z_dim,reuse=False):
    if (reuse):
        tf.get_variable_scope().reuse_variables()

    g_w1 = tf.get_variable('g_w1',[z_dim,56*56],dtype=tf.float32,initializer=tf.truncated_normal_initializer(stddev=0.02))
    g_b1 = tf.get_variable('g_b1',[56*56],dtype=tf.float32,initializ



and when you call it for the tensorboard as such:

with tf.variable_scope(tf.get_variable_scope()) as scope:
    images_for_tensorboard = generator(z_placeholder, batch_size, z_dimensions,reuse=True)
    tf.summary.image('Generated_images', images_for_tensorboard, 5)

I hope this helps. Good luck with your GANs 😉

Ah, great. Your explanation is clear and helpful. Thanks!

To sum up, the thing to remember is that the point where the (Adam-like) optimizer acts, i.e. opt.apply_gradients(...) (which is where the error is thrown), must lie in a scope with reuse=False so that the slot variables can be created properly.
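
A quick way to check this in your own code (just a debugging aid; grads is assumed to have been built inside the reuse scope as in the examples above):

# The current variable scope must not be in reuse mode at this point,
# otherwise the Adam/Adagrad/Adadelta slot variables cannot be created.
assert not tf.get_variable_scope().reuse
op_train = optimizer.apply_gradients(grads)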