tensorflow: InvalidArgumentError: assertion failed: [0] [Op:Assert] name: EagerVariableNameReuse
I am using TensorFlow 2.1 to train a custom model. Yesterday the code ran correctly, but today it raises an error. I debugged my code and found that this line is the one that fails:
self.train_accuracy = tf.keras.metrics.CategoricalAccuracy('train_accuracy')
but this line is actually correct. Even when I create the metric on its own in a Jupyter notebook, it fails. The error is:
InvalidArgumentError: assertion failed: [0] [Op:Assert] name: EagerVariableNameReuse
Can anyone tell me the reason and suggest a solution? Thanks.
The whole code is:
import os
import numpy as np
import cv2
import tensorflow as tf
class ModelTrain():
    def __init__(self):
        self.loss_object = tf.keras.losses.CategoricalCrossentropy()
        self.optimizer = tf.keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999)
        self.train_loss = tf.keras.metrics.CategoricalCrossentropy('train_loss', dtype=tf.float32)
        self.train_accuracy = tf.keras.metrics.CategoricalAccuracy('train_accuracy')
        self.validation_loss = tf.keras.metrics.CategoricalCrossentropy('validation_loss', dtype=tf.float32)
        self.validation_accuracy = tf.keras.metrics.CategoricalAccuracy('validation_accuracy')
        
if __name__ == "__main__":
    model_train = ModelTrain()
The error is:
Traceback (most recent call last):
  File "/media/huaxin/tcl3/facepro/hand-gesture-recognition/jester-data-preprocessing_v0.2/test.py", line 18, in <module>
    model_train = ModelTrain()
  File "/media/huaxin/tcl3/facepro/hand-gesture-recognition/jester-data-preprocessing_v0.2/test.py", line 12, in __init__
    self.train_loss = tf.keras.metrics.CategoricalCrossentropy('train_loss', dtype=tf.float32)
  File "/media/huaxin/tcl3/facepro/anaconda3/envs/python3.7.4/lib/python3.7/site-packages/tensorflow_core/python/keras/metrics.py", line 2818, in __init__
    label_smoothing=label_smoothing)
  File "/media/huaxin/tcl3/facepro/anaconda3/envs/python3.7.4/lib/python3.7/site-packages/tensorflow_core/python/keras/metrics.py", line 560, in __init__
    super(MeanMetricWrapper, self).__init__(name=name, dtype=dtype)
  File "/media/huaxin/tcl3/facepro/anaconda3/envs/python3.7.4/lib/python3.7/site-packages/tensorflow_core/python/keras/metrics.py", line 460, in __init__
    reduction=metrics_utils.Reduction.WEIGHTED_MEAN, name=name, dtype=dtype)
  File "/media/huaxin/tcl3/facepro/anaconda3/envs/python3.7.4/lib/python3.7/site-packages/tensorflow_core/python/keras/metrics.py", line 296, in __init__
    'total', initializer=init_ops.zeros_initializer)
  File "/media/huaxin/tcl3/facepro/anaconda3/envs/python3.7.4/lib/python3.7/site-packages/tensorflow_core/python/keras/metrics.py", line 276, in add_weight
    aggregation=aggregation)
  File "/media/huaxin/tcl3/facepro/anaconda3/envs/python3.7.4/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 446, in add_weight
    caching_device=caching_device)
  File "/media/huaxin/tcl3/facepro/anaconda3/envs/python3.7.4/lib/python3.7/site-packages/tensorflow_core/python/training/tracking/base.py", line 744, in _add_variable_with_custom_getter
    **kwargs_for_getter)
  File "/media/huaxin/tcl3/facepro/anaconda3/envs/python3.7.4/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer_utils.py", line 142, in make_variable
    shape=variable_shape if variable_shape else None)
  File "/media/huaxin/tcl3/facepro/anaconda3/envs/python3.7.4/lib/python3.7/site-packages/tensorflow_core/python/ops/variables.py", line 258, in __call__
    return cls._variable_v1_call(*args, **kwargs)
  File "/media/huaxin/tcl3/facepro/anaconda3/envs/python3.7.4/lib/python3.7/site-packages/tensorflow_core/python/ops/variables.py", line 219, in _variable_v1_call
    shape=shape)
  File "/media/huaxin/tcl3/facepro/anaconda3/envs/python3.7.4/lib/python3.7/site-packages/tensorflow_core/python/ops/variables.py", line 197, in <lambda>
    previous_getter = lambda **kwargs: default_variable_creator(None, **kwargs)
  File "/media/huaxin/tcl3/facepro/anaconda3/envs/python3.7.4/lib/python3.7/site-packages/tensorflow_core/python/ops/variable_scope.py", line 2596, in default_variable_creator
    shape=shape)
  File "/media/huaxin/tcl3/facepro/anaconda3/envs/python3.7.4/lib/python3.7/site-packages/tensorflow_core/python/ops/variables.py", line 262, in __call__
    return super(VariableMetaclass, cls).__call__(*args, **kwargs)
  File "/media/huaxin/tcl3/facepro/anaconda3/envs/python3.7.4/lib/python3.7/site-packages/tensorflow_core/python/ops/resource_variable_ops.py", line 1411, in __init__
    distribute_strategy=distribute_strategy)
  File "/media/huaxin/tcl3/facepro/anaconda3/envs/python3.7.4/lib/python3.7/site-packages/tensorflow_core/python/ops/resource_variable_ops.py", line 1557, in _init_from_args
    graph_mode=self._in_graph_mode)
  File "/media/huaxin/tcl3/facepro/anaconda3/envs/python3.7.4/lib/python3.7/site-packages/tensorflow_core/python/ops/resource_variable_ops.py", line 232, in eager_safe_variable_handle
    shape, dtype, shared_name, name, graph_mode, initial_value)
  File "/media/huaxin/tcl3/facepro/anaconda3/envs/python3.7.4/lib/python3.7/site-packages/tensorflow_core/python/ops/resource_variable_ops.py", line 164, in _variable_handle_from_shape_and_dtype
    math_ops.logical_not(exists), [exists], name="EagerVariableNameReuse")
  File "/media/huaxin/tcl3/facepro/anaconda3/envs/python3.7.4/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_logging_ops.py", line 55, in _assert
    _ops.raise_from_not_ok_status(e, name)
  File "/media/huaxin/tcl3/facepro/anaconda3/envs/python3.7.4/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 6606, in raise_from_not_ok_status
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [0] [Op:Assert] name: EagerVariableNameReuse
The same situation happened yesterday; I uninstalled TensorFlow 2.1 and re-installed it, and the problem was solved, but today the same problem has appeared again. What is the reason, and how can I solve this?
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 72 (5 by maintainers)
In case somebody runs into this issue, I had a very similar error and it turned out that I had another background process running that was using TensorFlow (I was running a model-serving process while trying to run a model-training process). It looks like trying to run two Python programs that use TF at the same time triggers this error.
This problem has been solved for me. It happens when there are multiple GPUs in your server but some of them are already in use by others, so when you create variables or objects there is no GPU memory left for you. The solution is to choose a GPU that is not being used.
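A minimal sketch of that workaround, assuming the idle GPU happens to be index 1 (check nvidia-smi first; the index here is hypothetical):
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # hypothetical index of an idle GPU; must be set before importing tensorflow

import tensorflow as tf

# Only the selected GPU is visible now, so metric variables are created on a
# device whose memory is not already held by someone else's process.
print(tf.config.experimental.list_physical_devices("GPU"))
train_accuracy = tf.keras.metrics.CategoricalAccuracy("train_accuracy")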
This issue is still present, and no clear cause-and-solution relationship has been established. Please re-open this issue.
Regardless of other underlying circumstances (running on 0/1/many GPUs, etc.), this problem hasn't been solved as long as the error message is totally opaque and unhelpful to anyone not actually developing TensorFlow. Adding "different GPU processes might be in conflict" to the message would already be a lot better, and some more "forensic" information wouldn't come amiss either.
I’m a newb here. First time trying to get TF working with my Quadro M500M card. Not sure if it’s even relevant, but considering any bit of information could be useful information…
TF 2.3 + CUDA 10.1 + cudnn 8.0.3.33 (for CUDA 10.1) + Python 3.8 kept telling me it couldn't find cudnn64_7.dll. After looking around I found that cudnn for CUDA 10.1 ships cudnn64_8.dll. I'm new to this, so I don't know much, but it struck me as odd that it was asking for a file from a cudnn version not meant for CUDA 10.1. That aside, after pointing my path at cudnn 7.6.5.32 (for CUDA 10.0), which has the correct file, I got past that stumbling block, but then ran into this error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [0] [Op:Assert] name: EagerVariableNameReuse
from this line (from the beginner tutorial):
model = keras.Sequential([keras.layers.Flatten(input_shape=(28, 28)), keras.layers.Dense(128, activation='relu'), keras.layers.Dense(10)])
Honestly, killing other processes is not a solution in itself. In my opinion this issue must stay open, since the problem isn't solved yet.
I got this error message in a new 2.3 environment (both tensorflow and tensorflow-gpu). When I created a new environment with version 2.2 for both, the problem went away. So this bug shouldn’t be closed.
I got this error when I upgraded from tf 2.2 to 2.3. I downgraded back to 2.2 and I am no longer getting the error.
This issue should probably be re-opened, as it occurs with only one GPU and with all other services killed (excluding Windows service processes). The issue is resolved by adding something like
tf.config.experimental.set_visible_devices([], 'GPU')
to disable the GPU, but that means you can't train networks on your GPU.
Um, this problem certainly has not been solved. Many of us only have one GPU.
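A minimal sketch of that CPU-only fallback (the call has to run before anything touches the GPU; it is a workaround, not a fix):
import tensorflow as tf

# Hide all GPUs before TensorFlow initializes them; everything below runs on the CPU.
tf.config.experimental.set_visible_devices([], "GPU")

# The assertion no longer fires, but training happens on the CPU only.
train_loss = tf.keras.metrics.CategoricalCrossentropy("train_loss", dtype=tf.float32)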
I'm getting the same error. If I force it to use the CPU, the code executes fine; when using the GPU I get the error: "InvalidArgumentError: assertion failed: [0] [Op:Assert] name: EagerVariableNameReuse"
Running Win10 Conda setup with GF 930M (Cc 5)
All tests indicate that the GPU is available and tests okay.
Definitely not a ‘Closed’ issue
I just had a similar problem with model.compile() (without using any metric) while running multiple processes. It seems to be fixed with tf-2.3 and a memory-restricted GPU; a sketch of that kind of setup follows.
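A rough sketch of a memory-restricted setup, assuming the goal is to let more than one TensorFlow process share the same card (enabling memory growth is one way to restrict the default grab-everything behaviour):
import tensorflow as tf

# Ask TensorFlow to allocate GPU memory incrementally instead of claiming the
# whole card up front; must be called before the GPU is initialized.
for gpu in tf.config.experimental.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)

model = tf.keras.Sequential([tf.keras.layers.Dense(10)])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")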
I had a similar error and fixed it by closing the other Python sessions that were using TensorFlow. Use 'nvidia-smi' to find the running processes.
Thank you very much for this post!!! I was having this same issue with Tensorflow v2.3 on a Windows 10 pc with the latest python, conda and pycharm versions, just downgraded to v2.2 and it did the trick. I only needed to go to my Conda prompt, activate my Pycharm environment, uninstall, install and voila! Basically this:
(base) conda activate Pycharm
(Pycharm) conda uninstall tensorflow
…
(Pycharm) conda install tensorflow==2.2
I had the same issue today. Using
“fixed” the problem.
However, possibly unrelated, it now takes a long time (like a minute or more) to go from attaching GPU 0 to start training. Unclear what is taking so long.
I am having the same problem. I can downgrade to 2.2, but then I can't use tf.keras.preprocessing.text_dataset_from_directory like the TensorFlow tutorial says, and the "solution" to that is to use 2.3, which I can't because of the problem above. Any suggestions?
Edit: I had multiple TensorFlow versions installed, so I uninstalled both 2.2 and 2.3 and installed tf-nightly, which is 2.4. Now everything works fine. I don't know if it's because I had multiple versions installed or because they fixed it in the nightly. Edit 2: Not solved, since the 2.4 version wasn't using the GPU, only the CPU. It seems I need CUDA 11 for tf-gpu 2.4, but for some reason the CUDA update restarts my computer. I guess I'll downgrade to 2.2.
Didn’t work for me…
For me, the issue was caused by the Chrome GPU process. Apparently, something in Chrome is using the name “train_loss”. Once I killed the chrome gpu process, my program started to work.
By encapsulating all tensors into a variable scope (like ‘__chrome’ or something), Chrome would do the ML community a huge favor. I’ve just filed a bug with Chrome to fix that. https://bugs.chromium.org/p/chromium/issues/detail?id=1086032
same here, another process running in background triggered this error.
I was having the same problem, and this really worked for me: I uninstalled TensorFlow 2.3 and installed TensorFlow 2.2.
Like many comments above, I confirm the problem on TensorFlow 2.3. Downgrading to TensorFlow 2.2 works, but that is a workaround; it definitely does not solve the problem on 2.3. @Saduf2019, we should reopen the issue.
I don't think the Python version is the problem. I am running TF on Win10 with Python 3.7. I would like to point out that there isn't a single specific line of code that triggers the problem. Even a simple line like
model = tf.keras.Sequential()
causes the assertion to fail.
I have been stuck with the same problem for days, and no other processes are running on my GPU. Did someone find a solution?
Any news on this issue?
I’m training multiple models at the same time. I should be able to maximize the usage of my GPUs by training concurrently.
I am having the same issue with no other processes running. I started to get this error after updating my driver to 418.87. On another node with the same GPU series but an earlier driver (396.82), the code works just fine…
After analyzing my code, I discovered the problem, and it is funny. In my code I am using torch and tensorflow, and if I import torch first, tensorflow breaks:
import torch
import tensorflow as tf
train_loss = tf.keras.metrics.Mean(name="train_loss")
The above code gives InvalidArgumentError: assertion failed: [0] [Op:Assert] name: EagerVariableNameReuse.
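A minimal sketch of the import-order workaround this implies, i.e. importing tensorflow before torch; whether it actually avoids the assertion likely depends on the local CUDA setup, so treat it as an assumption:
import tensorflow as tf  # imported first so TensorFlow sets up its GPU context before torch
import torch             # imported second on purpose

train_loss = tf.keras.metrics.Mean(name="train_loss")
print(train_loss.name)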
Two or more processes running TensorFlow (2.2.0), one of them running code as simple as
var = tf.Variable([3, 3])
will trigger this error. Solution: kill the other processes.
@osushkov Yes, I got the same error. In JupyterLab I had two notebooks open; I finally solved the issue by closing all notebooks and re-opening just one.