tensorflow: InvalidArgumentError: assertion failed: [0] [Op:Assert] name: EagerVariableNameReuse
I am using TensorFlow 2.1 to train a custom model. Yesterday the code ran correctly, but today it raises an error. I debugged my code and found that this line is the one that fails:
self.train_accuracy = tf.keras.metrics.CategoricalAccuracy('train_accuracy')
but this line is actually correct. Even when I create the metric on its own in a Jupyter notebook, it fails. The error is:
InvalidArgumentError: assertion failed: [0] [Op:Assert] name: EagerVariableNameReuse
Can anyone tell me the reason and suggest a solution? Thanks.
The whole code is:
import os
import numpy as np
import cv2
import tensorflow as tf
class ModelTrain():
    def __init__(self):
        self.loss_object = tf.keras.losses.CategoricalCrossentropy()
        self.optimizer = tf.keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999)
        self.train_loss = tf.keras.metrics.CategoricalCrossentropy('train_loss', dtype=tf.float32)
        self.train_accuracy = tf.keras.metrics.CategoricalAccuracy('train_accuracy')
        self.validation_loss = tf.keras.metrics.CategoricalCrossentropy('validation_loss', dtype=tf.float32)
        self.validation_accuracy = tf.keras.metrics.CategoricalAccuracy('validation_accuracy')
        
if __name__ == "__main__":
    model_train = ModelTrain()
The error is:
Traceback (most recent call last):
  File "/media/huaxin/tcl3/facepro/hand-gesture-recognition/jester-data-preprocessing_v0.2/test.py", line 18, in <module>
    model_train = ModelTrain()
  File "/media/huaxin/tcl3/facepro/hand-gesture-recognition/jester-data-preprocessing_v0.2/test.py", line 12, in __init__
    self.train_loss = tf.keras.metrics.CategoricalCrossentropy('train_loss', dtype=tf.float32)
  File "/media/huaxin/tcl3/facepro/anaconda3/envs/python3.7.4/lib/python3.7/site-packages/tensorflow_core/python/keras/metrics.py", line 2818, in __init__
    label_smoothing=label_smoothing)
  File "/media/huaxin/tcl3/facepro/anaconda3/envs/python3.7.4/lib/python3.7/site-packages/tensorflow_core/python/keras/metrics.py", line 560, in __init__
    super(MeanMetricWrapper, self).__init__(name=name, dtype=dtype)
  File "/media/huaxin/tcl3/facepro/anaconda3/envs/python3.7.4/lib/python3.7/site-packages/tensorflow_core/python/keras/metrics.py", line 460, in __init__
    reduction=metrics_utils.Reduction.WEIGHTED_MEAN, name=name, dtype=dtype)
  File "/media/huaxin/tcl3/facepro/anaconda3/envs/python3.7.4/lib/python3.7/site-packages/tensorflow_core/python/keras/metrics.py", line 296, in __init__
    'total', initializer=init_ops.zeros_initializer)
  File "/media/huaxin/tcl3/facepro/anaconda3/envs/python3.7.4/lib/python3.7/site-packages/tensorflow_core/python/keras/metrics.py", line 276, in add_weight
    aggregation=aggregation)
  File "/media/huaxin/tcl3/facepro/anaconda3/envs/python3.7.4/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 446, in add_weight
    caching_device=caching_device)
  File "/media/huaxin/tcl3/facepro/anaconda3/envs/python3.7.4/lib/python3.7/site-packages/tensorflow_core/python/training/tracking/base.py", line 744, in _add_variable_with_custom_getter
    **kwargs_for_getter)
  File "/media/huaxin/tcl3/facepro/anaconda3/envs/python3.7.4/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer_utils.py", line 142, in make_variable
    shape=variable_shape if variable_shape else None)
  File "/media/huaxin/tcl3/facepro/anaconda3/envs/python3.7.4/lib/python3.7/site-packages/tensorflow_core/python/ops/variables.py", line 258, in __call__
    return cls._variable_v1_call(*args, **kwargs)
  File "/media/huaxin/tcl3/facepro/anaconda3/envs/python3.7.4/lib/python3.7/site-packages/tensorflow_core/python/ops/variables.py", line 219, in _variable_v1_call
    shape=shape)
  File "/media/huaxin/tcl3/facepro/anaconda3/envs/python3.7.4/lib/python3.7/site-packages/tensorflow_core/python/ops/variables.py", line 197, in <lambda>
    previous_getter = lambda **kwargs: default_variable_creator(None, **kwargs)
  File "/media/huaxin/tcl3/facepro/anaconda3/envs/python3.7.4/lib/python3.7/site-packages/tensorflow_core/python/ops/variable_scope.py", line 2596, in default_variable_creator
    shape=shape)
  File "/media/huaxin/tcl3/facepro/anaconda3/envs/python3.7.4/lib/python3.7/site-packages/tensorflow_core/python/ops/variables.py", line 262, in __call__
    return super(VariableMetaclass, cls).__call__(*args, **kwargs)
  File "/media/huaxin/tcl3/facepro/anaconda3/envs/python3.7.4/lib/python3.7/site-packages/tensorflow_core/python/ops/resource_variable_ops.py", line 1411, in __init__
    distribute_strategy=distribute_strategy)
  File "/media/huaxin/tcl3/facepro/anaconda3/envs/python3.7.4/lib/python3.7/site-packages/tensorflow_core/python/ops/resource_variable_ops.py", line 1557, in _init_from_args
    graph_mode=self._in_graph_mode)
  File "/media/huaxin/tcl3/facepro/anaconda3/envs/python3.7.4/lib/python3.7/site-packages/tensorflow_core/python/ops/resource_variable_ops.py", line 232, in eager_safe_variable_handle
    shape, dtype, shared_name, name, graph_mode, initial_value)
  File "/media/huaxin/tcl3/facepro/anaconda3/envs/python3.7.4/lib/python3.7/site-packages/tensorflow_core/python/ops/resource_variable_ops.py", line 164, in _variable_handle_from_shape_and_dtype
    math_ops.logical_not(exists), [exists], name="EagerVariableNameReuse")
  File "/media/huaxin/tcl3/facepro/anaconda3/envs/python3.7.4/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_logging_ops.py", line 55, in _assert
    _ops.raise_from_not_ok_status(e, name)
  File "/media/huaxin/tcl3/facepro/anaconda3/envs/python3.7.4/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 6606, in raise_from_not_ok_status
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [0] [Op:Assert] name: EagerVariableNameReuse
The same situation happened yesterday; I uninstalled TensorFlow 2.1 and re-installed it, and the problem was solved, but today the same problem has appeared again. What is the reason, and how can I solve this?
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 72 (5 by maintainers)
In case somebody runs into this issue, I had a very similar error and it turned out that I had another background process running that was using TensorFlow (I was running a model-serving process while trying to run a model-training process). It looks like trying to run two Python programs that use TF at the same time triggers this error.
This problem has been solved for me. It happens when there are multiple GPUs in your server but some of them are already in use by others, so when you create variables or objects there is no GPU memory left for you. The solution is to choose a GPU that is not being used.
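A minimal sketch of that workaround, assuming the idle GPU happens to be index 1 (check nvidia-smi first; the index here is hypothetical):
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # hypothetical index of an idle GPU; must be set before importing tensorflow

import tensorflow as tf

# Only the selected GPU is visible now, so metric variables are created on a
# device whose memory is not already held by someone else's process.
print(tf.config.experimental.list_physical_devices("GPU"))
train_accuracy = tf.keras.metrics.CategoricalAccuracy("train_accuracy")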
This issue is still present, and no clear cause-and-solution relationship has been established. Please re-open this issue.
Regardless of other underlying circumstances (running on 0/1/many GPUs, etc.), this problem hasn't been solved as long as the error message is totally opaque and unhelpful to anyone not actually developing TensorFlow. Adding "different GPU processes might be in conflict" to the message would already be a lot better, and some more "forensic" information wouldn't come amiss either.
I’m a newb here. First time trying to get TF working with my Quadro M500M card. Not sure if it’s even relevant, but considering any bit of information could be useful information…
TF 2.3 + CUDA 10.1 + cudnn 8.0.3.33 (for CUDA 10.1) + Python 3.8 kept telling me it couldn't find cudnn64_7.dll. After looking around I found that cudnn for CUDA 10.1 ships cudnn64_8.dll. I'm new to this, so I don't know much, but it struck me as odd that it was asking for a file from a cudnn version not meant for CUDA 10.1. That aside, after pointing my path at cudnn 7.6.5.32 (for CUDA 10.0), which has the correct file, I got past that stumbling block, but then ran into this error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [0] [Op:Assert] name: EagerVariableNameReuse
from this line (from the beginner tutorial):
model = keras.Sequential([keras.layers.Flatten(input_shape=(28, 28)), keras.layers.Dense(128, activation='relu'), keras.layers.Dense(10)])
Honestly, killing other processes is not a solution in itself. In my opinion this issue must stay open, since the problem isn't solved yet.
I got this error message in a new 2.3 environment (both tensorflow and tensorflow-gpu). When I created a new environment with version 2.2 for both, the problem went away. So this bug shouldn’t be closed.
I got this error when I upgraded from tf 2.2 to 2.3. I downgraded back to 2.2 and I am no longer getting the error.
This issue should probably be re-opened, as it occurs with only one GPU and with all other services killed (excluding Windows service processes). The issue is resolved by adding something like
tf.config.experimental.set_visible_devices([], 'GPU')
to disable the GPU, but that means you can't train networks on your GPU.
Um, this problem certainly has not been solved. Many of us only have one GPU.
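A minimal sketch of that CPU-only fallback (the call has to run before anything touches the GPU; it is a workaround, not a fix):
import tensorflow as tf

# Hide all GPUs before TensorFlow initializes them; everything below runs on the CPU.
tf.config.experimental.set_visible_devices([], "GPU")

# The assertion no longer fires, but training happens on the CPU only.
train_loss = tf.keras.metrics.CategoricalCrossentropy("train_loss", dtype=tf.float32)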
I'm getting the same error. If I force it to use the CPU, the code executes fine; when using the GPU I get the error: "InvalidArgumentError: assertion failed: [0] [Op:Assert] name: EagerVariableNameReuse"
Running Win10 Conda setup with GF 930M (Cc 5)
All tests indicate that the GPU is available and tests okay.
Definitely not a ‘Closed’ issue
I just had a similar problem with model.compile() (without using any metric) while running multiple processes. It seems to be fixed with tf-2.3 and a memory-restricted GPU; a sketch of that kind of setup follows.
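A rough sketch of a memory-restricted setup, assuming the goal is to let more than one TensorFlow process share the same card (enabling memory growth is one way to restrict the default grab-everything behaviour):
import tensorflow as tf

# Ask TensorFlow to allocate GPU memory incrementally instead of claiming the
# whole card up front; must be called before the GPU is initialized.
for gpu in tf.config.experimental.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)

model = tf.keras.Sequential([tf.keras.layers.Dense(10)])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")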
I had a similar error and fixed it by closing the other Python sessions that were using TensorFlow. Use 'nvidia-smi' to find the running processes.
Thank you very much for this post!!! I was having this same issue with Tensorflow v2.3 on a Windows 10 pc with the latest python, conda and pycharm versions, just downgraded to v2.2 and it did the trick. I only needed to go to my Conda prompt, activate my Pycharm environment, uninstall, install and voila! Basically this:
(base) conda activate Pycharm
(Pycharm) conda uninstall tensorflow
…
(Pycharm) conda install tensorflow==2.2
I had the same issue today. Using
“fixed” the problem.
However, possibly unrelated, it now takes a long time (like a minute or more) to go from attaching GPU 0 to start training. Unclear what is taking so long.
I am having the same problem. I can downgrade to 2.2, but then I can't use tf.keras.preprocessing.text_dataset_from_directory like the TensorFlow tutorial says, and the "solution" to that is to use 2.3, which I can't because of the problem above. Any suggestions?
Edit: I had multiple TensorFlow versions installed, so I uninstalled both 2.2 and 2.3 and installed tf-nightly, which is 2.4. Now everything works fine. I don't know if it's because I had multiple versions installed or because they fixed it in the nightly. Edit 2: Not solved, since the 2.4 version wasn't using the GPU, only the CPU. It seems I need CUDA 11 for tf-gpu 2.4, but for some reason the CUDA update restarts my computer. I guess I'll downgrade to 2.2.
Didn’t work for me…
For me, the issue was caused by the Chrome GPU process. Apparently, something in Chrome is using the name “train_loss”. Once I killed the chrome gpu process, my program started to work.
By encapsulating all tensors into a variable scope (like ‘__chrome’ or something), Chrome would do the ML community a huge favor. I’ve just filed a bug with Chrome to fix that. https://bugs.chromium.org/p/chromium/issues/detail?id=1086032
same here, another process running in background triggered this error.
I was having the same problem, and this really worked for me: I uninstalled TensorFlow 2.3 and installed TensorFlow 2.2.
Like many comments above, I confirm the problem on TensorFlow 2.3. Downgrading to TensorFlow 2.2 works, but that is a workaround; it definitely does not solve the problem on 2.3. @Saduf2019, we should reopen the issue.
I don't think the Python version is the problem. I am running TF on Win10 with Python 3.7. I would like to point out that there isn't a single specific line of code that triggers the problem. Even a simple line like
model = tf.keras.Sequential()
causes the assertion to fail.
I have been stuck with the same problem for days, and no other processes are running on my GPU. Did someone find a solution?
Any news on this issue?
I’m training multiple models at the same time. I should be able to maximize the usage of my GPUs by training concurrently.
I am having the same issue with no other processes running. I started to get this error after updating my driver to 418.87. On another node with the same GPU series but an earlier driver (396.82), the code works just fine…
After analyzing my code, I discovered the problem, and it is funny. In my code I am using torch and tensorflow, and if I import torch first, tensorflow breaks:
import torch
import tensorflow as tf
train_loss = tf.keras.metrics.Mean(name="train_loss")
The above code gives InvalidArgumentError: assertion failed: [0] [Op:Assert] name: EagerVariableNameReuse.
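A minimal sketch of the import-order workaround this implies, i.e. importing tensorflow before torch; whether it actually avoids the assertion likely depends on the local CUDA setup, so treat it as an assumption:
import tensorflow as tf  # imported first so TensorFlow sets up its GPU context before torch
import torch             # imported second on purpose

train_loss = tf.keras.metrics.Mean(name="train_loss")
print(train_loss.name)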
Two or more processes running TensorFlow (2.2.0), one of them running code as simple as
var = tf.Variable([3, 3])
will trigger this error. Solution: kill the other processes.
@osushkov Yes, I got the same error. In JupyterLab I had two notebooks open; I finally solved the issue by closing all notebooks and re-opening just one.