tensorflow: Gradients do not exist for variables after tf.concat().
TensorFlow is unable to compute gradients after merging two variables with tf.concat(). The issue is demonstrated in the following Colab: https://colab.research.google.com/drive/1dkCcL5jfBmo47EsvmhNumjIkCGIdeFd5
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): I have based my issue on code adapted from the official tutorials.
- OS Platform and Distribution (e.g.,Linux Ubuntu 16.04): Google Colab
- TensorFlow installed from (source or binary): Binary
- TensorFlow version (use command below): v2.1.0-0-ge5bf8de410 2.1.0. I have also tested the code on TF 2.2 rc1.
- Python version: 3.6.9
Describe the current behavior
TensorFlow is unable to compute gradients after merging two variables with tf.concat(). The variables are not modified during training despite being marked as trainable.
Describe the expected behavior
TensorFlow should be able to compute gradients for concatenated variables as well.
Standalone code to reproduce the issue
The full test is available in the Colab notebook linked above.
The most important part is the following:
import tensorflow as tf
from tensorflow.keras import layers

class CustomEmbedding(tf.keras.layers.Layer):
    def __init__(self, input_dim, output_dim, mask_zero=False, **kwargs):
        super(CustomEmbedding, self).__init__(**kwargs)
        self.input_dim = input_dim
        self.output_dim = output_dim
        self.mask_zero = mask_zero
        self.embeddings = None

    def build(self, input_shape):
        e1 = self.add_weight(
            shape=(int(self.input_dim / 2), self.output_dim),
            dtype="float32", trainable=True,
            name="e1")
        e2 = self.add_weight(
            shape=(self.input_dim - int(self.input_dim / 2), self.output_dim),
            dtype="float32", trainable=True,
            name="e2")
        # The concat runs once here at build time and produces a plain tensor.
        self.embeddings = tf.concat((e1, e2), 0)
        tf.print(self.embeddings)

    def call(self, inputs):
        return tf.nn.embedding_lookup(self.embeddings, inputs)

    def compute_mask(self, inputs, mask=None):
        if not self.mask_zero:
            return None
        return tf.not_equal(inputs, 0)

model = tf.keras.Sequential()
model.add(CustomEmbedding(3, 32))
model.add(layers.LSTM(32))
model.add(layers.Dense(16, "relu"))
model.add(layers.Dense(2, "softmax"))
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
model.fit(data)  # data is prepared in the linked Colab
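For context (not part of the original report), a minimal sketch of the usual fix: keep creating e1 and e2 in build(), but perform the tf.concat inside call(). The concat then runs on every forward pass and is recorded by the gradient tape Keras uses during fit(), so gradients reach both weights. The class name CustomEmbeddingFixed is made up for illustration.

```python
import tensorflow as tf

class CustomEmbeddingFixed(tf.keras.layers.Layer):
    """Same layer, but the concat happens per call instead of once in build()."""

    def __init__(self, input_dim, output_dim, mask_zero=False, **kwargs):
        super().__init__(**kwargs)
        self.input_dim = input_dim
        self.output_dim = output_dim
        self.mask_zero = mask_zero

    def build(self, input_shape):
        self.e1 = self.add_weight(
            shape=(self.input_dim // 2, self.output_dim),
            dtype="float32", trainable=True, name="e1")
        self.e2 = self.add_weight(
            shape=(self.input_dim - self.input_dim // 2, self.output_dim),
            dtype="float32", trainable=True, name="e2")

    def call(self, inputs):
        # Concatenating here makes the op part of the traced forward pass,
        # so gradients flow back to e1 and e2 during training.
        embeddings = tf.concat((self.e1, self.e2), 0)
        return tf.nn.embedding_lookup(embeddings, inputs)

    def compute_mask(self, inputs, mask=None):
        if not self.mask_zero:
            return None
        return tf.not_equal(inputs, 0)
```

With a layer like this, the "Gradients do not exist" warning should no longer appear and e1/e2 should change after fit().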
Other info / logs
[[-0.042568922 0.302248985 0.401204079 ... -0.204555377 0.235091716 0.257138401]
[-0.372319102 -0.415126026 0.340110391 ... -0.386968911 -0.410127133 -0.135176718]
[0.341201216 0.208624214 0.357687324 ... 0.0621320605 0.0829377472 0.119318634]
[0.380090982 0.0431897044 -0.2187078 ... -0.246274695 0.0664974749 0.223051161]]
Train for 400 steps
WARNING:tensorflow:Gradients do not exist for variables ['sequential_4/custom_embedding_4/e1:0', 'sequential_4/custom_embedding_4/e2:0'] when minimizing the loss.
WARNING:tensorflow:Gradients do not exist for variables ['sequential_4/custom_embedding_4/e1:0', 'sequential_4/custom_embedding_4/e2:0'] when minimizing the loss.
400/400 [==============================] - 4s 11ms/step - loss: 0.3301 - accuracy: 1.0000
[[-0.042568922 0.302248985 0.401204079 ... -0.204555377 0.235091716 0.257138401]
[-0.372319102 -0.415126026 0.340110391 ... -0.386968911 -0.410127133 -0.135176718]
[0.341201216 0.208624214 0.357687324 ... 0.0621320605 0.0829377472 0.119318634]
[0.380090982 0.0431897044 -0.2187078 ... -0.246274695 0.0664974749 0.223051161]]
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Reactions: 2
- Comments: 22 (5 by maintainers)
tf.concat on scalars is not supported, please use tf.stack for scalars instead. The fact that calling tf.concat on scalars in eager mode does not raise an error is a bug. For example, it should raise:
ValueError: Can't concatenate scalars (use tf.stack instead)
I agree we should raise a better error message here instead of silently dropping gradients.
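As an illustration of the scalar case (my own sketch, not the maintainer's original snippet): tf.stack builds a rank-1 tensor from scalar variables and keeps the gradient path intact.

```python
import tensorflow as tf

a = tf.Variable(1.0)  # scalar variables
b = tf.Variable(3.0)

with tf.GradientTape() as tape:
    # tf.stack turns the scalars into a shape-(2,) tensor and is differentiable.
    v = tf.stack([a, b])
    loss = tf.reduce_sum(v ** 2)

# d(a^2 + b^2)/da = 2a, d(a^2 + b^2)/db = 2b -> 2.0 and 6.0
print(tape.gradient(loss, [a, b]))
```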
I reproduced @acxz 's result. This is insightful.
So tf.concat can only be performed inside the tf.GradientTape context in order to maintain gradients? Is this a bug or by design? Can someone please explain why that is the case? How is tf.concat different from other TensorFlow functions in this regard?
Thanks!
@Lescurel
Having with tf.GradientTape() as tape: before the tf.concat call works for me.
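A minimal sketch of that observation (my reconstruction, not the original snippet from the thread): whether gradients exist depends on whether the concat happens while the tape is recording.

```python
import tensorflow as tf

w1 = tf.Variable([[1.0], [1.0]])
w2 = tf.Variable([[3.0], [3.0]])

# Concat performed outside the tape: the result is an ordinary tensor and
# the tape never sees the concat op, so the gradients are None.
w_outside = tf.concat([w1, w2], axis=0)
with tf.GradientTape() as tape:
    loss = tf.reduce_sum(w_outside ** 2)
print(tape.gradient(loss, [w1, w2]))  # [None, None]

# Concat performed inside the tape: the op is recorded and gradients flow.
with tf.GradientTape() as tape:
    w_inside = tf.concat([w1, w2], axis=0)
    loss = tf.reduce_sum(w_inside ** 2)
print(tape.gradient(loss, [w1, w2]))  # gradients exist for both variables
```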
Please look at the code. I use a sequential model. The variable is indeed connected to the output. Exactly the same code, but without tf.concat(), issues no warning and works all right.
Hi!
Same issue here… I am trying to rewrite MAVNet (https://github.com/sudakshin/imitation_learning/blob/master/2.train_model/MavNet.py) in plain TensorFlow (instead of TFLearn) and train it. Note that MAVNet contains numerous tf.concat() calls. For some reason, gradient flow seems to break whenever it passes through tf.concat(), and the network is not trained at all.
On the other hand, when training another neural network that is similar to MAVNet but does not contain such tf.concat() calls, the whole network is trained properly.
This issue is reproducible not only in TF 2.3 but also in TF 2.8.
I believe this is a very serious issue since, as long as it persists, all neural network models that rely on tf.concat, tf.keras.layers.concatenate, or tf.keras.layers.Concatenate will not be trained properly at all…
Hi, has anyone worked out this problem? How can we get the gradients if we have lots of tf.concat calls in our function?
I am using a number of tf.concat calls in my functions, and when computing the gradients I get an InvalidArgumentError every time. It doesn't point out which line/concat is wrong…
InvalidArgumentError: Determined shape must either match input shape along split_dim exactly if fully specified, or be less than the size of the input along split_dim if not fully specified. Got: 2 [Op:SplitV] name: split
I've tried to replace all tf.concat with tf.stack/tf.reshape, but then another problem turns up:
InvalidArgumentError: Input to reshape is a tensor with 16 values, but the requested shape has 2 [Op:Reshape]
Hi @jm-willy, thanks for putting together the workarounds! To make sure I understand this issue correctly, the problem is that after tf.concat, w1 and w2 get no gradients, right?
If I print out the gradients for w1, w2, and w using TF 2.5, it shows that w1 and w2 still get None as their gradients, whereas in acxz's workaround w1 and w2 do have gradients.
My question is: are there any other workarounds that work without explicitly calling a gradient tape? I'm using the Keras model API with fit/evaluate/etc. to run the model, hence I do not have direct access to gradient tapes…
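For what it's worth, a small functional-model sketch (my own, with made-up layer sizes, not from the thread) suggesting that a concatenation performed during the forward pass, e.g. via tf.keras.layers.Concatenate, trains normally under fit() without any explicit tape; the problem in this issue is specifically a concat executed once at build time.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(8,))
a = layers.Dense(4, activation="relu")(inputs)
b = layers.Dense(4, activation="relu")(inputs)
# Concatenate runs inside the forward pass, so Keras's own GradientTape
# records it on every training step and both Dense branches get gradients.
merged = layers.Concatenate()([a, b])
outputs = layers.Dense(1)(merged)
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse")

x = np.random.rand(32, 8).astype("float32")
y = np.random.rand(32, 1).astype("float32")
before = [w.numpy().copy() for w in model.trainable_weights]
model.fit(x, y, epochs=1, verbose=0)
after = [w.numpy() for w in model.trainable_weights]
# Weights in both branches change, i.e. gradients flowed through the concat.
print(any(not np.allclose(b_, a_) for b_, a_ in zip(before, after)))
```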
@jvishnuvardhan do you think you can take a quick look at this issue? Seems to be a recurring problem with many people. Just getting some clarification would be nice as well.
Thx.
@saxenasaurabh The code above returns None for tensors w1 = tf.Variable([[1.0], [1.0]]) and w2 = tf.Variable([[3.0], [3.0]]). Same with scalars w1 = tf.Variable([[1.0]]) and w2 = tf.Variable([[3.0]]).
I absolutely needed tf.concat to experiment around, so I found an easy workaround.
Replacing tf.Variable with tf.constant returns None. It seems like tf.concat returns a non-trainable object and thus gradients can't flow. Since tf.concat is a commonly used function, I think the docs should be modified to include a temporary solution until the underlying bug is fixed.
How is the Keras LSTM implementation able to work? Because it uses tf.keras.concatenate or tf.concat on each iteration.
@acxz @Lescurel
Please advise whether this issue has been resolved. I am using 2.4.0 and running into the issue where tape.gradient() returns None with tf.concat().
I tried to work around it by creating a tf.Variable that holds the same result as the tf.concat, but tape.gradient() still returns None.
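For what it's worth, a short sketch (mine, not from the thread) of why that particular workaround cannot recover the gradients: constructing a new tf.Variable from the concatenated result copies the values, so the new variable has no recorded connection to w1 and w2, whereas doing the concat inside the tape keeps the path intact.

```python
import tensorflow as tf

w1 = tf.Variable([[1.0], [1.0]])
w2 = tf.Variable([[3.0], [3.0]])

# Creating a new Variable from the concat copies the values; the new
# Variable has no recorded dependence on w1 or w2.
w = tf.Variable(tf.concat([w1, w2], axis=0))

with tf.GradientTape() as tape:
    loss = tf.reduce_sum(w ** 2)

print(tape.gradient(loss, w))          # a tensor: gradient exists for w itself
print(tape.gradient(loss, [w1, w2]))   # [None, None]: no path to the originals

# Doing the concat inside the tape (or inside a layer's call()) keeps
# the connection to w1 and w2.
with tf.GradientTape() as tape:
    loss = tf.reduce_sum(tf.concat([w1, w2], axis=0) ** 2)
print(tape.gradient(loss, [w1, w2]))   # gradients for both variables
```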