tensorflow: Gradients do not exist for variables after tf.concat().

TensorFlow is unable to compute gradients after merging two variables with tf.concat(). The issue is demonstrated in the following Colab: https://colab.research.google.com/drive/1dkCcL5jfBmo47EsvmhNumjIkCGIdeFd5

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes, I have written custom code based on the official tutorials.
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Google Colab
  • TensorFlow installed from (source or binary): Binary
  • TensorFlow version (use command below): v2.1.0-0-ge5bf8de410 2.1.0. I have also tested the code on TF 2.2 rc1.
  • Python version: 3.6.9

Describe the current behavior

TensorFlow is unable to compute gradients after merging two variables with tf.concat(). The variables are not updated during training even though they are set as trainable.

Describe the expected behavior

TensorFlow should also be able to compute gradients for concatenated variables.

Standalone code to reproduce the issue

The full test is available in the Colab notebook linked above.

The most important part is the following:

import tensorflow as tf
from tensorflow.keras import layers


class CustomEmbedding(tf.keras.layers.Layer):
  
  def __init__(self, input_dim, output_dim, mask_zero=False, **kwargs):
    super(CustomEmbedding, self).__init__(**kwargs)
    self.input_dim = input_dim
    self.output_dim = output_dim
    self.mask_zero = mask_zero
    self.embeddings = None
   
  def build(self, input_shape):
    e1 = self.add_weight(
      shape=(int(self.input_dim/2), self.output_dim),
      dtype="float32", trainable=True,
      name="e1")

    e2 = self.add_weight(
      shape=(self.input_dim-int(self.input_dim/2), self.output_dim),
      dtype="float32", trainable=True,
      name="e2")
    
    self.embeddings = tf.concat((e1, e2), 0)
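    # note: this concat is executed once, at build time, i.e. outside any training GradientTape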

    tf.print(self.embeddings)
    
  def call(self, inputs):
    return tf.nn.embedding_lookup(self.embeddings, inputs)
  
  def compute_mask(self, inputs, mask=None):
    if not self.mask_zero:
      return None
    return tf.not_equal(inputs, 0)

model = tf.keras.Sequential()
model.add(CustomEmbedding(3, 32))
model.add(layers.LSTM(32))
model.add(layers.Dense(16, "relu"))
model.add(layers.Dense(2, "softmax"))

model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

model.fit(data)

Other info / logs

[[-0.042568922 0.302248985 0.401204079 ... -0.204555377 0.235091716 0.257138401]
 [-0.372319102 -0.415126026 0.340110391 ... -0.386968911 -0.410127133 -0.135176718]
 [0.341201216 0.208624214 0.357687324 ... 0.0621320605 0.0829377472 0.119318634]
 [0.380090982 0.0431897044 -0.2187078 ... -0.246274695 0.0664974749 0.223051161]]
Train for 400 steps
WARNING:tensorflow:Gradients do not exist for variables ['sequential_4/custom_embedding_4/e1:0', 'sequential_4/custom_embedding_4/e2:0'] when minimizing the loss.
WARNING:tensorflow:Gradients do not exist for variables ['sequential_4/custom_embedding_4/e1:0', 'sequential_4/custom_embedding_4/e2:0'] when minimizing the loss.
400/400 [==============================] - 4s 11ms/step - loss: 0.3301 - accuracy: 1.0000
[[-0.042568922 0.302248985 0.401204079 ... -0.204555377 0.235091716 0.257138401]
 [-0.372319102 -0.415126026 0.340110391 ... -0.386968911 -0.410127133 -0.135176718]
 [0.341201216 0.208624214 0.357687324 ... 0.0621320605 0.0829377472 0.119318634]
 [0.380090982 0.0431897044 -0.2187078 ... -0.246274695 0.0664974749 0.223051161]]

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Reactions: 2
  • Comments: 22 (5 by maintainers)

Most upvoted comments

tf.concat on scalars is not supported; please use tf.stack for scalars instead. The fact that calling tf.concat on scalars in eager mode does not raise an error is a bug. For example:

import tensorflow as tf

def f(a, b):
  return tf.concat([a, b], axis=0)

f(1., 2.)  # Works, but it should not
tf.function(f)(1., 2.)  # Does not work

Raises:

ValueError: Can't concatenate scalars (use tf.stack instead)

I agree we should raise a better error message here instead of silently dropping gradients.
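
For comparison, the tf.stack path suggested above handles scalars in both eager and tf.function mode (a minimal sketch; g is just an illustrative name):

def g(a, b):
  return tf.stack([a, b], axis=0)

g(1., 2.)               # returns a (2,) tensor
tf.function(g)(1., 2.)  # works as well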

I reproduced @acxz's result. This is insightful.

So tf.concat can only be performed in the tf.GradientTape context in order to maintain gradients? Is this a bug or by design? Can someone please explain why that is the case? How is tf.concat different than other TensorFlow functions in this regard?

Thanks!

@Lescurel

Putting with tf.GradientTape() as tape: before the tf.concat call works for me:

import tensorflow as tf

w1 = tf.Variable([[1.0]])
w2 = tf.Variable([[3.0]])
with tf.GradientTape() as tape:
    w = tf.concat([w1, w2], 0)
    x = tf.random.normal((1, 2))
    y = tf.reduce_sum(x, 1)
    r = tf.matmul(w, x)
    loss = tf.metrics.mse(y, r)
print(tape.gradient(loss, w))

Output:

2021-01-27 17:47:45.371562: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-01-27 17:47:45.371699: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-01-27 17:47:45.372290: I tensorflow/core/common_runtime/process_util.cc:146] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
tf.Tensor(
[[0.12809184]
 [1.3158427 ]], shape=(2, 1), dtype=float32)

Please look at the code. I use a sequential model. The variable is indeed connected to the output. Exactly the same code, but without tf.concat(), issues no warning and works all right.
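
A tape-free sketch, not from the thread, that keeps the Sequential/fit workflow: keep e1 and e2 as separate weights and perform the tf.concat inside call(), so the op is re-executed on every forward pass, i.e. while the tape that model.fit builds is recording (ConcatEmbedding is an illustrative name):

import tensorflow as tf


class ConcatEmbedding(tf.keras.layers.Layer):

  def __init__(self, input_dim, output_dim, **kwargs):
    super().__init__(**kwargs)
    self.input_dim = input_dim
    self.output_dim = output_dim

  def build(self, input_shape):
    half = self.input_dim // 2
    self.e1 = self.add_weight(
      shape=(half, self.output_dim),
      dtype="float32", trainable=True, name="e1")
    self.e2 = self.add_weight(
      shape=(self.input_dim - half, self.output_dim),
      dtype="float32", trainable=True, name="e2")

  def call(self, inputs):
    # Concatenating here (instead of in build) means the op runs under whatever
    # tape is active during training, so gradients can reach e1 and e2.
    embeddings = tf.concat([self.e1, self.e2], 0)
    return tf.nn.embedding_lookup(embeddings, inputs)

With this variant the "Gradients do not exist" warning should disappear, because e1 and e2 are on the recorded gradient path at every step.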

Hi!

Same issue here… I am trying to rewrite MAVNet (https://github.com/sudakshin/imitation_learning/blob/master/2.train_model/MavNet.py) in plain TensorFlow (instead of TFLearn) and train it. Note that MAVNet contains numerous tf.concat() calls. For some reason, gradient flow seems to break whenever it passes through tf.concat(), and the network is not trained at all.

On the other hand, when I train another neural network that is similar to MAVNet but does not contain such tf.concat() calls, the whole network trains properly.

This issue reproduces not only in TF 2.3 but also in TF 2.8.

I believe this is a very serious issue since, as long as it persists, all neural network models that rely on tf.concat, tf.keras.layers.concatenate, or tf.keras.layers.Concatenate will not be trained properly at all…

Hi, has anyone worked out this problem? How do we get the gradients if we have lots of tf.concat calls in our function?

I’m using a number of tf.concat calls in my functions, and when computing the gradients I get an InvalidArgumentError every time. It doesn’t point out which line/concat is wrong…

InvalidArgumentError: Determined shape must either match input shape along split_dim exactly if fully specified, or be less than the size of the input along split_dim if not fully specified. Got: 2 [Op:SplitV] name: split

I’ve tried replacing all tf.concat calls with tf.stack/tf.reshape, but then another problem turns up:

InvalidArgumentError: Input to reshape is a tensor with 16 values, but the requested shape has 2 [Op:Reshape]

Hi @jm-willy, thanks for putting together the workarounds! To make sure I understand this issue correctly, the problem is that after tf.concat, w1 and w2 get no gradients, right?

In

# jm-willy's workaround
w = tf.Variable(tf.concat([w1, w2], 0))  # This. Replacing tf.Variable with tf.constant returns None
y = tf.reduce_sum(x, 1)
with tf.GradientTape() as tape:
    r = tf.matmul(w, x)
    loss = tf.metrics.mse(y, r)
print(w)
print(tape.gradient(loss, [w1,w2,w]))
print('*' * 100)

If I print out the gradients for w1, w2, and w using TF2.5, it shows w1 and w2 still get None as their gradients.

[None, None, <tf.Tensor: shape=(2, 1), dtype=float32, numpy=
array([[0.6949118],
       [3.9068654]], dtype=float32)>]

whereas in acxz’s workaround w1 and w2 do have gradients:

[<tf.Tensor: shape=(1, 1), dtype=float32, numpy=array([[0.6949118]], dtype=float32)>, <tf.Tensor: shape=(1, 1), dtype=float32, numpy=array([[3.9068654]], dtype=float32)>, <tf.Tensor: shape=(2, 1), dtype=float32, numpy=
array([[0.6949118],
       [3.9068654]], dtype=float32)>]

My question is, are there any other workarounds that work without explicitly calling the gradient tape? I’m using the Keras model API with fit/evaluate/etc. to run the model, hence I don’t have direct access to gradient tapes…

@jvishnuvardhan do you think you can take a quick look at this issue? It seems to be a recurring problem for many people. Just getting some clarification would be nice as well.

Thx.

@saxenasaurabh The code above returns None for tensors w1 = tf.Variable([[1.0], [1.0]]), w2 = tf.Variable([[3.0], [3.0]]). Same with scalars w1 = tf.Variable([[1.0]]), w2 = tf.Variable([[3.0]]).

I absolutely needed tf.concat for my experiments, so I found this easy workaround:

import tensorflow as tf

x = tf.random.normal((1, 2))
w1 = tf.Variable([[1.0]])
w2 = tf.Variable([[3.0]])

# jm-willy's workaround
w = tf.Variable(tf.concat([w1, w2], 0))  # This. Replacing tf.Variable with tf.constant returns None
y = tf.reduce_sum(x, 1)
with tf.GradientTape() as tape:
    r = tf.matmul(w, x)
    loss = tf.metrics.mse(y, r)
print(w)
print(tape.gradient(loss, w))
print('*' * 100)

# acxz's workaround
with tf.GradientTape() as tape:
    w = tf.concat([w1, w2], 0)
    y = tf.reduce_sum(x, 1)
    r = tf.matmul(w, x)
    loss = tf.metrics.mse(y, r)
print(w)
print(tape.gradient(loss, w))
print('*' * 100)

# Lescurel's Reproducible Example which returns None
w = tf.concat([w1, w2], 0)
y = tf.reduce_sum(x, 1)
with tf.GradientTape() as tape:
    r = tf.matmul(w, x)
    loss = tf.metrics.mse(y, r)
print(w)
print(tape.gradient(loss, w))
print('*' * 100)

output:

<tf.Variable 'Variable:0' shape=(2, 1) dtype=float32, numpy=
array([[1.],
       [3.]], dtype=float32)>
tf.Tensor(
[[-2.203636 ]
 [ 2.7455258]], shape=(2, 1), dtype=float32)
****************************************************************************************************
tf.Tensor(
[[1.]
 [3.]], shape=(2, 1), dtype=float32)
tf.Tensor(
[[-2.203636 ]
 [ 2.7455258]], shape=(2, 1), dtype=float32)
****************************************************************************************************
tf.Tensor(
[[1.]
 [3.]], shape=(2, 1), dtype=float32)
None
****************************************************************************************************

Replacing tf.Variable with tf.constant returns None. It seems tf.concat returns a non-trainable object, and thus gradients can’t flow.

Since tf.concat is a commonly used function, I think the docs should be modified to include a temporary solution until the underlying bug is fixed.
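
For reference, a quick self-contained check of that claim:

import tensorflow as tf

w1 = tf.Variable([[1.0]])
w2 = tf.Variable([[3.0]])
w = tf.concat([w1, w2], 0)

print(isinstance(w, tf.Variable))  # False: the result is a plain tensor, not a Variable
print(type(w).__name__)            # EagerTensor in eager mode

So the concatenated result itself is not trainable; gradients only reach w1 and w2 if the concat op is executed while a tape is recording.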

How is the Keras LSTM implementation able to work? After all, it uses tf.keras.concatenate or tf.concat on each iteration.

@acxz @Lescurel

Please advise whether this issue has been resolved. I am using 2.4.0 and hitting the issue where tape.gradient() returns None with tf.concat().

I tried to work around it by using a tf.Variable that holds the same result as tf.concat, but tape.gradient() still returns None.

_Y = tf.Variable(
    initial_value=tf.zeros(shape=tf.shape(self.Y), dtype=TYPE_NN_FLOAT),
    trainable=True
)
# tf.concat([_ye, _ys], axis=1)
_Y[::, 0:1].assign(_ye)
_Y[::, 1:].assign(_ys)
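
For what it's worth, acxz's pattern applies here too: run the concat while the tape is recording instead of assigning into a separate Variable. A self-contained sketch (shapes and the loss are made up for illustration):

import tensorflow as tf

_ye = tf.Variable(tf.ones((4, 1)))
_ys = tf.Variable(tf.ones((4, 3)))

with tf.GradientTape() as tape:
    _Y = tf.concat([_ye, _ys], axis=1)  # concat executed while the tape is recording
    loss = tf.reduce_sum(_Y ** 2)       # stand-in loss
print(tape.gradient(loss, [_ye, _ys]))  # both gradients are non-None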