addons: seq2seq.SequenceLoss is probably incompatible with tf.keras.models.Model.compile

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Google Colab
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): 2.0.0-beta1
  • TensorFlow Addons installed from (source, PyPi): PyPi
  • TensorFlow Addons version: 0.4.0
  • Python version and type (e.g. Anaconda Python, stock Python as on Mac, or Homebrew-installed Python, etc.): Stock Python
  • Bazel version (if compiling from source): -
  • GCC/Compiler version (if compiling from source): -
  • Is the GPU used? (yes/no): Yes

Describe the bug

I think there are two incompatibilities when using tfa.seq2seq.SequenceLoss and tf.keras.Model together. The following example replicates a common scenario when building a seq2seq model, i.e. the input is a 2D int32 tensor containing word ids, and the output/logits is a 3D tensor representing a distribution over the vocabulary at each time step.

import tensorflow as tf

vocab_size = 10000  # arbitrary example value

model = tf.keras.Sequential([
    # input_shape=(None,) is only here so the model is built before compile()
    tf.keras.layers.Embedding(vocab_size, 50, mask_zero=True, input_shape=(None,)),
    tf.keras.layers.Dense(vocab_size, activation='softmax')
])
model.summary()

Now, let's compile the model and use tfa.seq2seq.SequenceLoss() as its loss function:

import tensorflow_addons as tfa
loss_obj = tfa.seq2seq.SequenceLoss()
model.compile(optimizer='adam', loss=loss_obj)

/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/tensor_util.py in make_tensor_proto(values, dtype, shape, verify_shape, allow_broadcast)
    453   else:
    454     if values is None:
--> 455       raise ValueError("None values not supported.")
    456     # if dtype is provided, forces numpy array to be the type
    457     # provided if possible.

ValueError: None values not supported.

As you can see, it gives us an ambiguous error (complete traceback: none_value.log). However, I suppose this issue is rooted in the custom __call__ implementation: we had to override the superclass __call__ method to handle the loss reduction ourselves. There is nothing wrong with this approach, and it is indeed supported by the Keras APIs, but Keras checks for it in an odd way. I believe training.py#L1673 is the relevant condition: it decides whether to invoke our loss function via .call() or .__call__() based on whether it has a reduction attribute. Since SequenceLoss inherits from tf.keras.losses.Loss, the reduction attribute is added to the class automatically (see the code at losses.py#L91), so Keras ends up calling its .call() method, where we basically return None. A rough sketch of this dispatch follows below.
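
To make that concrete, here is a minimal sketch of the dispatch as I understand it (an illustration only, not the actual Keras source; _invoke_loss is a hypothetical name):

# Hypothetical illustration of the Keras dispatch described above.
def _invoke_loss(loss_fn, y_true, y_pred):
    if hasattr(loss_fn, 'reduction'):
        # Treated as a tf.keras.losses.Loss: Keras calls .call(), which
        # SequenceLoss leaves effectively returning None.
        return loss_fn.call(y_true, y_pred)
    # Plain callable: our overridden __call__ would handle the reduction itself.
    return loss_fn(y_true, y_pred)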

I then tried to work around the issue temporarily by deleting this attribute:

loss_obj = tfa.seq2seq.SequenceLoss()
# Drop the auto-added `reduction` attribute so Keras falls back to our __call__()
delattr(loss_obj, 'reduction')
model.compile(optimizer='adam', loss=loss_obj)

This gets past that step, but it raises another exception (complete traceback: shape_mismatch.log):

/usr/local/lib/python3.6/dist-packages/tensorflow_addons/seq2seq/loss.py in sequence_loss(logits, targets, weights, average_across_timesteps, average_across_batch, sum_over_timesteps, sum_over_batch, softmax_loss_function, name)
     92     if len(targets.get_shape()) != 2:
     93         raise ValueError(
---> 94             "Targets must be a [batch_size x sequence_length] tensor")
     95     if len(weights.get_shape()) != 2:
     96         raise ValueError(

ValueError: Targets must be a [batch_size x sequence_length] tensor

I followed the traceback, and it turns out that Keras does some loss preparation in the model.compile() method, during which it hands our loss an empty 3D tensor with the same shape as the model output, regardless of the actual target labels. 3D target tensors are apparently unsupported by sequence_loss; the sketch below reproduces the shape check in isolation.
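
Here is a minimal, standalone sketch of that shape check (the shapes and vocab_size value are made up for illustration, and it assumes the __call__(y_true, y_pred, sample_weight) signature described above):

import tensorflow as tf
import tensorflow_addons as tfa

batch_size, seq_len, vocab_size = 4, 7, 100
logits = tf.zeros([batch_size, seq_len, vocab_size])  # model output: [batch, time, vocab]
weights = tf.ones([batch_size, seq_len])               # per-timestep weights

loss_obj = tfa.seq2seq.SequenceLoss()

# 2-D integer targets, as sequence_loss expects, work when called directly:
targets_2d = tf.zeros([batch_size, seq_len], dtype=tf.int32)
print(loss_obj(targets_2d, logits, weights))

# A 3-D target shaped like the output (what compile() effectively passes)
# trips the rank check:
targets_3d = tf.zeros([batch_size, seq_len, vocab_size])
# loss_obj(targets_3d, logits, weights)
# -> ValueError: Targets must be a [batch_size x sequence_length] tensor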

P.S.: I encountered this issue while working on #231.

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 15 (15 by maintainers)

Most upvoted comments

Ah, thanks for the reference to the training loop. Then I think removing the "reduction" attribute should be the way to go.

Hi, yes, I'm actually working on it right now; I think I'll be able to send the pull request today. Thank you for your patience.

(In reply to Guillaume Klein's question: "Hi @kazemnejad, did you have a chance to try that?")