tensorflow: tf.keras computes incorrect loss values with Masking

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 18.04
  • TensorFlow installed from (source or binary): pip install tensorflow / pip install tf-nightly-2.0-preview
  • TensorFlow version (use command below): 2.0 / 2.0.0-dev20191002
  • Python version: 3.7.5

Describe the current behavior

When fitting an LSTM model with a Masking layer in Keras, using binary_crossentropy as both the loss and a metric yields two different values for what should be the same quantity. The printed metric respects the mask and is correct, while the loss appears to be computed without masking.

Code to reproduce the issue

import numpy as np
from numpy.random import normal, randint
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Masking, LSTM, Activation, Dense

time_steps = 20
num_seqs = 100
X = normal(size=(num_seqs, time_steps))  # create artificial data
Y = np.where(X > 0, 1, 0)  # create simple target

lens = randint(low=1, high=time_steps, size=num_seqs)  # create lengths < time_steps (padding needed)
seqs = [row[:row_len] for row, row_len in zip(X, lens)]  # artificially cut sequences
target_seqs = [row[:row_len] for row, row_len in zip(Y, lens)]  # artificially cut target sequences

padded_seqs = pad_sequences(seqs, padding='post', dtype='float32', maxlen=time_steps).reshape(num_seqs, time_steps, -1)
padded_targets = pad_sequences(target_seqs, dtype='float32', padding='post', maxlen=time_steps).reshape(num_seqs, time_steps, -1)

LSTM_SIZE = 16
model = Sequential()
model.add(Masking(mask_value=0., input_shape=(time_steps, 1)))
model.add(LSTM(LSTM_SIZE, return_sequences=True))
model.add(Dense(1))
model.add(Activation("sigmoid"))
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["binary_crossentropy"])
model.summary()  # summary() prints the table itself; wrapping it in print() just adds a stray "None"

model.fit(padded_seqs, padded_targets, batch_size=1024, epochs=10)

Running it, observe that the loss and metric values differ:

Train on 100 samples
Epoch 1/10
100/100 [==============================] - 4s 38ms/sample - loss: 0.3365 - binary_crossentropy: 0.6434
Epoch 2/10
100/100 [==============================] - 0s 253us/sample - loss: 0.3361 - binary_crossentropy: 0.6427
Epoch 3/10
100/100 [==============================] - 0s 260us/sample - loss: 0.3357 - binary_crossentropy: 0.6419
Epoch 4/10
100/100 [==============================] - 0s 271us/sample - loss: 0.3353 - binary_crossentropy: 0.6412
Epoch 5/10
100/100 [==============================] - 0s 266us/sample - loss: 0.3349 - binary_crossentropy: 0.6404
Epoch 6/10
100/100 [==============================] - 0s 250us/sample - loss: 0.3345 - binary_crossentropy: 0.6396
Epoch 7/10
100/100 [==============================] - 0s 275us/sample - loss: 0.3341 - binary_crossentropy: 0.6389
Epoch 8/10
100/100 [==============================] - 0s 247us/sample - loss: 0.3337 - binary_crossentropy: 0.6381
Epoch 9/10
100/100 [==============================] - 0s 196us/sample - loss: 0.3333 - binary_crossentropy: 0.6373
Epoch 10/10
100/100 [==============================] - 0s 217us/sample - loss: 0.3329 - binary_crossentropy: 0.6366
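
The ratio between the two values (≈ 0.52) matches the expected fraction of unmasked timesteps (lens is uniform in [1, time_steps), so mean(lens) / time_steps ≈ 0.5). This suggests the loss sums the masked per-step losses but divides by the total number of timesteps rather than by the number of unmasked ones. A quick check of that hypothesis, reusing the variables from the script above (a sketch; exact numbers vary from run to run):

import numpy as np
from tensorflow.keras.losses import binary_crossentropy

preds = model.predict(padded_seqs)                              # (num_seqs, time_steps, 1)
mask = lens[:, None] > np.arange(time_steps)                    # True at real (unmasked) timesteps
per_step = binary_crossentropy(padded_targets, preds).numpy()   # (num_seqs, time_steps)

masked_mean = per_step[mask].mean()                     # masked sum / unmasked count
diluted_mean = (per_step * mask).sum() / per_step.size  # masked sum / ALL timesteps

print(masked_mean)   # should be close to the reported metric
print(diluted_mean)  # should be close to the reported loss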

Other info / logs

I already saw the closed issue https://github.com/tensorflow/tensorflow/issues/25970 and the linked https://stackoverflow.com/questions/54802328/why-am-i-getting-different-values-between-loss-functions-and-metrics-in-tensorfl. That problem appears to be fixed in my version, but my problem persists.

I also saw this Stack Overflow post: https://stackoverflow.com/questions/53808163/same-function-in-keras-loss-and-metric-give-different-values-even-without-regula

The suggested solution does not work for me. I tried changing the import statements to

from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers import Masking, LSTM, Activation, Dense

but the problem remains.

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Reactions: 4
  • Comments: 17 (6 by maintainers)

Most upvoted comments

Please do not close this. This is a very important issue to me.

This is actually a symptom of a larger problem: loss values are incorrect whenever sample weights are used, because the total weighted loss is divided by the number of elements in the tensor rather than by the total sample weight. When masking is applied, the sample weights are derived correctly from the mask; they are just not used correctly in the final loss reduction.
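
To make the two reductions concrete, here is a minimal standalone sketch (my own illustration of the arithmetic, not TensorFlow's internal code path):

import tensorflow as tf

y_true = tf.constant([[1.0], [0.0], [1.0], [0.0]])
y_pred = tf.constant([[0.9], [0.1], [0.6], [0.4]])
weights = tf.constant([1.0, 1.0, 0.0, 0.0])  # e.g. sample weights derived from a mask

per_sample = tf.keras.losses.binary_crossentropy(y_true, y_pred)  # shape (4,)
weighted_sum = tf.reduce_sum(per_sample * weights)

print(weighted_sum / tf.reduce_sum(weights))                    # metric-style: divide by total weight
print(weighted_sum / tf.cast(tf.size(per_sample), tf.float32))  # loss-style: divide by element count

Only the first number accounts for the fact that two of the four samples carry zero weight.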

For more details, see https://github.com/tensorflow/tensorflow/issues/34158#issuecomment-756813250 which includes compact examples of the issue.