tensorflow: Losses with Reduction.NONE do not keep the input shape (the result is averaged along the last axis)
System information
- OS Platform and Distribution: Linux Manjaro
- TensorFlow installed from: binary
- TensorFlow version (use command below): 2.0.0-dev20190326 (nightly build)
- Python version: 3.7.2
Describe the current behavior
Currently, when a loss object is created with reduction=tf.losses.Reduction.NONE, the result is averaged along the last axis. This happens with all the losses in the tf.losses module.
Describe the expected behavior
When a loss object is created with reduction=tf.losses.Reduction.NONE, the output shape should match the input shape (the shape of y_true and y_pred). This is also stated in the documentation of tf.losses.Reduction.
Moreover, when a sample_weight parameter with the same shape as y_pred is provided (e.g. to BinaryCrossentropy), the call throws an exception.
Code to reproduce the issue
Loss function with Reduction.NONE
bce = tf.losses.BinaryCrossentropy(reduction=tf.losses.Reduction.NONE)
y_true = [1., 1., 1., 1.]
y_pred = [1., 1., 0.5, 0.5]
loss = bce(y_true, y_pred)
print('Loss: ', loss.numpy())
CURRENT RESULT: 0.34657347
EXPECTED RESULT: [0., 0., 0.69314694, 0.69314694]
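Note that the scalar currently returned is exactly the mean of the expected per-element values, which is consistent with an implicit reduce_mean along the last axis. A quick check (assuming NumPy is available):
import numpy as np
# Per-element binary cross-entropy values expected with no reduction
elementwise = np.array([0., 0., 0.69314694, 0.69314694])
# Averaging along the last axis reproduces the value actually returned
print(elementwise.mean())  # 0.34657347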
Loss function with Reduction.NONE and sample_weight
bce = tf.losses.BinaryCrossentropy(reduction=tf.losses.Reduction.NONE)
y_true = [1., 1., 1., 1.]
y_pred = [1., 1., 0.5, 0.5]
sample_weight = [1, 1, 2, 1]
loss = bce(y_true, y_pred, sample_weight=sample_weight)
print('Loss: ', loss.numpy())
CURRENT RESULT: Exception
EXPECTED RESULT: [-0., -0., 1.3862939, 0.69314694]
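For clarity, the expected output here is just the per-element loss scaled by sample_weight; a quick check (again assuming NumPy):
import numpy as np
elementwise = np.array([0., 0., 0.69314694, 0.69314694])
sample_weight = np.array([1., 1., 2., 1.])
# Weighting the per-element losses gives the expected result
print(elementwise * sample_weight)  # approximately [0., 0., 1.3862939, 0.69314694]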
Other info / logs
Currently, to achieve the expected result, an additional "fake" dimension must be added to y_true and y_pred, as in the following example.
bce = tf.losses.BinaryCrossentropy(reduction=tf.losses.Reduction.NONE)
y_true = [1., 1., 1., 1.]
y_pred = [1., 1., 0.5, 0.5]
sample_weight = [1, 1, 2, 1]
loss = bce(tf.expand_dims(y_true, -1), tf.expand_dims(y_pred, -1), sample_weight=sample_weight)
print('Loss: ', loss.numpy())
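If the workaround is needed in several places it can be wrapped in a small helper. This is only a sketch of the workaround above, not an official API, and elementwise_bce is a hypothetical name:
import tensorflow as tf

def elementwise_bce(y_true, y_pred, sample_weight=None):
    # Hypothetical helper: add a trailing "fake" dimension so the implicit
    # mean over the last axis is taken over a single element, which leaves
    # one loss value per input element.
    bce = tf.losses.BinaryCrossentropy(reduction=tf.losses.Reduction.NONE)
    return bce(tf.expand_dims(y_true, -1),
               tf.expand_dims(y_pred, -1),
               sample_weight=sample_weight)

loss = elementwise_bce([1., 1., 1., 1.], [1., 1., 0.5, 0.5], sample_weight=[1, 1, 2, 1])
print('Loss: ', loss.numpy())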
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Comments: 26 (14 by maintainers)
Commits related to this issue
- BugFix for Losses with reduction None. This is bug fix for issue #27190 — committed to amitsrivastava78/tensorflow by deleted user 5 years ago
- Improve description for `Reduction.NONE`. Improve docs for: tensorflow/tensorflow#27190 Fixes: tensorflow/tensorflow#48743 PiperOrigin-RevId: 371208988 — committed to tensorflow/docs by MarkDaoust 3 years ago
- Improve description for `Reduction.NONE`. Improve docs for: tensorflow/tensorflow#27190 Fixes: tensorflow/tensorflow#48743 PiperOrigin-RevId: 371208988 Change-Id: Id77b2009027caa212acf3eaa677abab779d... — committed to tensorflow/tensorflow by MarkDaoust 3 years ago
@matgad @amitsrivastava78 The loss functions actually do not perform any reduction by default. The expectation is that the input passed to the functions will be at least 2D. These functions return one loss value per sample as output.
E.g.:
- if y_true and y_pred have the shape (3, 3), the return value will be of shape (3,) => we get one loss value per sample
- if y_true and y_pred have the shape (3, 2, 4), the return value will be of shape (3, 2) => we get one loss value per sample, per timestep, etc.
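For example, a quick sketch of this shape behavior (using MeanSquaredError here for illustration; any built-in loss with reduction=NONE should behave the same way):
import tensorflow as tf

mse = tf.losses.MeanSquaredError(reduction=tf.losses.Reduction.NONE)

# (3, 3) inputs -> (3,) output: one loss value per sample
print(mse(tf.zeros((3, 3)), tf.ones((3, 3))).shape)  # (3,)

# (3, 2, 4) inputs -> (3, 2) output: one loss value per sample, per timestep
print(mse(tf.zeros((3, 2, 4)), tf.ones((3, 2, 4))).shape)  # (3, 2)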
Sorry that the functions do not have any documentation about this currently. We are working on adding that, please feel free to contribute to the documentation if you are interested.
I just replied on the PR with this as well.
Why would the issue be closed before updating the documentation? I just ran into this problem because the documentation specifically said reduction=Reduction.NONE would give me the same shape as I send in.
Hi @amitsrivastava78, thanks for the answer. The behavior that you described is the default one. But when you define a loss with the optional parameter reduction=tf.losses.Reduction.NONE, the last reduce_mean (or any other reduce) should be skipped. This issue only applies to the NONE reduction. BinaryCrossentropy(reduction=tf.losses.Reduction.NONE) is just an example; the same behavior should extend to any loss function.
@jvishnuvardhan I see that, but it still doesn't make sense to me. I discovered tf.nn.sigmoid_cross_entropy_with_logits by accident and it behaves exactly as one would expect when there is no reduction (i.e. element-wise loss scalars). @pavithrasv's explanation doesn't make sense to me and the documentation doesn't either. It is weird to me that no reduction actually leads to a loss in tensor dimension. I just cannot wrap my head around what "sample" means in @pavithrasv's explanation or the official documentation.
If I am comparing two 3x3 tensors with no reduction, I have no idea why I am not getting a 3x3 loss tensor out. I really may just be misunderstanding what "sample" means here. It almost seems like by sample you mean a row in the y_pred and y_true tensors. It's as if you actually do compute an element-wise loss and, once this is done, take the mean along the -1 dimension. But the fact that a mean (or sum) is taken means there is a reduction by definition. 😦
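That reading matches the observed numbers; a small sketch comparing the two computations (assuming tf.keras.backend.binary_crossentropy for the element-wise values):
import tensorflow as tf

y_true = tf.constant([[1., 1., 0.], [0., 1., 1.], [1., 0., 1.]])
y_pred = tf.constant([[0.9, 0.8, 0.1], [0.2, 0.7, 0.6], [0.5, 0.5, 0.5]])

# Element-wise binary cross-entropy, shape (3, 3)
elementwise = tf.keras.backend.binary_crossentropy(y_true, y_pred)

# The Loss object with Reduction.NONE returns shape (3,) ...
bce = tf.losses.BinaryCrossentropy(reduction=tf.losses.Reduction.NONE)
print(bce(y_true, y_pred).numpy())
# ... which should match the mean of the element-wise values along the last axis
print(tf.reduce_mean(elementwise, axis=-1).numpy())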
I really dislike this inflexibility. It would be nice if reduction=‘none’ actually meant no reduction in tensor dimension. As it is, all of the useful losses that tf.losses already provides would need to be re-implemented, or the extra-dimension workaround that @matgad pointed out would have to be used. Again, a reduction of none should not mean the software takes an average for me. I've updated and confirmed my example above.
No reduction should mean no reduction. Even if the documentation points out the behavior, it is still incredibly inflexible and frustrating.
@matgad, I have raised the PR and included your cases as unit test cases as well. Let's wait for this PR to merge; in the meantime you can use this PR in case you are blocked.
#27784
Regards, Amit
@matgad, I see your point. I think this is a bug: irrespective of the reduction, TensorFlow is applying reduce_mean to everything. In the coming days I will try to raise a PR to fix this issue.
Regards, Amit