tensorflow: Incorrect gradient for ctc_loss on GPU when using logit_length
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): yes
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Debian 9.12 (TF2.2 DeepLearning image on GCP)
- Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: N/A
- TensorFlow installed from (source or binary): Preinstalled
- TensorFlow version (use command below): v2.2.0-0-g2b96f36 2.2.0-dlenv
- Python version: 3.7.6
- Bazel version (if compiling from source):
- GCC/Compiler version (if compiling from source):
- CUDA/cuDNN version: V10.1.243
- GPU model and memory: NVIDIA Tesla P100
Describe the current behavior
I have observed inconsistencies between the CPU and GPU implementations of tf.nn.ctc_loss when computing the gradient, whenever the logit_length argument contains something other than [num_frames] * batch_size.
Specifically, the gradient with respect to logits returned by the GPU implementation does not contain zeros for frames past the end of the sequence as given by logit_length, whereas the CPU implementation does zero out those frames and appears to behave correctly.
I have also noticed that the unit tests for this op do not cover this particular case (see https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/kernel_tests/ctc_loss_op_test.py#L993).
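For reference, the zero-padding property described above can be checked directly on the returned gradient. The helper below is a minimal sketch (the name max_grad_past_length is mine, not a TensorFlow API); it reports the largest absolute gradient value over frames at or beyond logit_length, which should be exactly zero for a correct implementation:

    import tensorflow as tf

    def max_grad_past_length(grad, logit_lengths):
        # grad: [batch, max_time, num_labels] gradient w.r.t. batch-major logits.
        # logit_lengths: [batch] int32 frame counts per example.
        max_time = tf.shape(grad)[1]
        # True for valid frames t < logit_lengths[b], False for padded frames.
        valid = tf.sequence_mask(logit_lengths, maxlen=max_time)
        padded = tf.cast(tf.logical_not(valid), grad.dtype)[:, :, tf.newaxis]
        # Largest leaked gradient magnitude in the padded region (0.0 if none).
        return tf.reduce_max(tf.abs(grad) * padded)

Applied to the gradients computed by the reproduction script below, this should return 0.0 for the CPU result and, per the behaviour described above, a non-zero value for the GPU result whenever logits_lengths varies per example.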
Standalone code to reproduce the issue
    import tensorflow as tf

    use_logits_lengths = True
    batch_size = 8
    num_labels = 27
    max_labels_length = 32
    max_logits_length = 128

    labels = []
    labels_lengths = []
    logits = []
    logits_lengths = []
    for i in range(batch_size):
        labels_lengths.append(tf.random.uniform([], 1, max_labels_length, tf.int32))
        labels.extend(tf.random.uniform([labels_lengths[-1]], 0, num_labels - 1, tf.int32))
        # I multiply label_length by 2 to make sure there are enough frames
        logits_lengths.append(tf.random.uniform([], labels_lengths[-1].numpy() * 2, max_logits_length + 1, tf.int32))

    labels = tf.RaggedTensor.from_row_lengths(labels, labels_lengths).to_sparse()
    labels_lengths = tf.stack(labels_lengths, 0)  # lengths are scalars, so stack rather than concat
    logits = tf.random.uniform([batch_size, max_logits_length, num_labels])
    logits_lengths = tf.stack(logits_lengths, 0)
    logits_lengths_full = tf.constant([max_logits_length] * batch_size)

    def ctc_compare_cpu_gpu(logits_lengths):
        print("logits_lengths", logits_lengths.numpy())
        with tf.device("/gpu:0"):
            with tf.GradientTape() as t:
                t.watch(logits)
                gpu_loss = tf.nn.ctc_loss(labels, logits, labels_lengths, logits_lengths,
                                          logits_time_major=False, blank_index=-1)
            gpu_grad = t.gradient(gpu_loss, [logits])[0]
        with tf.device("/cpu:0"):
            with tf.GradientTape() as t:
                t.watch(logits)
                cpu_loss = tf.nn.ctc_loss(labels, logits, labels_lengths, logits_lengths,
                                          logits_time_major=False, blank_index=-1)
            cpu_grad = t.gradient(cpu_loss, [logits])[0]
        print("Max loss error", tf.math.abs(gpu_loss - cpu_loss).numpy().max())
        print("Max grad error", tf.math.abs(gpu_grad - cpu_grad).numpy().max())
        print()
        return cpu_loss, gpu_loss, cpu_grad, gpu_grad

    ctc_compare_cpu_gpu(logits_lengths_full)
    ctc_compare_cpu_gpu(logits_lengths)
Output:

    logits_lengths [128 128 128 128 128 128 128 128]
    Max loss error 0.00012207031
    Max grad error 0.00014734268

    logits_lengths [ 70  86  22  74 112 121 103 123]
    Max loss error 6.1035156e-05
    Max grad error 0.9669469
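As a possible mitigation (my own sketch, not something suggested in the issue): since the loss values above match between CPU and GPU, the forward pass appears to ignore frames past logit_length, so multiplying the logits by a sequence mask before calling tf.nn.ctc_loss should leave the loss unchanged while forcing the gradient in the padded region to zero through the chain rule. This only removes the leaked out-of-range gradient; it does not prove the in-range GPU gradient is correct, so comparing against the CPU kernel remains the authoritative check.

    import tensorflow as tf

    def ctc_loss_masked(labels, logits, label_length, logit_length):
        # Batch-major logits: [batch, max_time, num_labels].
        max_time = tf.shape(logits)[1]
        mask = tf.cast(tf.sequence_mask(logit_length, maxlen=max_time),
                       logits.dtype)[:, :, tf.newaxis]
        # d(logits * mask)/d(logits) = mask, so the gradient w.r.t. logits
        # is zeroed for frames at or beyond logit_length.
        return tf.nn.ctc_loss(labels, logits * mask, label_length, logit_length,
                              logits_time_major=False, blank_index=-1)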
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 21 (11 by maintainers)
Was able to replicate the issue in TF v2.5, please find the gist here… Thanks!
I tried the script on V100 a couple of times and I can see the flakiness: Run 1:
Run X:
Looking into it.