CAT: I have implemented a TensorFlow binding, but the gradient may be wrong.
I have modified the warp-ctc TensorFlow binding and the CAT PyTorch binding, and I have also removed costs_beta (which seems to be unused).
def ctc_crf_loss(logits, labels, input_lengths, blank_label=0, lamb=0.1):
    '''Computes the CTC-CRF loss between a sequence of logits and a
    ground-truth labeling.

    Args:
        logits: A 3-D Tensor of floats. The dimensions should be (t, n, a),
            where t is the time index, n is the minibatch index, and a
            indexes over the logits for each symbol in the alphabet.
        labels: An int32 SparseTensor. labels.indices[i, :] == [b, t] means
            labels.values[i] stores the id for (batch b, time t).
            labels.values[i] must take on values in [0, num_labels).
        input_lengths: A 1-D Tensor of ints, the number of time steps for
            each sequence in the minibatch.
        blank_label: int, the label value/index that the CTC calculation
            should use as the blank label.
        lamb: float, the weight α for the CTC loss, which is combined with
            the CRF loss to help convergence.

    Returns:
        A 1-D float Tensor, the cost of each example in the minibatch
        (as negative log probabilities).

    * This op performs the softmax operation internally.
    * The label reserved for the blank symbol should be label 0.
    '''
    # The input of the modified warp-ctc is the log-softmax output of the
    # bottom neural network.
    activations = tf.nn.log_softmax(logits)               # (t, n, a)
    activations_ = tf.transpose(activations, (1, 0, 2))   # (n, t, a)
    # Outputs of the custom op: costs, gradients, grad_net, costs_alpha.
    loss, _, _, costs_alpha = _ctc_crf.ctc_crf_loss(
        activations, activations_, labels.indices, labels.values,
        input_lengths, blank_label, lamb)
    return costs_alpha - (1 + lamb) * loss                # (n,)
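For context, this is roughly how I wire the loss into a training graph. It is only a minimal TF1-style sketch: the shapes, placeholders and the Adam optimizer are illustrative assumptions of mine, and it still needs the compiled _ctc_crf kernels to run:

import tensorflow as tf

T, N, A = 100, 4, 72  # hypothetical: time steps, batch size, alphabet size (blank = 0)

logits = tf.placeholder(tf.float32, shape=(T, N, A))     # (t, n, a)
labels = tf.sparse_placeholder(tf.int32)                 # sparse labels, values in [0, A)
input_lengths = tf.placeholder(tf.int32, shape=(N,))     # frames per utterance

costs = ctc_crf_loss(logits, labels, input_lengths, blank_label=0, lamb=0.1)  # (n,)
total_loss = tf.reduce_mean(costs)
train_op = tf.train.AdamOptimizer(1e-4).minimize(total_loss)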
@ops.RegisterGradient("CtcCrfLoss")
def _CTCLossGrad(op, grad_loss, a, b, c):
"""The derivative provided by CTC-CRF Loss.
Args:
op: the CtcCrfLoss op.
grad_loss: The backprop for cost.
Returns:
The CTC-CRF Loss gradient.
"""
lamb = op.get_attr('lamb')
grad_ctc = op.outputs[1] # (t, n, a)
grad_den = tf.transpose(op.outputs[2], (1, 0, 2)) # (t, n, a)
grad = grad_den - (1 + lamb) * grad_ctc # (t, n, a)
# average with batch size.
grad /= tf.cast(_get_dim(grad, 1), dtype=tf.float32) # (t, n, a)
# Return gradient for inputs and None for
# activations_, labels_indices, labels_values and sequence_length.
return [_BroadcastMul(grad_loss, grad), None, None, None, None]
# return [_BroadcastMul(grad_loss, op.outputs[1]), None, None, None, None]
I can provide all the code if necessary, but my result is wrong: the TER is over 100%.
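One check I plan to try is comparing the registered gradient against finite differences. This is only a rough TF1 sketch: the toy shapes, labels and random data are made up, and it still needs the compiled _ctc_crf op (plus whatever denominator setup the binding requires) to actually run:

import numpy as np
import tensorflow as tf

T, N, A = 8, 2, 5  # tiny made-up sizes so the finite-difference check stays cheap

np.random.seed(0)
logits_val = np.random.randn(T, N, A).astype(np.float32)
logits = tf.constant(logits_val)
# Two short dummy label sequences; label 0 is reserved for the blank symbol.
labels = tf.SparseTensor(indices=[[0, 0], [0, 1], [1, 0], [1, 1]],
                         values=[1, 2, 3, 1],
                         dense_shape=[2, 2])
input_lengths = tf.constant([T, T], dtype=tf.int32)

costs = ctc_crf_loss(logits, labels, input_lengths, blank_label=0, lamb=0.1)

with tf.Session():
    # Compares the analytic gradient from _CTCLossGrad with finite differences.
    err = tf.test.compute_gradient_error(logits, (T, N, A), costs, (N,),
                                         x_init_value=logits_val)
    print('max gradient error:', err)

If the reported error is large, the sign and the batch averaging in _CTCLossGrad are the first things I would suspect.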
I think the gradient is grad_den - (1 + lamb) * grad_ctc.
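My reasoning, in case I am misreading the op's outputs: the forward pass returns

    cost = costs_alpha - (1 + lamb) * loss

so differentiating with respect to the log-softmax activations gives

    d(cost)/d(activations) = grad_den - (1 + lamb) * grad_ctc

assuming op.outputs[1] is d(loss)/d(activations) and op.outputs[2] (after the transpose) is d(costs_alpha)/d(activations).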