CAT: I have implemented a TensorFlow binding, but the gradient may be wrong.

I have modified the warp-ctc TensorFlow binding and the CAT PyTorch binding; I have also removed costs_beta (which may be unnecessary).

def ctc_crf_loss(logits, labels, input_lengths,
                 blank_label=0, lamb=0.1):
  '''Computes the CTC-CRF loss between a sequence of logits and a
  ground truth labeling.

  Args:
      logits: A 3-D Tensor of floats. The dimensions
                   should be (t, n, a), where t is the time index, n
                   is the minibatch index, and a indexes over
                   logits for each symbol in the alphabet.

      labels: An int32 SparseTensor. labels.indices[i, :] == [b, t] means 
              labels.values[i] stores the id for (batch b, time t). 
              labels.values[i] must take on values in [0, num_labels).

      input_lengths: A 1-D Tensor of ints, the number of time steps
                     for each sequence in the minibatch.

      blank_label: int, the label value/index that the CTC
                   calculation should use as the blank label.

      lamb: float, the weight α on the CTC loss term,
                   combined with the CRF loss to help convergence.

  Returns:
      1-D float Tensor, the cost of each example in the minibatch
      (as negative log probabilities).

  * This function applies the log-softmax internally, so pass unnormalized logits.

  * The label reserved for the blank symbol should be label 0.

  '''
  # The input to the warp-ctc kernel is changed to the log-softmax output of the bottom neural network.
  activations = tf.nn.log_softmax(logits) # (t, n, a)
  activations_ = tf.transpose(activations, (1, 0, 2)) # (n, t, a)
  loss, _, _, costs_alpha = _ctc_crf.ctc_crf_loss(
      activations, activations_, labels.indices, labels.values,
      input_lengths, blank_label, lamb) # costs, gradients, grad_net, costs_alpha

  return (costs_alpha - (1 + lamb) * loss)  # (n,)


@ops.RegisterGradient("CtcCrfLoss")
def _CTCLossGrad(op, grad_loss, _grad_gradients, _grad_grad_net, _grad_costs_alpha):
  """The derivative provided by CTC-CRF Loss.

  Args:
     op: the CtcCrfLoss op.
     grad_loss: The backprop for cost.

  Returns:
     The CTC-CRF Loss gradient.
  """
  lamb = op.get_attr('lamb')
  grad_ctc = op.outputs[1] # (t, n, a)
  grad_den = tf.transpose(op.outputs[2], (1, 0, 2)) # (t, n, a)
  grad = grad_den - (1 + lamb) * grad_ctc # (t, n, a)
  # average with batch size.
  grad /= tf.cast(_get_dim(grad, 1), dtype=tf.float32) # (t, n, a)

  # Return the gradient for activations and None for
  # activations_, labels_indices, labels_values and input_lengths.
  return [_BroadcastMul(grad_loss, grad), None, None, None, None]
  # return [_BroadcastMul(grad_loss, op.outputs[1]), None, None, None, None]
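For reference, this is roughly how I build the SparseTensor labels and call the wrapper; the toy shapes and the dense-to-sparse conversion below are only for illustration, not part of the binding:

import numpy as np
import tensorflow as tf

# Toy dimensions: T frames, N utterances, A symbols (label 0 is the blank).
T, N, A = 50, 4, 10

logits = tf.constant(np.random.randn(T, N, A).astype(np.float32))  # (t, n, a)
input_lengths = tf.constant([50, 48, 45, 40], dtype=tf.int32)      # (n,)

# Dense labels padded with -1, converted to the SparseTensor the op expects.
dense_labels = tf.constant([[1, 2, 3, -1],
                            [4, 5, -1, -1],
                            [6, 7, 8, 9],
                            [2, 2, -1, -1]], dtype=tf.int32)
indices = tf.where(tf.not_equal(dense_labels, -1))
labels = tf.SparseTensor(indices=indices,
                         values=tf.gather_nd(dense_labels, indices),
                         dense_shape=[4, 4])

# ctc_crf_loss is the wrapper defined above.
costs = ctc_crf_loss(logits, labels, input_lengths, blank_label=0, lamb=0.1)  # (n,)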

I can provide all the code if necessary, but my result is clearly wrong: the TER is over 100%.

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 16

Most upvoted comments

In my opinion: CRF_Loss = -(log_prob_ctc - log_prob_den) + lamb * (-log_prob_ctc) = log_prob_den - (1 + lamb) * log_prob_ctc, so ctc_crf_base.gpu_ctc and ctc_crf_base.gpu_den output log_prob_ctc and log_prob_den, not the mean of the loss. I guess the gradient is -(grad_den - (1 + lamb) * grad_ctc). Please correct me if wrong.
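Writing that out, with x the log-softmax activations and assuming grad_ctc and grad_den denote the gradients of log_prob_ctc and log_prob_den with respect to x:

\[
\mathcal{L}_{\mathrm{CRF}} = -\bigl(\log p_{\mathrm{ctc}} - \log p_{\mathrm{den}}\bigr) + \lambda \bigl(-\log p_{\mathrm{ctc}}\bigr) = \log p_{\mathrm{den}} - (1+\lambda)\log p_{\mathrm{ctc}},
\qquad
\frac{\partial \mathcal{L}_{\mathrm{CRF}}}{\partial x} = \frac{\partial \log p_{\mathrm{den}}}{\partial x} - (1+\lambda)\,\frac{\partial \log p_{\mathrm{ctc}}}{\partial x}.
\]

So whether the backward pass should be grad_den - (1 + lamb) * grad_ctc or its negation depends on whether the kernels return gradients of the log-probabilities or of the costs (the negated log-probabilities).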

I think the gradient is grad_den - (1 + lamb) * grad_ctc.
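One way to settle the sign empirically would be a finite-difference check against the cost returned by the wrapper. A TF1-style sketch (assuming the ctc_crf_loss wrapper above is importable; the toy batch, label sequence, and step size eps are only illustrative):

import numpy as np
import tensorflow as tf

# Tiny toy problem: one utterance, short input, small alphabet.
T, N, A = 6, 1, 4
np.random.seed(0)
logits_np = np.random.randn(T, N, A).astype(np.float32)

logits = tf.placeholder(tf.float32, shape=(T, N, A))
labels = tf.SparseTensor(indices=[[0, 0], [0, 1]], values=[1, 2], dense_shape=[1, 2])
input_lengths = tf.constant([T], dtype=tf.int32)

cost = tf.reduce_sum(ctc_crf_loss(logits, labels, input_lengths, blank_label=0, lamb=0.1))
grad = tf.gradients(cost, [logits])[0]  # backprop goes through the registered CtcCrfLoss gradient (and log_softmax)

with tf.Session() as sess:
    analytic = sess.run(grad, feed_dict={logits: logits_np})
    eps = 1e-3
    for _ in range(5):
        t, n, a = np.random.randint(T), np.random.randint(N), np.random.randint(A)
        plus, minus = logits_np.copy(), logits_np.copy()
        plus[t, n, a] += eps
        minus[t, n, a] -= eps
        numeric = (sess.run(cost, feed_dict={logits: plus}) -
                   sess.run(cost, feed_dict={logits: minus})) / (2 * eps)
        # If the analytic and numeric values consistently have opposite signs, the backward sign is flipped.
        print(t, n, a, analytic[t, n, a], numeric)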