espnet: Bug: score_bleu.sh removes good tokens

Hello,

While reproducing an experiment on MuST-C, I realized that the reference file used for computing BLEU is different from the original reference file: tokens such as (_Gelächter_) and (_Applaus_) are removed from both the reference and the hypothesis.

After checking score_bleu.sh, I think the issue lies in the lines that remove utterance IDs: they seem to remove everything between parentheses, not just the IDs.
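
To illustrate the suspected behaviour, here is a minimal Python sketch. It is not the code from score_bleu.sh. The line format (text followed by a parenthesised utterance ID at the end), the sample ID, and both function names are assumptions made up for this example. It contrasts a pattern that deletes every parenthesised span with one that only deletes a trailing ID.

```python
import re

# Hypothetical trn-style line: text, then "(utterance-id)" at the end.
# The ID "spk1-utt_0001" is invented for illustration.
line = "Vielen Dank. (Applaus) (spk1-utt_0001)"


def strip_all_parens(text):
    """Over-broad: deletes every parenthesised span, including real tokens."""
    return re.sub(r"\([^)]*\)", "", text).strip()


def strip_trailing_id(text):
    """Narrower: deletes only a parenthesised span at the very end of the line."""
    return re.sub(r"\s*\([^()]*\)\s*$", "", text)


print(strip_all_parens(line))   # "(Applaus)" is lost along with the ID
print(strip_trailing_id(line))  # "(Applaus)" is kept, only the ID is removed
```

Note that the narrower pattern is only safe if every line really ends with an utterance ID. On a line without one, a sentence-final (Applaus) would still be removed.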

Could you please check this? The issue seems critical as it may impact results reported in published papers.

Thank you in advance!

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 15 (6 by maintainers)

Most upvoted comments

Thanks! I had a conversation with @hirofumi0810, and he will add options to the script to switch between the IWSLT and MuST-C standards.

@formiel One reference: https://www.aclweb.org/anthology/2020.acl-main.350.pdf. This recent ACL paper on simultaneous ST actually removed such labels in advance.