sacrebleu: Sacrebleu 2.0 dies when encountering an empty reference line

Sacrebleu 2.0.0 dies if there is an empty line in the reference. You can reproduce this with this command:

 $ sacrebleu -t mtedx/test -l el-en --echo ref | sacrebleu -t mtedx/test -l el-en
Traceback (most recent call last):
  File "/home/mattpost/local/bin/sacrebleu", line 8, in <module>
    sys.exit(main())
  File "/home/mattpost/local/lib/python3.8/site-packages/sacrebleu/sacrebleu.py", line 482, in main
    metrics[name] = METRICS[name](**metric_args)
  File "/home/mattpost/local/lib/python3.8/site-packages/sacrebleu/metrics/bleu.py", line 197, in __init__
    self._ref_cache = self._cache_references(references)
  File "/home/mattpost/local/lib/python3.8/site-packages/sacrebleu/metrics/base.py", line 333, in _cache_references
    raise RuntimeError("Empty or `None` reference sentence found.")
RuntimeError: Empty or `None` reference sentence found.

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Comments: 15 (6 by maintainers)

Commits related to this issue

Most upvoted comments

  • Removing the None check breaks the variable number of references support through the API.
  • Removing the empty string check breaks the variable number of references support through the CLI.

I think we should at least keep the None check to still support the 1st one. For the second, there’s no easy fix so we’ll need to update the README and say that 2.1.0 breaks compatibility with 2.0.0 and reverts the variable number of refs support through the CLI and insist that this feature should only be used through the API.

I would like to fix it this week and do a 2.1.0 release (with #118 and #174).

I think the reality is that we need to support empty lines. My reasoning is that sacrebleu is a wrapper around existing, externally-provided test sets, and has no power over the correctness of those test sets, nor do its maintainers (i.e., us) have the bandwidth to even determine where a blank line in a reference set belongs in @martinpopel’s taxonomy.

@ozancaglayan what do you think?

Just a note: I think that this check was introduced so that no one is using empty references by mistake. “Simulating” variable number of references by using empty references could change the brevity penalty.

Honestly, I can’t wrap my head around for this problem 😃 Let me think loudly.

Initially, the '' check wasn’t there and it was only filtering out the Nones but if I understand correctly you proposed to add that bit as well ( See https://github.com/mjpost/sacrebleu/pull/132/commits/2b5052a211539570bbf7e71f618affecbad78d1f#r556098399)

That made sense because explicit Nones could only appear if the user uses the API and injects them deliberately to their list of refs. This would mean that the CLI would not work with variable number of references. We allow multiple references through CLI in two ways (i) with a list of files and (ii) with a tab-separated stream. For (i) you can give 100 ref files and some of their lines can be missing to allow var. number of refs. For (ii), essentially the same thing: some columns would be empty strings. So still no Nones at all to check and convert to ''.

BTW, I am not sure why we throw that exception. It was again introduced in #132 probably for some sanity check @tuetschek ? I can’t guess out of my head what happens if we remove that actually.

In the end, all these should be very well documented and tested to not cause ambiguities and weird results in the future.