pyterrier: Various errors when a previous pipeline component returns an empty dataframe

Describe the bug I created an index of very specific tweet texts and then a simple pipeline along the lines of
pipeline = (tuned_bm25 >> pt.text.get_text(index_ref, ["retweet_score"]) >> pt.apply.doc_score(pipeline_scorer)) to perform reranking. I then called pipeline.search() with words that I know don't exist in the index (in my case, macaw and thesaurus). This leads to the following error:

  File "/mnt/c/Users/prith/Desktop/SI650/ice/si_650_final_project/search.py", line 93, in search
    search_result = (pipeline % int(count)).search(term)
  File "/home/prithvid/.pyenv/versions/3.10.0/lib/python3.10/site-packages/pyterrier/transformer.py", line 168, in search
    rtr = self.transform(queryDf)
  File "/home/prithvid/.pyenv/versions/3.10.0/lib/python3.10/site-packages/pyterrier/ops.py", line 189, in transform
    res = self.transformer.transform(topics_and_res)
  File "/home/prithvid/.pyenv/versions/3.10.0/lib/python3.10/site-packages/pyterrier/ops.py", line 331, in transform
    topics = m.transform(topics)
  File "/home/prithvid/.pyenv/versions/3.10.0/lib/python3.10/site-packages/pyterrier/ops.py", line 331, in transform
    topics = m.transform(topics)
  File "/home/prithvid/.pyenv/versions/3.10.0/lib/python3.10/site-packages/pyterrier/apply_base.py", line 199, in transform
    return fn(inputRes)
  File "/home/prithvid/.pyenv/versions/3.10.0/lib/python3.10/site-packages/pyterrier/text.py", line 97, in add_text_generic
    return add_text_function_docids(res)
  File "/home/prithvid/.pyenv/versions/3.10.0/lib/python3.10/site-packages/pyterrier/text.py", line 90, in add_text_function_docids
    res[k] = allmeta[i]
IndexError: index 0 is out of bounds for axis 0 with size 0
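
The last frame of the traceback can be reproduced outside PyTerrier. The sketch below is not the library's code: it assumes that allmeta is built as a transposed NumPy array of per-document metadata rows, and the function names add_meta_unguarded and add_meta_guarded are hypothetical. Under that assumption, an empty result set yields an array of shape (0,), so allmeta[0] raises the same IndexError; an early return for empty input avoids it.

```python
import numpy as np
import pandas as pd


def add_meta_unguarded(res, rows, keys):
    """Hypothetical stand-in for the failing step: one new column per metadata key."""
    res = res.copy()
    # Shape is (len(keys), n_docs) normally, but (0,) when rows is empty.
    allmeta = np.array(rows).T
    for i, k in enumerate(keys):
        res[k] = allmeta[i]  # IndexError here when there are no documents
    return res


def add_meta_guarded(res, rows, keys):
    """Same step, but an empty input gets empty metadata columns instead of an error."""
    res = res.copy()
    if len(res) == 0:
        for k in keys:
            res[k] = pd.Series(dtype=object)
        return res
    allmeta = np.array(rows).T
    for i, k in enumerate(keys):
        res[k] = allmeta[i]
    return res


# An empty result set, as a retriever might return for an out-of-vocabulary term.
empty = pd.DataFrame(columns=["qid", "docno", "score"])

try:
    add_meta_unguarded(empty, [], ["retweet_score"])
except IndexError as err:
    print("unguarded:", err)

print("guarded columns:", list(add_meta_guarded(empty, [], ["retweet_score"]).columns))
```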

To Reproduce Steps to reproduce the behavior:

1. Build any index that does not cover the entire known vocabulary.
2. Create pipeline = (tuned_bm25 >> pt.text.get_text(index_ref, ["retweet_score"]) >> pt.apply.doc_score(pipeline_scorer))
3. Call search() with a specific term that doesn't exist in the index vocabulary.

Expected behavior There should be no error, and the output should be an empty dataframe.
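
As a user-side illustration of that expected behavior, here is a minimal sketch in plain pandas. It is not part of the PyTerrier API: the decorator passthrough_if_empty and the rerank function are hypothetical names, and rerank merely stands in for any stage that assumes at least one row. The idea is that a stage receiving an empty frame returns an empty frame with the expected columns rather than failing.

```python
import functools

import pandas as pd


def passthrough_if_empty(extra_columns=()):
    """Hypothetical decorator: skip the wrapped stage when its input has no rows."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(df):
            if df.empty:
                out = df.copy()
                # Add any columns the stage would normally create, left empty.
                for col in extra_columns:
                    if col not in out.columns:
                        out[col] = pd.Series(dtype=object)
                return out
            return fn(df)
        return wrapper
    return decorator


@passthrough_if_empty()
def rerank(df):
    """Hypothetical reranker that normalises scores by the first row's score."""
    out = df.copy()
    first = out["score"].iloc[0]  # would raise IndexError on an empty frame
    out["score"] = out["score"] / first
    return out.sort_values("score", ascending=False)


empty = pd.DataFrame(columns=["qid", "docno", "score"])
print(len(rerank(empty)), list(rerank(empty).columns))
```

Inside a real pipeline the same check would sit at the start of whatever function is passed to the apply-style transformers; whether that is still needed depends on the PyTerrier version in use.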

Documentation and Issues

I have checked the PyTerrier documentation for relevant content
I have checked for previous relevant PyTerrier issues

About this issue

State: closed
Created 2 years ago
Comments: 15 (9 by maintainers)

Commits related to this issue

addressed #352 empty df problem (committed to terrier-org/pyterrier by cmacdonald 2 years ago)
Various empty dataframe fixes (#353) addresses #352 (committed to terrier-org/pyterrier by cmacdonald 2 years ago)
* addressed #352 empty df problem
* fixes for apply.doc_score too
* funion fixes and results
* addresses pt.text.scorer() on empty df
*...

Most upvoted comments

Good spot, thank you. We’ve seen a few of these empty DF bugs. It should be easy to tidy this up for the next point release

cmacdonald on Dec 10, 2022