pyterrier: Various errors when a previous pipeline component returns an empty dataframe
Describe the bug
I have created an index with very specific tweet texts and then a simple pipeline along the lines of
pipeline = (tuned_bm25 >> pt.text.get_text(index_ref, ["retweet_score"]) >> pt.apply.doc_score(pipeline_scorer))
to perform reranking. I then called pipeline.search() with words that I know don’t exist in the index (in my case, macaw and thesaurus). This leads to the following error:
  File "/mnt/c/Users/prith/Desktop/SI650/ice/si_650_final_project/search.py", line 93, in search
    search_result = (pipeline % int(count)).search(term)
  File "/home/prithvid/.pyenv/versions/3.10.0/lib/python3.10/site-packages/pyterrier/transformer.py", line 168, in search
    rtr = self.transform(queryDf)
  File "/home/prithvid/.pyenv/versions/3.10.0/lib/python3.10/site-packages/pyterrier/ops.py", line 189, in transform
    res = self.transformer.transform(topics_and_res)
  File "/home/prithvid/.pyenv/versions/3.10.0/lib/python3.10/site-packages/pyterrier/ops.py", line 331, in transform
    topics = m.transform(topics)
  File "/home/prithvid/.pyenv/versions/3.10.0/lib/python3.10/site-packages/pyterrier/ops.py", line 331, in transform
    topics = m.transform(topics)
  File "/home/prithvid/.pyenv/versions/3.10.0/lib/python3.10/site-packages/pyterrier/apply_base.py", line 199, in transform
    return fn(inputRes)
  File "/home/prithvid/.pyenv/versions/3.10.0/lib/python3.10/site-packages/pyterrier/text.py", line 97, in add_text_generic
    return add_text_function_docids(res)
  File "/home/prithvid/.pyenv/versions/3.10.0/lib/python3.10/site-packages/pyterrier/text.py", line 90, in add_text_function_docids
    res[k] = allmeta[i]
IndexError: index 0 is out of bounds for axis 0 with size 0
To Reproduce
Steps to reproduce the behavior:
- Any index which does not span all of known vocabulary
- pipeline = (tuned_bm25 >> pt.text.get_text(index_ref, ["retweet_score"]) >> pt.apply.doc_score(pipeline_scorer))
- Use search for a specific term which doesn’t exist in the index vocabulary
Expected behavior
There should be no error, and the output should be an empty dataframe.
Documentation and Issues
- I have checked the PyTerrier documentation for relevant content
- I have checked for previous relevant PyTerrier issues
About this issue
- State: closed
- Created 2 years ago
- Comments: 15 (9 by maintainers)
Commits related to this issue
- addressed #352 empty df problem — committed to terrier-org/pyterrier by cmacdonald 2 years ago
- Various empty dataframe fixes (#353) addresses #352 * addressed #352 empty df problem * fixes for apply.doc_score too * funion fixes and results * addresses pt.text.scorer() on empty df *... — committed to terrier-org/pyterrier by cmacdonald 2 years ago
Good spot, thank you. We’ve seen a few of these empty DF bugs. It should be easy to tidy this up for the next point release
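For anyone stuck on a version without those fixes, one possible stopgap is to guard the fragile stage so it is skipped when the incoming results are empty. The sketch below is only an illustration under stated assumptions: guard_empty and fragile_add_text are hypothetical names, and fragile_add_text is a stand-in that mimics the failure mode in the traceback (indexing position 0 of an empty array). It is not PyTerrier's actual code and not the fix that was committed.

```python
import numpy as np
import pandas as pd


def guard_empty(transform_fn, added_columns=()):
    """Wrap a DataFrame -> DataFrame function so it is skipped on empty input.

    Hypothetical helper: `added_columns` names the columns the wrapped
    function would normally add, so later stages still find them.
    """
    def wrapped(res: pd.DataFrame) -> pd.DataFrame:
        if len(res) == 0:
            # Nothing to process: return an empty frame with the expected columns.
            missing = [c for c in added_columns if c not in res.columns]
            return res.reindex(columns=list(res.columns) + missing)
        return transform_fn(res)
    return wrapped


def fragile_add_text(res: pd.DataFrame) -> pd.DataFrame:
    """Stand-in for a metadata lookup that breaks on empty results."""
    rows = [[f"meta-{docno}"] for docno in res["docno"]]
    allmeta = np.array(rows).T  # shape (1, n); shape (0,) when there are no rows
    out = res.copy()
    out["retweet_score"] = allmeta[0]  # IndexError when allmeta is empty
    return out


# An empty result set, as produced when no query term is in the index.
empty_results = pd.DataFrame(columns=["qid", "query", "docno", "score"])

safe_add_text = guard_empty(fragile_add_text, added_columns=["retweet_score"])
guarded = safe_add_text(empty_results)
print(len(guarded), list(guarded.columns))
```

In a real pipeline, a function guarded this way could presumably be passed to pt.apply.generic in place of the unguarded one; that usage is an assumption and has not been tested against PyTerrier here.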