cdQA: Predictions for certain paragraphs are inaccurate. What's wrong?
I have been testing cdQA on paragraphs that have been generated from CSV. I convert the structured data into English, then predict answers using BERT.
I’ve described the approach here: https://datascience.stackexchange.com/questions/58186/transform-data-into-english-then-predict-an-answer-using-bert
I combine 2 or 3 sentences into paragraphs, then concatenate multiple paragraphs into one dataframe for the cdQA pipeline. When I query the dataset, the results are often incorrect. An example of a paragraph:
According to our website, the Melbourne Convention Centre & South Wharf Precinct
project is located at 1 Convention Centre Pl, South Wharf VIC 3006, Australia. The
Melbourne Convention Centre & South Wharf Precinct project has won three awards. The project started in 2014 and was completed in 2016.
And the query:
how many awards has the Melbourne Convention Centre project won?
Could this form of English writing be too dissimilar to the corpora and datasets on which BERT was pre-trained and fine-tuned? Can you suggest how I could improve results? Thanks.
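For context, the structured-data-to-English step can be sketched roughly as below. This is a toy version of the templating idea only; the field names, CSV schema, and sentence wording are illustrative, not the real data:

```python
import csv
import io

# Toy sketch: render one structured CSV record as an English paragraph.
# The schema and templates are illustrative, not the actual dataset.
TEMPLATE = (
    "According to our website, the {name} project is located at {address}. "
    "The {name} project has won {awards} awards. "
    "The project started in {start} and was completed in {end}."
)

def row_to_paragraph(row):
    """Render one CSV row (a dict) as an English paragraph."""
    return TEMPLATE.format(**row)

csv_text = (
    "name,address,awards,start,end\n"
    "Melbourne Convention Centre & South Wharf Precinct,"
    "\"1 Convention Centre Pl, South Wharf VIC 3006, Australia\",three,2014,2016\n"
)

paragraphs = [row_to_paragraph(row) for row in csv.DictReader(io.StringIO(csv_text))]
print(paragraphs[0])
```

Each paragraph would then become one row of the dataframe fed to the pipeline.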
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Comments: 25
Finally, I found the problem: the README.md says
cdqa_pipeline = QAPipeline(model='bert_qa_vCPU-sklearn.joblib')
should be
cdqa_pipeline = QAPipeline(reader='bert_qa_vCPU-sklearn.joblib')
Hi @andrelmfarias ,
I’ve been trying out the newly introduced retriever based on BM25 and it seems to be working great! Thank you for your efforts. Will provide more updates…
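For anyone curious why BM25 helps here, a toy scoring function gives the idea (this is not cdQA's implementation; `k1` and `b` are just the common default values):

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    # document frequency of each query term across the corpus
    df = {t: sum(1 for d in docs if t in d) for t in query_terms}
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query_terms:
            if df[t] == 0:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            score += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            )
        scores.append(score)
    return scores

docs = [
    "the melbourne project has won three awards".split(),
    "the project started in 2014 and was completed in 2016".split(),
]
query = "how many awards has the melbourne project won".split()
scores = bm25_scores(query, docs)
```

Rare query terms like "awards" get a high IDF weight, so the paragraph that actually answers the question outranks one that merely shares common words.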
Yes, I did try changing some of the parameters (below) but did not notice much improvement: lowercase=True, preprocessor=None, tokenizer=None, stop_words='english', token_pattern=r"(?u)\b\w\w+\b", ngram_range=(1, 2), max_df=0.85, min_df=2, vocabulary=None, paragraphs=None, top_n=3, verbose=False
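Those keyword arguments match scikit-learn's `TfidfVectorizer`, which the TF-IDF retriever appears to wrap. A standalone sketch of what tuning them changes for retrieval (the two paragraphs and the query are from the example above; `min_df` is lowered to 1 so the toy two-document corpus is not emptied):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Two toy "paragraphs" standing in for the dataframe rows.
paragraphs = [
    "The Melbourne Convention Centre project has won three awards.",
    "The project started in 2014 and was completed in 2016.",
]

# Same knobs as the retriever's, with illustrative values.
vectorizer = TfidfVectorizer(
    lowercase=True,
    stop_words="english",
    token_pattern=r"(?u)\b\w\w+\b",
    ngram_range=(1, 2),          # unigrams and bigrams
    max_df=0.85,                 # drops terms in >85% of paragraphs, e.g. "project"
    min_df=1,                    # min_df=2 would empty this tiny corpus
)
matrix = vectorizer.fit_transform(paragraphs)

query = "how many awards has the Melbourne project won?"
query_vec = vectorizer.transform([query])
sims = cosine_similarity(query_vec, matrix).ravel()
top = sims.argsort()[::-1][:3]   # top_n=3: indices of best paragraphs
```

With this setup the first paragraph is ranked first because it shares the informative terms "melbourne", "awards" and "won" with the query, while `max_df` discards the uninformative "project" shared by both.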
@edanweis Indeed, the steps for this are not obvious; I am really sorry about that. To test one query on one paragraph, please follow the steps below:
I ran it and the answer I got was correct: “three”
If you do not get the same answer with the cdqa pipeline, you have to improve (fine-tune) your retriever, as I explained above.