haystack: FARMReader slow
Question: I am running one of the samples in a K8s pod (GPU). It gets stuck in FARMReader for a long time (30+ minutes) and then times out. Any idea why? All I added were two .txt documents.
```python
# Import paths as of haystack v0.x (the version current when this issue was filed)
from haystack.reader.farm import FARMReader
from haystack.retriever.sparse import ElasticsearchRetriever
from haystack.pipeline import ExtractiveQAPipeline

reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2",
                    use_gpu=True, return_no_answer=True, no_ans_boost=0.7,
                    context_window_size=200)
retriever = ElasticsearchRetriever(document_store=document_store)
pipe = ExtractiveQAPipeline(reader, retriever)

# predict n answers
prediction = pipe.run(query=question, top_k_retriever=10, top_k_reader=3)
```
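One thing worth trying: the log shows FARM spinning up 23 parallel inference workers, and in a resource-limited pod that pool can be the bottleneck. As a sketch (assuming haystack v0.x, where `FARMReader` accepts a `num_processes` argument that is forwarded to FARM's `Inferencer`), `num_processes=0` disables the multiprocessing pool so inference runs in the main process:

```python
# Sketch (assumption): limit FARMReader's multiprocessing.
# num_processes=0 disables the worker pool entirely; a small integer
# caps it instead of defaulting to one worker per CPU core.
reader = FARMReader(
    model_name_or_path="deepset/roberta-base-squad2",
    use_gpu=True,
    return_no_answer=True,
    no_ans_boost=0.7,
    context_window_size=200,
    num_processes=0,  # avoid spawning 23 workers in a constrained pod
)
```

This is a configuration sketch, not a confirmed fix; the exact parameter name and behavior should be checked against the haystack version in use.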
```
[2021-05-19 23:34:10 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:8)
05/19/2021 23:34:10 - INFO - farm.infer - Got ya 23 parallel workers to do inference ...
05/19/2021 23:34:10 - INFO - farm.infer - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
05/19/2021 23:34:10 - INFO - farm.infer - /w\ /w\ /w\ /w\ /w\ /w\ /w\ /|\ /w\ /w\ /w\ /w\ /w\ /w\ /|\ /w\ /|\ /|\ /|\ /|\ /w\ /w\ /|
05/19/2021 23:34:10 - INFO - farm.infer - /'\ / \ /'\ /'\ / \ / \ /'\ /'\ /'\ /'\ /'\ /'\ / \ /'\ /'\ / \ /'\ /'\ /'\ /'\ / \ / \ /'
05/19/2021 23:34:10 - INFO - farm.infer -
05/19/2021 23:34:10 - INFO - elasticsearch - POST http://10.x.x.x:8071/sidx/_search [status:200 request:0.003s]
05/19/2021 23:34:10 - WARNING - farm.data_handler.dataset - Could not determine type for feature 'labels'. Converting now to a tensor of default type long.
05/19/2021 23:34:10 - WARNING - farm.data_handler.dataset - Could not determine type for feature 'labels'. Converting now to a tensor of default type long.
[2021-05-19 23:34:40 +0000] [8] [WARNING] Worker graceful timeout (pid:8)
[2021-05-19 23:34:42 +0000] [8] [INFO] Worker exiting (pid: 8)
```
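The `[CRITICAL] WORKER TIMEOUT` / `Worker graceful timeout` lines are gunicorn's, not haystack's: gunicorn kills any worker that does not respond within its timeout (30 seconds by default), and model loading plus CPU-bound inference can easily exceed that. Assuming the pod serves the app via gunicorn, a sketch of raising the limit (`app:app` is a placeholder for the actual WSGI entry point):

```shell
# Sketch (assumption): raise gunicorn's worker timeout so a long
# first inference is not killed mid-request. Default --timeout is 30s.
gunicorn --timeout 300 --workers 1 app:app
```

Raising the timeout only hides the slowness, of course; it buys time to find out why a two-document query takes 30+ minutes in the first place.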
About this issue
- State: closed
- Created 3 years ago
- Comments: 31 (16 by maintainers)
Hey @tholor, loving the FARMReader interface. However, for a single prediction I'm seeing FARMReader run ~6x slower than both TransformersReader and the Hugging Face QA pipeline with num_processes=0 or 1, and ~7.5x slower with num_processes=None. Is there something obvious I'm missing here? Should we expect inference-time parity?
Using the latest farm-haystack and transformers; pytorch==1.12.1. Colab notebook: https://colab.research.google.com/drive/1DmbqWaFw9U4NLzn2dI_u1ypGScKdrGqp?usp=sharing