haystack: Unable to load a locally saved French DPR model (etalab-ia/dpr-question_encoder-fr_qa-camembert)

Describe the bug
I tried to save the French DPR model (retriever.save). That works, but I get an error when I then try to load it (retriever.load).

The exact same code works fine with the facebook/dpr-question_encoder-single-nq-base model.

I want to do this in order to fine-tune the French DPR model.

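# Haystack 0.x import paths (assumed; these modules moved in later Haystack releases)
from haystack.document_store.memory import InMemoryDocumentStore
from haystack.retriever.dense import DensePassageRetriever

retriever = DensePassageRetriever(
    document_store=InMemoryDocumentStore(),
    query_embedding_model="etalab-ia/dpr-question_encoder-fr_qa-camembert",
    passage_embedding_model="etalab-ia/dpr-ctx_encoder-fr_qa-camembert",
    max_seq_len_query=64,
    max_seq_len_passage=256,
    batch_size=50,
    use_gpu=True,
    embed_title=False,
    use_fast_tokenizers=True,
    infer_tokenizer_classes=True,
)

# Saving works...
retriever.save("saved_models/test")
# ...but loading the saved model raises the TypeError below
retriever = DensePassageRetriever.load("saved_models/test", document_store=None)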

ERROR: TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType
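The message itself is the one Python raises when None is passed where a file path is expected, for example to os.stat. A plausible reading, not confirmed in this thread, is that the loader resolved some file path (such as a tokenizer vocabulary file) to None. A minimal reproduction of just the Python-level error, with a hypothetical helper name:

```python
import os


def probe_path(path):
    """Hypothetical helper: call os.stat on `path` and return the TypeError message, if any."""
    try:
        os.stat(path)
    except TypeError as err:
        return str(err)
    return None


# Passing None instead of a real path triggers the same TypeError as in the report
message = probe_path(None)
print(message)
```
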

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 15 (10 by maintainers)

Most upvoted comments

After upgrading FARM to 0.8.0 in Haystack, current Haystack master solves this problem.
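Since the fix depends on FARM 0.8.0, a quick check of the installed version can rule out an outdated dependency before retrying. This is a small sketch assuming FARM is installed under the PyPI distribution name `farm` and Python 3.8+ (for `importlib.metadata`); the helper names are hypothetical, not part of Haystack or FARM.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Return the leading MAJOR.MINOR.PATCH of a version string as a tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"Unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def farm_at_least(minimum="0.8.0"):
    """True if the installed `farm` distribution is at least `minimum`, False otherwise."""
    try:
        installed = version("farm")
    except PackageNotFoundError:
        # FARM is not installed in this environment
        return False
    return parse_version(installed) >= parse_version(minimum)


print(farm_at_least())
```

Comparing integer tuples avoids the string-comparison trap where "0.10.0" would sort before "0.8.0".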

I can confirm that everything is working perfectly.

Thank you very much for all the work you have done.

Hi @AymericSallet, yes 👍 It is working in Haystack if you use the new FARM release 0.8.0 and the Haystack branch dpr_without_bert_tokenizer. If you try it out, please let me know whether it works for you; it is working on my side. Please make sure to set infer_tokenizer_classes=True when you initialize DensePassageRetriever with etalab-ia/dpr-question_encoder-fr_qa-camembert. We will merge the branch into Haystack master as soon as we upgrade the FARM version in Haystack too.

Hi @julian-risch! Thanks! Yes, indeed we have a typo there. I will fix it ASAP.

On the other hand, I believe this is not the whole issue, because we should never reach that line with this particular DPR model and config file. In theory, given that the model config specifies camembert as model_type, we should stop at line 167, or even earlier at line 150, of the same script (tokenization.py).

I confess that I have never tried saving and then loading the model. My guess is that when we save it, the configuration no longer has camembert as model_type, but perhaps dpr or some other type. That difference changes the whole model-loading procedure.
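To illustrate the hypothesis above: a loader that infers the tokenizer class from the model_type field in config.json will take a different branch depending on what the saved config says, even if the weights are identical. The sketch below is hypothetical and is not FARM's actual tokenization.py; the function names and the "dpr" fallback branch are assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path


def infer_tokenizer_class(model_dir):
    """Hypothetical: choose a tokenizer class name from the model_type in config.json."""
    config = json.loads((Path(model_dir) / "config.json").read_text())
    model_type = config.get("model_type")
    if model_type == "camembert":
        return "CamembertTokenizer"
    if model_type == "dpr":
        # A generic "dpr" config does not say which vocabulary the encoder was
        # trained with, so this branch falls back to the BERT-based DPR tokenizer.
        return "DPRQuestionEncoderTokenizer"
    raise ValueError(f"Cannot infer a tokenizer class for model_type={model_type!r}")


def write_config(model_dir, model_type):
    """Write a minimal config.json containing only model_type."""
    (Path(model_dir) / "config.json").write_text(json.dumps({"model_type": model_type}))


with tempfile.TemporaryDirectory() as hub_dir, tempfile.TemporaryDirectory() as saved_dir:
    write_config(hub_dir, "camembert")  # config as published on the hub
    write_config(saved_dir, "dpr")      # config after a hypothetical save that rewrites model_type
    from_hub = infer_tokenizer_class(hub_dir)
    from_saved = infer_tokenizer_class(saved_dir)

print(from_hub, from_saved)
```

If a save step does rewrite model_type, the reloaded model would be paired with a tokenizer whose vocabulary does not match CamemBERT's, which is why the saved config is worth inspecting first.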

Edit: I was thinking of loading the Hugging Face model (etalab-ia/…). Indeed, reloading a locally saved model may not work because of this typo!

Adding @psorianom, who trained this model, to this conversation.