FARM: Cannot load model from local dir

Describe the bug

I want to do this with Haystack:

### Inference ############

# Load model
reader = FARMReader(model_name_or_path="../../saved_models/twmkn9/albert-base-v2-squad2", use_gpu=False)

I fine-tuned the model beforehand and saved it to a local directory. Here is the code:

### TRAINING #############
# Let's take a reader as a base model
reader = FARMReader(model_name_or_path="twmkn9/albert-base-v2-squad2", max_seq_len=512, use_gpu=False)

# and fine-tune it on your own custom dataset (should be in SQuAD like format)
train_data = "training_data"
reader.train(data_dir=train_data, train_filename="2020-02-23_answers.json", test_file_name='TEST_answers.json', use_gpu=False, n_epochs=1, dev_split=0.1)
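As an aside, the training file passed to `reader.train()` has to be in SQuAD-style JSON. A minimal sketch of one record (field names follow the SQuAD v2 schema; the title, context, and question are placeholder values, not from the actual dataset):

```python
import json

# Minimal SQuAD v2-style training record (placeholder content).
# The top-level "data" list holds documents; each document has
# "paragraphs" with a "context" and its question/answer pairs ("qas").
train_record = {
    "version": "v2.0",
    "data": [
        {
            "title": "Example episode",
            "paragraphs": [
                {
                    "context": "Tim interviewed a guest about morning routines.",
                    "qas": [
                        {
                            "id": "q1",
                            "question": "What was the interview about?",
                            "is_impossible": False,
                            "answers": [
                                # answer_start is the character offset of the
                                # answer span inside "context"
                                {"text": "morning routines", "answer_start": 30}
                            ],
                        }
                    ],
                }
            ],
        }
    ],
}

# Written out like a hand-labelled file such as 2020-02-23_answers.json
with open("example_answers.json", "w") as f:
    json.dump(train_record, f, indent=2)
```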

Error message

03/28/2020 22:25:07 - INFO - farm.utils -   device: cpu n_gpu: 0, distributed training: False, automatic mixed precision training: None
03/28/2020 22:25:07 - INFO - farm.modeling.adaptive_model -   Found files for loading 1 prediction heads
03/28/2020 22:25:07 - WARNING - farm.modeling.prediction_head -   Some unused parameters are passed to the QuestionAnsweringHead. Might not be a problem. Params: {"training": true, "num_labels": 2, "ph_output_type": "per_token_squad", "model_type": "span_classification", "name": "QuestionAnsweringHead"}
03/28/2020 22:25:07 - INFO - farm.modeling.prediction_head -   Prediction head initialized with size [768, 2]
03/28/2020 22:25:07 - INFO - farm.modeling.prediction_head -   Loading prediction head from ../../saved_models/twmkn9/albert-base-v2-squad2/prediction_head_0.bin
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~/Documents/CodingProjects/NLPofTimFerrissShow/QnA_with_Tim_Haystack.py in 
      51 
      52 # Load model
----> 53 reader = FARMReader(model_name_or_path="../../saved_models/twmkn9/albert-base-v2-squad2", use_gpu=False)
      54 # A retriever identifies the k most promising chunks of text that might contain the answer for our question
      55 # Retrievers use some simple but fast algorithm, here: TF-IDF

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/haystack/reader/farm.py in __init__(self, model_name_or_path, context_window_size, batch_size, use_gpu, no_ans_boost, top_k_per_candidate, top_k_per_sample, max_processes, max_seq_len, doc_stride)
     79         self.inferencer = Inferencer.load(model_name_or_path, batch_size=batch_size, gpu=use_gpu,
     80                                           task_type="question_answering", max_seq_len=max_seq_len,
---> 81                                           doc_stride=doc_stride)
     82         self.inferencer.model.prediction_heads[0].context_window_size = context_window_size
     83         self.inferencer.model.prediction_heads[0].no_ans_boost = no_ans_boost

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/farm/infer.py in load(cls, model_name_or_path, batch_size, gpu, task_type, return_class_probs, strict, max_seq_len, doc_stride)
    139                 processor = InferenceProcessor.load_from_dir(model_name_or_path)
    140             else:
--> 141                 processor = Processor.load_from_dir(model_name_or_path)
    142 
    143         # b) or from remote transformers model hub

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/farm/data_handler/processor.py in load_from_dir(cls, load_dir)
    189         del config["tokenizer"]
    190 
--> 191         processor = cls.load(tokenizer=tokenizer, processor_name=config["processor"], **config)
    192 
    193         for task_name, task in config["tasks"].items():

TypeError: load() missing 1 required positional argument: 'data_dir'
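The TypeError itself is plain Python: `Processor.load_from_dir` re-invokes `cls.load(**config)` with the keyword arguments found in the saved processor config, and if `data_dir` is not among them the required parameter is simply unfilled. A stripped-down standalone reproduction (class and method bodies are illustrative, not FARM's actual code):

```python
# Stand-in for FARM's Processor.load / load_from_dir interplay
# (illustrative only, not the real implementation).
class Processor:
    @classmethod
    def load(cls, tokenizer, processor_name, data_dir, **kwargs):
        # data_dir is a required argument with no default, as in FARM
        return cls()

    @classmethod
    def load_from_dir(cls, config):
        # If the saved processor config lacks a "data_dir" entry,
        # unpacking it leaves load()'s data_dir parameter missing.
        return cls.load(tokenizer=object(), processor_name="SquadProcessor", **config)

try:
    Processor.load_from_dir({"max_seq_len": 256})
except TypeError as e:
    print(e)  # mentions the missing required argument 'data_dir'
```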

Expected behavior

There is no error.

Additional context

I use Haystack.

To Reproduce

Steps to reproduce the behavior

System:

  • OS: Mac OS 10.14.6 (mojave)
  • GPU/CPU: CPU Intel core i5
  • FARM version: 0.4.1

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 23 (10 by maintainers)

Most upvoted comments

Hey @ahotrod, right now we only support two options to load a FARMReader: a) a local FARM model, or b) a remote Transformers model.

I guess your error comes up when trying to load a local model in Transformers format? I think it makes total sense to support this, and we will put it on the backlog. However, it might take a few days, as we are currently quite busy with a few other features.

The only way I see right now is to load your model into FARM, save it as a FARM model, and then load it in Haystack as a FARMReader.

You should be able to do all of this with your local Transformers model as follows (please adjust the parameters accordingly):

    # Imports needed for the conversion (FARM 0.4.x module paths)
    from farm.modeling.adaptive_model import AdaptiveModel
    from farm.modeling.tokenization import Tokenizer
    from farm.data_handler.processor import SquadProcessor

    # Convert the Transformers checkpoint into a FARM AdaptiveModel
    model = AdaptiveModel.convert_from_transformers(
        model_name_or_path, device=device, task_type="question_answering"
    )

    tokenizer = Tokenizer.load(
        pretrained_model_name_or_path=model_name_or_path, do_lower_case=do_lower_case
    )
    processor = SquadProcessor(
        tokenizer=tokenizer,
        max_seq_len=256,
        label_list=["start_token", "end_token"],
        metric="squad",
        train_filename=None,
        dev_filename=None,
        dev_split=0,
        test_filename=evaluation_filename,
        data_dir=data_dir,
        doc_stride=128,
    )
    # Connect the QA head to the processor's tasks (the snippet originally
    # referenced an undefined data_silo here)
    model.connect_heads_with_processor(processor.tasks, require_labels=True)

    model.save(save_dir)
    processor.save(save_dir)