label-studio-ml-backend: get_result_from_job_id AssertionError while initializing redeployed LS/ML NER backend

I am trying to deploy the NER example model trained on my local machine, along with the Label Studio project, to another machine. I’ve gone through the following steps:

  1. Recreated the Label Studio and ML Backend environments on the target machine to match the source machine
  2. Copied the folder containing the model itself (a folder named with just integers) to the target machine’s ML Backend folder.
  3. Extracted the content of the project (data, annotations and predictions) through the Label Studio API in JSON format (using the ...export?exportType=JSON&download_all_tasks=true endpoint; see the sketch after this list)
  4. Imported the project JSON file into the newly created Label Studio project.

When trying to initialize and pair LS and the ML Backend on the new machine, I am getting the following error, which keeps repeating for each job:

[2022-05-30 10:18:56,133] [ERROR] [label_studio_ml.model::get_result_from_last_job::128] 1647350146 job returns exception:
Traceback (most recent call last):
  File "/Users/user/Projects/label-studio-ml-backend/label_studio_ml/model.py", line 126, in get_result_from_last_job
    result = self.get_result_from_job_id(job_id)
  File "/Users/user/Projects/label-studio-ml-backend/label_studio_ml/model.py", line 108, in get_result_from_job_id
    assert isinstance(result, dict)
AssertionError
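
For reference, the export in step 3 goes through the Label Studio export API; a minimal sketch of such a call with the requests library (the host, project ID and API token below are placeholders, not values from the actual setup):

    import requests

    LS_HOST = 'http://localhost:8080'   # placeholder
    PROJECT_ID = 1                      # placeholder
    API_TOKEN = '<your-api-token>'      # placeholder

    # Download data, annotations and predictions of the project as JSON
    resp = requests.get(
        f'{LS_HOST}/api/projects/{PROJECT_ID}/export',
        params={'exportType': 'JSON', 'download_all_tasks': 'true'},
        headers={'Authorization': f'Token {API_TOKEN}'},
    )
    resp.raise_for_status()

    with open('project_export.json', 'wb') as f:
        f.write(resp.content)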

Should any additional steps be performed when deploying the project/model to other environments?

I’ve tried the following LS versions (1.1.1, my initial one, and 1.4.1post1, the most recent one) with the most current ML backend code base, using Python 3.8 and macOS for both the source and target environments.

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 23 (10 by maintainers)

Most upvoted comments

Hi @TrueWodzu

So if I have training turned off, then there should be no exception about result_file?

Yes, but the intention behind this error is to surface a message in case anybody tries to load a model that wasn’t trained successfully. I will add this flag so that anybody can ignore such errors in the future.

Hello, I have the same problem:

I’ve created a custom backend based on the example in mmdetection.py. I don’t use active learning (I think; I did not set that up). Every time I switch to the next image for annotation, I get this output in the console:

[2023-03-17 09:30:05,107] [ERROR] [label_studio_ml.model::get_result_from_last_job::132] 1679041803 job returns exception: Job 1679041803 was finished unsuccessfully. No result was saved in job folder.Please clean up failed job folders to remove this error from log.
Traceback (most recent call last):
  File "d:\coding\developement\python\label-studio-ml-backend\label_studio_ml\model.py", line 130, in get_result_from_last_job
    result = self.get_result_from_job_id(job_id)
  File "d:\coding\developement\python\label-studio-ml-backend\label_studio_ml\model.py", line 111, in get_result_from_job_id
    assert isinstance(result, dict), f"Job {job_id} was finished unsuccessfully. No result was saved in job folder." \
AssertionError: Job 1679041803 was finished unsuccessfully. No result was saved in job folder.Please clean up failed job folders to remove this error from log.

The exception is raised because _get_result_from_job_id returns None, since os.path.exists(result_file) returns False:

    def _get_result_from_job_id(self, job_id):
        """
        Return job result or {}
        @param job_id: Job id (also known as model version)
        @return: dict
        """
        job_dir = self._job_dir(job_id)
        if not os.path.exists(job_dir):
            logger.warning(f"=> Warning: {job_id} dir doesn't exist. "
                           f"It seems that you don't have specified model dir.")
            return None
        result_file = os.path.join(job_dir, self.JOB_RESULT)
        if not os.path.exists(result_file):  # <--- os.path.exists(result_file) returns False here
            logger.warning(f"=> Warning: {job_id} dir doesn't contain result file. "
                           f"It seems that previous training session ended with error.")
            return None
        logger.debug(f'Read result from {result_file}')
        with open(result_file) as f:
            result = json.load(f)
        return result

What is strange to me is that I do have a job_result.json file in the required directory, but probably it is not there yet when the check occurs? It must be created later. The contents of the file is an empty JSON object.
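
For completeness, the error message itself asks to clean up failed job folders; a rough sketch of what such a cleanup could look like (the model directory path is a placeholder, verify it before deleting anything):

    import json
    import os
    import shutil

    MODEL_DIR = '/path/to/ml-backend/model-dir'  # placeholder: the backend's job/model directory

    for name in sorted(os.listdir(MODEL_DIR)):
        job_dir = os.path.join(MODEL_DIR, name)
        if not os.path.isdir(job_dir):
            continue
        result_file = os.path.join(job_dir, 'job_result.json')
        try:
            with open(result_file) as f:
                result = json.load(f)
            failed = not isinstance(result, dict)
        except (FileNotFoundError, json.JSONDecodeError):
            # A job folder without a readable result file is a leftover
            # from a training run that did not finish successfully.
            failed = True
        if failed:
            print(f'Removing failed job folder: {job_dir}')
            shutil.rmtree(job_dir)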

and here is my predict() method:

    def predict(self, tasks, **kwargs):
        assert len(tasks) == 1
        task = tasks[0]

        image_url = self._get_image_url(task)
        image_path = self.get_local_path(image_url)
        image = cv2.imread(image_path)
        output = Inference.infer_from_image(self.model, image)
        indices, boxes, confidences = Inference.filter_outputs(self.config, image, output)

        results = []
        all_scores = []
        for i in indices:
            score = confidences[i]
            # print(f'{confidences[i]:.2f}')
            x, y, width, height = self.convert_to_ls(boxes[i][0], boxes[i][1], boxes[i][2], boxes[i][3], image.shape[1], image.shape[0])
            results.append({
                'from_name': self.from_name,
                'to_name': self.to_name,
                'type': 'rectanglelabels',
                'value': {
                    'rectanglelabels': ['head'],
                    'x': x,
                    'y': y,
                    'width': width,
                    'height': height
                },
                'score': float(score)
            })
            all_scores.append(score)
        avg_score = sum(all_scores) / max(len(all_scores), 1)
        return [{
            'result': results,
            'score': float(avg_score)
        }]   

But I don’t think this is due to predict(); as I’ve said earlier, the check if not os.path.exists(result_file): is failing for some reason.
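
For context, convert_to_ls in the snippet above is assumed to map pixel coordinates to Label Studio’s percentage-based rectangle coordinates; a rough sketch of such a helper (not the actual implementation from this backend):

    @staticmethod
    def convert_to_ls(x, y, width, height, image_width, image_height):
        # Label Studio stores rectanglelabels values as percentages of the
        # image size, so convert from pixel space to 0-100 ranges.
        return (
            x / image_width * 100,
            y / image_height * 100,
            width / image_width * 100,
            height / image_height * 100,
        )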

I ran into the exact same problem with my custom backend.

I am in the process of upgrading my system to the latest LS and backend. Everything was working fine with LS 1.1.1 and the backend from a year ago.

After training, another job is sent for some reason, and then train_output is cleared, causing the backend to lose its knowledge of the last trained model.

I already set LABEL_STUDIO_ML_BACKEND_V2_DEFAULT = True

 'train_output': {'model_path': '././my_backend/5.1655881660/1658156596'},
 'value': 'image'}
[2022-07-18 17:03:37,760] [INFO] [werkzeug::_log::225] 192.168.123.133 - - [18/Jul/2022 17:03:37] "POST /train HTTP/1.1" 201 -
[2022-07-18 17:03:37,781] [INFO] [werkzeug::_log::225] 192.168.123.133 - - [18/Jul/2022 17:03:37] "GET /health HTTP/1.1" 200 -
[2022-07-18 17:03:37,787] [ERROR] [label_studio_ml.model::get_result_from_last_job::130] 1658156583 job returns exception: 
Traceback (most recent call last):
  File "/home/USER/.virtualenvs/ls-1.5/lib/python3.8/site-packages/label_studio_ml/model.py", line 128, in get_result_from_last_job
    result = self.get_result_from_job_id(job_id)
  File "/home/USER/.virtualenvs/ls-1.5/lib/python3.8/site-packages/label_studio_ml/model.py", line 110, in get_result_from_job_id
    assert isinstance(result, dict)
AssertionError
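
For context, in the label-studio-ml-backend versions discussed in this thread, the dict returned by fit() is what gets persisted as job_result.json and handed back to the next model instance as self.train_output; if that result is cleared or never written, the backend has nothing to load. A rough sketch of that contract (training logic and paths are placeholders):

    from label_studio_ml.model import LabelStudioMLBase

    class MyBackend(LabelStudioMLBase):

        def __init__(self, **kwargs):
            super().__init__(**kwargs)
            # self.train_output holds the dict returned by the last successful
            # fit() (loaded from job_result.json). If the job folder has no
            # result, this is empty and the backend falls back to defaults.
            self.model_path = (self.train_output or {}).get('model_path')

        def predict(self, tasks, **kwargs):
            # A real backend would load and use the weights at self.model_path here.
            return [{'result': [], 'score': 0.0} for _ in tasks]

        def fit(self, completions, workdir=None, **kwargs):
            # Train and save weights under workdir, then describe the result.
            # Whatever is returned here becomes job_result.json and, on the
            # next run, self.train_output.
            return {'model_path': workdir}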