cvat: Could not get models from the server
Actions before raising this issue
- I searched the existing issues and did not find anything similar.
- I read/searched the docs
Steps to Reproduce
Hello, I'm trying to deploy YOLOv5 with Nuclio to use it with CVAT for automatic annotation. I followed the same approach as the YOLOv7 ONNX function that already exists in the CVAT documentation. I managed to deploy the YOLOv5 ONNX model successfully with the original weight file (yolov5l.onnx). However, when I deploy it with my custom weight file, CVAT/Nuclio cannot find the weight file:
Could not get models from the server
I have checked inside the container and my weight file is there, and I'm sure there is nothing wrong with my ONNX file, since I tested it outside of Docker and it works fine. I have also checked the Docker logs, and there is no error.
This is my function.yaml
metadata:
  name: pth-ultralytics-yolov5-anfp
  namespace: cvat
  annotations:
    name: YOLO v5
    type: detector
    framework: onnx
    spec: |
      [
        { "id": 0, "name": "Adidas", "type": "rectangle" },
      ]

spec:
  description: YOLO v5
  runtime: 'python:3.8'
  handler: main:handler
  eventTimeout: 30s

  build:
    image: cvat.onnx.ultralytics.yolov5-anfp
    baseImage: ubuntu:20.04

    directives:
      preCopy:
        - kind: USER
          value: root
        - kind: RUN
          value: apt update && apt install --no-install-recommends -y wget python3-pip
        - kind: RUN
          value: apt update && apt install --no-install-recommends -y libglib2.0-0
        - kind: RUN
          value: apt-get update && apt-get install ffmpeg libsm6 libxext6 -y
        - kind: RUN
          value: pip install onnxruntime opencv-python-headless pillow pyyaml torch torchvision numpy onnx onnx-simplifier tqdm scipy gitpython matplotlib
        - kind: WORKDIR
          value: /opt/nuclio
        - kind: RUN
          value: wget xx.onnx
        - kind: RUN
          value: ln -s /usr/bin/python3 /usr/bin/python

  triggers:
    myHttpTrigger:
      maxWorkers: 2
      kind: 'http'
      workerAvailabilityTimeoutMilliseconds: 10000
      attributes:
        maxRequestBodySize: 33554432 # 32MB

  platform:
    attributes:
      restartPolicy:
        name: always
        maximumRetryCount: 3
      mountMode: volume
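A note on the weight path (just a sketch of an alternative, the "yolov5l.onnx" name is a placeholder): since model_handler.py below loads the weights by a bare file name, the lookup depends on the worker's current working directory. Building an absolute path next to the handler avoids that dependency:

import os

# Resolve the weight file that wget placed in /opt/nuclio (the WORKDIR above)
# relative to this module, regardless of the process working directory.
WEIGHTS_PATH = os.path.join(os.path.dirname(os.path.abspath(__file__)), "yolov5l.onnx")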
Here is my main file:
import base64
import io
import json

import yaml
from model_handler import ModelHandler
from PIL import Image


def init_context(context):
    context.logger.info("Init context... 0%")

    # Read the DL model
    model = ModelHandler()
    context.user_data.model = model

    context.logger.info("Init context...100%")


def handler(context, event):
    context.logger.info("Run YoloV5 ONNX model")
    data = event.body
    buf = io.BytesIO(base64.b64decode(data["image"]))
    image = Image.open(buf)

    results = context.user_data.model.infer(image, 0.5)

    return context.Response(body=json.dumps(results), headers={},
                            content_type='application/json', status_code=200)
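To rule out the function code itself, here is a rough local test (a sketch; it assumes the weight file and a test.jpg image are in the current directory) that feeds the handler logic the same base64 payload CVAT sends:

import base64
import io

from PIL import Image

from model_handler import ModelHandler

# Simulate the body CVAT posts to the function: a base64-encoded image.
with open("test.jpg", "rb") as f:  # any local test image
    payload = {"image": base64.b64encode(f.read()).decode()}

image = Image.open(io.BytesIO(base64.b64decode(payload["image"])))
print(ModelHandler().infer(image, 0.5))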
Here is part of my model handler:
import cv2
import numpy as np
import onnxruntime as ort
import torch
import torchvision


class ModelHandler:
    def __init__(self):
        self.is_initiated = None
        self.names = None
        self.stride = None
        self.meta = None
        self.output_names = None
        self.session = None
        self.load_network(model="xxyolov5l.onnx")

    def load_network(self, model):
        device = ort.get_device()
        cuda = True if device == 'GPU' else False
        try:
            providers = ['CUDAExecutionProvider', 'CPUExecutionProvider'] if cuda else ['CPUExecutionProvider']
            so = ort.SessionOptions()
            so.log_severity_level = 3
            self.session = ort.InferenceSession(model, providers=providers, sess_options=so)
            self.output_names = [x.name for x in self.session.get_outputs()]
            self.meta = self.session.get_modelmeta().custom_metadata_map
            # self.input_details = [i.name for i in self.session.get_inputs()]
            if "stride" in self.meta:
                self.stride, self.names = int(self.meta["stride"]), eval(self.meta["names"])
            self.is_initiated = True
        except Exception as e:
            raise Exception(f"HIYOOOO Cannot load model {model}: {e}")
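    # Note: load_network() relies on the "stride" and "names" entries in the ONNX
    # custom metadata (self.meta). Recent yolov5 exporters normally embed them, but
    # if a custom export lacks them, self.names stays None and _infer() later fails
    # at self.names[c]. They can be checked with, for example:
    #   print(onnx.load("<weights>.onnx").metadata_props)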
    def letterbox(self, im, new_shape=(640, 640), color=(114, 114, 114), auto=True, scaleup=True, stride=32):
        shape = im.shape[:2]  # current shape [height, width]
        if isinstance(new_shape, int):
            new_shape = (new_shape, new_shape)

        r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
        if not scaleup:  # only scale down, do not scale up (for better val mAP)
            r = min(r, 1.0)

        new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
        dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # wh padding
        if auto:  # minimum rectangle
            dw, dh = np.mod(dw, stride), np.mod(dh, stride)  # wh padding

        dw /= 2  # divide padding into 2 sides
        dh /= 2

        if shape[::-1] != new_unpad:  # resize
            im = cv2.resize(im, new_unpad, interpolation=cv2.INTER_LINEAR)
        top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
        left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
        im = cv2.copyMakeBorder(im, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)  # add border
        return im, r, (dw, dh)
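    # Worked example of the letterbox arithmetic with auto=False (as used in _infer):
    # a 1280x720 frame -> 640x640 gives r = min(640/720, 640/1280) = 0.5,
    # new_unpad = (640, 360), dw = 0, dh = 280, so 0 px of padding left/right and
    # 140 px of gray (114, 114, 114) padding top and bottom.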
    def xywh2xyxy(self, x):
        # Convert nx4 boxes from [x, y, w, h] to [x1, y1, x2, y2] where xy1=top-left, xy2=bottom-right
        y = x.clone() if isinstance(x, torch.Tensor) else np.copy(x)
        y[:, 0] = x[:, 0] - x[:, 2] / 2  # top left x
        y[:, 1] = x[:, 1] - x[:, 3] / 2  # top left y
        y[:, 2] = x[:, 0] + x[:, 2] / 2  # bottom right x
        y[:, 3] = x[:, 1] + x[:, 3] / 2  # bottom right y
        return y

    def box_area(self, box):
        # box = xyxy(4,n)
        return (box[2] - box[0]) * (box[3] - box[1])

    def box_iou(self, box1, box2, eps=1e-7):
        # inter(N,M) = (rb(N,M,2) - lt(N,M,2)).clamp(0).prod(2)
        (a1, a2), (b1, b2) = box1[:, None].chunk(2, 2), box2.chunk(2, 1)
        inter = (torch.min(a2, b2) - torch.max(a1, b1)).clamp(0).prod(2)

        # IoU = inter / (area1 + area2 - inter)
        return inter / (self.box_area(box1.T)[:, None] + self.box_area(box2.T) - inter + eps)
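    # Quick numeric check: boxes [0, 0, 10, 10] and [5, 5, 15, 15] overlap in a 5x5
    # region, so IoU = 25 / (100 + 100 - 25) ≈ 0.143; torchvision.ops.box_iou returns
    # the same value and can be used to cross-check this helper.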
    def scale_coords(self, img1_shape, coords, img0_shape, ratio_pad=None):
        # Rescale coords (xyxy) from img1_shape to img0_shape
        if ratio_pad is None:  # calculate from img0_shape
            gain = min(img1_shape[0] / img0_shape[0], img1_shape[1] / img0_shape[1])  # gain = old / new
            pad = (img1_shape[1] - img0_shape[1] * gain) / 2, (img1_shape[0] - img0_shape[0] * gain) / 2  # wh padding
        else:
            gain = ratio_pad[0][0]
            pad = ratio_pad[1]

        coords[:, [0, 2]] -= pad[0]  # x padding
        coords[:, [1, 3]] -= pad[1]  # y padding
        coords[:, :4] /= gain
        self.clip_coords(coords, img0_shape)
        return coords

    def clip_coords(self, boxes, shape):
        # Clip xyxy bounding boxes to image shape (height, width)
        if isinstance(boxes, torch.Tensor):  # faster individually
            boxes[:, 0].clamp_(0, shape[1])  # x1
            boxes[:, 1].clamp_(0, shape[0])  # y1
            boxes[:, 2].clamp_(0, shape[1])  # x2
            boxes[:, 3].clamp_(0, shape[0])  # y2
        else:  # np.array (faster grouped)
            boxes[:, [0, 2]] = boxes[:, [0, 2]].clip(0, shape[1])  # x1, x2
            boxes[:, [1, 3]] = boxes[:, [1, 3]].clip(0, shape[0])  # y1, y2
    def non_max_suppression(self, prediction,
                            conf_thres=0.25,
                            iou_thres=0.45,
                            agnostic=False,
                            max_det=300):
        bs = prediction.shape[0]  # batch size
        xc = prediction[..., 4] > conf_thres  # candidates

        # Settings
        # min_wh = 2  # (pixels) minimum box width and height
        max_wh = 7680  # (pixels) maximum box width and height
        max_nms = 30000  # maximum number of boxes into torchvision.ops.nms()
        redundant = True  # require redundant detections
        merge = False  # use merge-NMS

        output = [torch.zeros((0, 6), device=prediction.device)] * bs
        for xi, x in enumerate(prediction):  # image index, image inference
            # Apply constraints
            # x[((x[..., 2:4] < min_wh) | (x[..., 2:4] > max_wh)).any(1), 4] = 0  # width-height
            x = x[xc[xi]]  # confidence

            # If none remain process next image
            if not x.shape[0]:
                continue

            # Compute conf
            x[:, 5:] *= x[:, 4:5]  # conf = obj_conf * cls_conf

            # Box (center x, center y, width, height) to (x1, y1, x2, y2)
            box = self.xywh2xyxy(x[:, :4])

            # Detections matrix nx6 (xyxy, conf, cls)
            conf, j = x[:, 5:].max(1, keepdim=True)
            x = torch.cat((box, conf, j.float()), 1)[conf.view(-1) > conf_thres]

            # Apply finite constraint
            # if not torch.isfinite(x).all():
            #     x = x[torch.isfinite(x).all(1)]

            # Check shape
            n = x.shape[0]  # number of boxes
            if not n:  # no boxes
                continue
            elif n > max_nms:  # excess boxes
                x = x[x[:, 4].argsort(descending=True)[:max_nms]]  # sort by confidence

            # Batched NMS
            c = x[:, 5:6] * (0 if agnostic else max_wh)  # classes
            boxes, scores = x[:, :4] + c, x[:, 4]  # boxes (offset by class), scores
            i = torchvision.ops.nms(boxes, scores, iou_thres)  # NMS
            if i.shape[0] > max_det:  # limit detections
                i = i[:max_det]
            if merge and (1 < n < 3E3):  # Merge NMS (boxes merged using weighted mean)
                # update boxes as boxes(i,4) = weights(i,n) * boxes(n,4)
                iou = self.box_iou(boxes[i], boxes) > iou_thres  # iou matrix
                weights = iou * scores[None]  # box weights
                x[i, :4] = torch.mm(weights, x[:, :4]).float() / weights.sum(1, keepdim=True)  # merged boxes
                if redundant:
                    i = i[iou.sum(1) > 1]  # require redundancy

            output[xi] = x[i]

        return output
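    # Shape note: for a 640x640 input, the raw YOLOv5 ONNX output fed into
    # non_max_suppression() is (1, 25200, 5 + nc), where 25200 = (80*80 + 40*40 + 20*20) * 3
    # anchors, so with the single "Adidas" class it is (1, 25200, 6). The result is a
    # list with one (n, 6) tensor of [x1, y1, x2, y2, conf, cls] rows per image.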
    def _infer(self, inputs: np.ndarray):
        try:
            img = cv2.cvtColor(inputs, cv2.COLOR_BGR2RGB)
            image = img.copy()
            image, ratio, dwdh = self.letterbox(image, auto=False)
            image = image.transpose((2, 0, 1))
            image = np.expand_dims(image, 0)
            image = np.ascontiguousarray(image)
            im = image.astype(np.float16)
            im /= 255
            # im = im.astype(np.float16)

            # ONNX inference
            output = list()
            bboxs, confs, clss = [], [], []
            detections = self.session.run([self.session.get_outputs()[0].name], {self.session.get_inputs()[0].name: im})[0]
            detections = torch.from_numpy(detections).to(torch.device('cpu'))
            pred = self.non_max_suppression(detections, conf_thres=0.25, iou_thres=0.45, agnostic=False, max_det=1000)
            print('pred: ', pred)
            for i, det in enumerate(pred):
                det[:, :4] = self.scale_coords(im.shape[2:], det[:, :4], img.shape).round()
                for *xyxy, conf, cls in reversed(det):
                    c = int(cls)
                    # labels = self.names[c]
                    x1, y1, x2, y2 = [int(x.cpu().numpy()) for x in xyxy]
                    bboxs.append([x1, y1, x2, y2])
                    clss.append(self.names[c])
                    confs.append(conf.cpu().numpy())

            output = [bboxs, clss, confs]
            return output
        except Exception as e:
            print(e)
    def infer(self, image, threshold=0.5):
        # 'threshold' is accepted to match the handler's infer(image, 0.5) call;
        # _infer() currently applies its own conf_thres internally.
        image = np.array(image)
        image = image[:, :, ::-1].copy()
        h, w, _ = image.shape
        detections = self._infer(image)

        results = []
        if detections:
            boxes = detections[0]
            labels = detections[1]
            scores = detections[2]
            for label, score, box in zip(labels, scores, boxes):
                results.append({
                    "confidence": str(score),
                    "label": label,
                    "points": box,
                    "type": "rectangle",
                })
        return results
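For reference, this is the shape of a single entry in the results list that handler() serializes back to CVAT; the values below are made up, just to illustrate the format CVAT's auto-annotation expects from a detector:

example_result = {
    "confidence": "0.87",            # stringified score, as in infer() above
    "label": "Adidas",               # class name from the spec in function.yaml
    "points": [134, 220, 398, 512],  # x1, y1, x2, y2 in original image coordinates
    "type": "rectangle",
}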
Expected Behavior
I expect the code to work with my custom weight model the same as with the original model, since I only changed the class labels and the weight file name; the rest of the code is the same. I cannot figure out why Nuclio cannot find my weight file, which is in the same location as the original weight file. Any help in understanding why my code works perfectly with the original weight file but not with the custom weight file would be appreciated.
Thanks
Possible Solution
No response
Context
No response
Environment
No response
About this issue
- Original URL
- State: closed
- Created 4 months ago
- Comments: 15 (2 by maintainers)
You said that:
So, if you are 100% sure about this, it only means that the file is missing at the specified path. Maybe it is in another directory or has another name, but it was not found. I do not expect any magic here. Look at the many other models in the serverless directory; the same logic applies to all of them.
Hi @Auth0rM0rgan, does the issue persist?
I got it solved in my case. Maybe something went wrong when deploying a new function through nuctl. To solve it, remove the directory that was created at /etc/nuclio/store/functions/nuclio in the nuclio-local-storage-reader container.
@PrashantDixit0 please read the issue first! I can already deploy YOLOv5 with the ONNX model successfully; the issue is that CVAT cannot find my own custom pretrained model and only works with the original ONNX model.
My Issue Resolved 👍
In my local deployment, I can't see the Models option in the navbar.