Vitis-AI: ERROR at pytorch quantize calibration step
Hello developers! I used ultralytics' yolov3 demo (PyTorch v1.4 version) to train the model and successfully ran the evaluation program in the latest Vitis-AI GPU docker.

But when I try to quantize the model, it shows the message:
[NNDCT_NOTE]: Quantization calibration process start up...
[NNDCT_NOTE]: =>Quant Module is in 'cuda'.
[NNDCT_NOTE]: =>Parsing Model...
aten_op 'meshgrid' parse failed(unsupported)
Then I tracked down where the issue comes from:

@staticmethod
def _make_grid(nx=20, ny=20):
    yv, xv = torch.meshgrid([torch.arange(ny), torch.arange(nx)])
    return torch.stack((xv, yv), 2).view((1, 1, ny, nx, 2)).float()
I replaced it with individual torch tensor operations:
@staticmethod
def _make_grid(nx=20, ny=20):
    y = torch.arange(ny)
    x = torch.arange(nx)
    yv = []
    xv = []
    # Build the grid row by row instead of calling torch.meshgrid.
    for cnt, item in enumerate(y):
        if cnt == 0:
            yv = torch.full((1, nx), item)  # first row: constant row index
            xv = x.view(1, nx)              # first row: column indices
        else:
            yv = torch.cat((yv, torch.full((1, nx), item)), 0)
            xv = torch.cat((xv, x.view(1, nx)), 0)
    return torch.stack((xv, yv.long()), 2).view((1, 1, ny, nx, 2)).float()
But now it shows a new error message about ImplicitTensorToNum. I can't even find where this ImplicitTensorToNum comes from in my project!
Anyway, I have no idea how to fix this problem. Looking for some help, thanks!
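For what it's worth, ImplicitTensorToNum nodes typically appear in a traced graph when a tensor is implicitly converted to a Python number. In the replacement above, item is a 0-dim tensor, so torch.full((1, nx), item) needs exactly that conversion, which is a plausible (though unconfirmed) source of the error. An untested sketch that avoids both meshgrid and any tensor-to-number conversion:

@staticmethod
def _make_grid(nx=20, ny=20):
    # Build the row/column index planes with view + repeat only:
    # no torch.meshgrid, no Python loop, no tensor-to-number conversion.
    yv = torch.arange(ny).view(ny, 1).repeat(1, nx)  # row index of each cell
    xv = torch.arange(nx).view(1, nx).repeat(ny, 1)  # column index of each cell
    return torch.stack((xv, yv), 2).view((1, 1, ny, nx, 2)).float()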
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 43 (1 by maintainers)
Commits related to this issue
- pool-fix op bug fix (#449) Co-authored-by: qianglin-xlnx <linqiang@xilinx.com> — committed to Xilinx/Vitis-AI by deleted user 3 years ago
> Hey @HSqure, thanks for your reply. What did you mean by creating an extra directory level? I looked at the source code and couldn't trace it.

It means the test-image dataset path passed into the program only needs to point to the directory itself.
named_buffers is extra information generated after the model is trained. You can train with ultralytics' yolov3 and then use the code in my repo to load the weights and quantize. Also, when exporting, remember to export only the weights and set the options for compatibility with older PyTorch versions; the details are already updated in the README.md of my repo. Every quantizer step (calib -> test) needs the full eval flow to run before and after it, from feeding data through the dataset loader to producing the mAP result, and when you call the quantizer for finetuning you also have to pass the eval function (its body and pointer) in for the quantizer to use. It's best to make sure every eval runs cleanly; in your case it looks like the eval is what went wrong. Also note that you must use with torch.no_grad(): to disable backpropagation. Inside the docker you can edit the library source with vim to debug (it resets to defaults after a restart); the source is quite shallow and easy to read.

Hi @kct890721, if it's related to this issue, just ask below and I'll see what I can do.
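To make that flow concrete, here is a minimal sketch of the calib -> test loop described above, based on the pytorch_nndct API; build_model, evaluate, and val_loader are placeholders for your own project code:

import torch
from pytorch_nndct.apis import torch_quantizer

device = torch.device('cuda')
model = build_model().to(device).eval()          # your trained float model
dummy_input = torch.randn(1, 3, 416, 416).to(device)

for quant_mode in ('calib', 'test'):
    quantizer = torch_quantizer(quant_mode, model, (dummy_input,), device=device)
    quant_model = quantizer.quant_model

    # Run the complete eval pipeline (dataset loader -> mAP) with backprop disabled.
    with torch.no_grad():
        evaluate(quant_model, val_loader)

    if quant_mode == 'calib':
        quantizer.export_quant_config()          # write the quantization config
    else:
        quantizer.export_xmodel()                # dump the deployable xmodel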
@Fabioni Hey, I just updated to version 1.4 and found that they fixed everything; I didn't even have to change anything.
Let me know about your findings 😃
I just found this tutorial by Xilinx: https://github.com/Xilinx/Vitis-Tutorials/tree/master/Machine_Learning/Design_Tutorials/07-yolov4-tutorial
I think the same make_grid problem should be present in yolov4 (I actually hit it with yolov4-scaled), so it should somehow be covered in that tutorial, but I couldn't find anything about it 😬
Hey @HSqure, I have the same problem. I can't give you the solution, but I can explain the problem 😄 The docs say the model has to pass the torch.jit.trace test, and that test fails in the make_grid function: the function is not traceable.
I am also interested in how Xilinx did it in the provided yolo models.
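A quick way to poke at this yourself, as a sketch (whether plain torch.jit.trace actually fails may depend on your version; at minimum it shows which aten ops the quantizer's parser has to handle):

import torch

class GridOnly(torch.nn.Module):
    # Just the grid construction from _make_grid, wrapped so it can be traced.
    def forward(self, x):
        ny, nx = 20, 20
        yv, xv = torch.meshgrid([torch.arange(ny), torch.arange(nx)])
        grid = torch.stack((xv, yv), 2).view((1, 1, ny, nx, 2)).float()
        return x + grid

traced = torch.jit.trace(GridOnly(), torch.zeros(1, 1, 20, 20, 2))
print(traced.graph)  # look for aten::meshgrid, the op the parser rejected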
My idea is the following: if you look at where the make_grid function is used, it is only used in the three layers simply called "yolo" (have a look at https://netron.app/?url=https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/yolov3.cfg).
So I think this is really only post-processing and not part of the network proper, and we could just remove it from the quantization and put it back afterwards.
But I haven't been able to try that idea yet.
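In case someone wants to try it, an untested sketch of that split: quantize only the convolutional network and keep the grid/decode step as ordinary float post-processing. Here backbone (the model without the YOLO decode layers) and decode_outputs (re-applying _make_grid and the box decoding) are placeholders for project code:

import torch
from pytorch_nndct.apis import torch_quantizer

dummy = torch.randn(1, 3, 416, 416)
quantizer = torch_quantizer('calib', backbone, (dummy,))
quant_backbone = quantizer.quant_model

def detect(img):
    with torch.no_grad():
        raw = quant_backbone(img)   # the traceable, quantizer-friendly part
    return decode_outputs(raw)      # make_grid etc. as float post-processing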