Vitis-AI: ERROR at pytorch quantize calibration step
Hello developers! I used ultralytics' yolov3 demo (PyTorch v1.4 version) to train the model and successfully ran the evaluation program in the latest Vitis-AI GPU docker.

But when I try to quantize the model, it shows the message:
[NNDCT_NOTE]: Quantization calibration process start up...
[NNDCT_NOTE]: =>Quant Module is in 'cuda'.
[NNDCT_NOTE]: =>Parsing Model...
aten_op 'meshgrid' parse failed(unsupported)
Then I tracked down where the issue comes from:

@staticmethod
def _make_grid(nx=20, ny=20):
    yv, xv = torch.meshgrid([torch.arange(ny), torch.arange(nx)])
    return torch.stack((xv, yv), 2).view((1, 1, ny, nx, 2)).float()
I replaced it with individual torch tensor operations:
@staticmethod
def _make_grid(nx=20, ny=20):
    y = torch.arange(ny)
    x = torch.arange(nx)
    yv = []
    xv = []
    # Build the grid row by row instead of calling torch.meshgrid.
    for cnt, item in enumerate(y):
        if cnt == 0:
            yv = torch.full((1, nx), item)  # first row: constant row index
            xv = x.view(1, nx)              # first row: column indices
        else:
            yv = torch.cat((yv, torch.full((1, nx), item)), 0)
            xv = torch.cat((xv, x.view(1, nx)), 0)
    return torch.stack((xv, yv.long()), 2).view((1, 1, ny, nx, 2)).float()
But now it shows a new error message about ImplicitTensorToNum. I can't even find where this ImplicitTensorToNum comes from in my project!
Anyway, I have no idea how to fix this problem. Looking for some help, thanks!
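For what it's worth, ImplicitTensorToNum nodes typically appear in a traced graph when a tensor is implicitly converted to a Python number. In the replacement above, item is a 0-dim tensor, so torch.full((1, nx), item) needs exactly that conversion, which is a plausible (though unconfirmed) source of the error. An untested sketch that avoids both meshgrid and any tensor-to-number conversion:

@staticmethod
def _make_grid(nx=20, ny=20):
    # Build the row/column index planes with view + repeat only:
    # no torch.meshgrid, no Python loop, no tensor-to-number conversion.
    yv = torch.arange(ny).view(ny, 1).repeat(1, nx)  # row index of each cell
    xv = torch.arange(nx).view(1, nx).repeat(ny, 1)  # column index of each cell
    return torch.stack((xv, yv), 2).view((1, 1, ny, nx, 2)).float()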
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 43 (1 by maintainers)
Commits related to this issue
- pool-fix op bug fix (#449) Co-authored-by: qianglin-xlnx <linqiang@xilinx.com> — committed to Xilinx/Vitis-AI by deleted user 3 years ago
> Hey @HSqure, thanks for your reply. What did you mean by creating an extra directory level? I looked at the source code and couldn't trace it.

It means the test-image dataset path passed into the program only needs to point to the directory itself.
named_buffers is extra information generated after the model is trained. You can train with ultralytics' yolov3 and then use the code in my repo to load the weights and quantize. Also, when exporting, remember to export only the weights and set the options for compatibility with older PyTorch versions; the details are already updated in the README.md of my repo. Every quantizer step (calib -> test) needs the full eval flow to run before and after it, from feeding data through the dataset loader to producing the mAP result, and when you call the quantizer for finetuning you also have to pass the eval function (its body and pointer) in for the quantizer to use. It's best to make sure every eval runs cleanly; in your case it looks like the eval is what went wrong. Also note that you must use with torch.no_grad(): to disable backpropagation. Inside the docker you can edit the library source with vim to debug (it resets to defaults after a restart); the source is quite shallow and easy to read.

Hi @kct890721, if it's related to this issue, just ask below and I'll see what I can do.
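To make that flow concrete, here is a minimal sketch of the calib -> test loop described above, based on the pytorch_nndct API; build_model, evaluate, and val_loader are placeholders for your own project code:

import torch
from pytorch_nndct.apis import torch_quantizer

device = torch.device('cuda')
model = build_model().to(device).eval()          # your trained float model
dummy_input = torch.randn(1, 3, 416, 416).to(device)

for quant_mode in ('calib', 'test'):
    quantizer = torch_quantizer(quant_mode, model, (dummy_input,), device=device)
    quant_model = quantizer.quant_model

    # Run the complete eval pipeline (dataset loader -> mAP) with backprop disabled.
    with torch.no_grad():
        evaluate(quant_model, val_loader)

    if quant_mode == 'calib':
        quantizer.export_quant_config()          # write the quantization config
    else:
        quantizer.export_xmodel()                # dump the deployable xmodel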
@Fabioni Hey, I just updated to version 1.4 and found that they fixed everything; I didn't even have to change anything.
Let me know about your findings 😃
I just found this tutorial by Xilinx: https://github.com/Xilinx/Vitis-Tutorials/tree/master/Machine_Learning/Design_Tutorials/07-yolov4-tutorial
I think the same make_grid problem should be present in yolov4 (I actually hit it with yolov4-scaled), so it should somehow be covered in that tutorial, but I couldn't find anything about it 😬
Hey @HSqure, I have the same problem. I can't give you the solution, but I can explain the problem 😄 The docs say the model has to pass the torch.jit.trace test, and that test fails in the make_grid function: the function is not traceable.
I am also interested in how Xilinx did it in the provided yolo models.
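A quick way to poke at this yourself, as a sketch (whether plain torch.jit.trace actually fails may depend on your version; at minimum it shows which aten ops the quantizer's parser has to handle):

import torch

class GridOnly(torch.nn.Module):
    # Just the grid construction from _make_grid, wrapped so it can be traced.
    def forward(self, x):
        ny, nx = 20, 20
        yv, xv = torch.meshgrid([torch.arange(ny), torch.arange(nx)])
        grid = torch.stack((xv, yv), 2).view((1, 1, ny, nx, 2)).float()
        return x + grid

traced = torch.jit.trace(GridOnly(), torch.zeros(1, 1, 20, 20, 2))
print(traced.graph)  # look for aten::meshgrid, the op the parser rejected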
My idea is the following: if you look at where the make_grid function is used, it is only used in the three layers simply called "yolo" (have a look at https://netron.app/?url=https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/yolov3.cfg).
So I think this is really only post-processing and not part of the network proper, and we could just remove it from the quantization and put it back afterwards.
But I haven't been able to try that idea yet.
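In case someone wants to try it, an untested sketch of that split: quantize only the convolutional network and keep the grid/decode step as ordinary float post-processing. Here backbone (the model without the YOLO decode layers) and decode_outputs (re-applying _make_grid and the box decoding) are placeholders for project code:

import torch
from pytorch_nndct.apis import torch_quantizer

dummy = torch.randn(1, 3, 416, 416)
quantizer = torch_quantizer('calib', backbone, (dummy,))
quant_backbone = quantizer.quant_model

def detect(img):
    with torch.no_grad():
        raw = quant_backbone(img)   # the traceable, quantizer-friendly part
    return decode_outputs(raw)      # make_grid etc. as float post-processing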