Vitis-AI: Error on compiling a yolov4

Hi, I’m trying to compile a yolov4 model trained on a custom dataset, and I’m running into the following error:

**************************************************
* VITIS_AI Compilation - Xilinx Inc.
**************************************************
[INFO] Namespace(inputs_shape=None, layout='NCHW', model_files=['yolov4_quantized/deploy.caffemodel'], model_type='caffe', out_filename='./compiled/yolov4_org.xmodel', proto='yolov4_quantized/deploy.prototxt')
[INFO] caffe model: yolov4_quantized/deploy.caffemodel
[INFO] caffe model: yolov4_quantized/deploy.prototxt
[INFO] parse raw model     :100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 382/382 [00:16<00:00, 22.53it/s]                
[INFO] infer shape (NCHW)  :100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 382/382 [00:00<00:00, 9200.99it/s]              
[INFO] infer shape (NHWC)  :100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 382/382 [00:00<00:00, 5438.73it/s]              
[INFO] generate xmodel     :100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 382/382 [00:00<00:00, 845.43it/s]               
[INFO] generate xmodel: /workspace/yolov4_atlas/compiled/yolov4_org.xmodel
[UNILOG][INFO] The compiler log will be dumped at "/tmp/vitis-ai-user/log/xcompiler-20210323-071025-7300"
[UNILOG][INFO] Compile mode: dpu
[UNILOG][INFO] Debug mode: function
[UNILOG][INFO] Target architecture: DPUCZDX8G_CUSTOMIZED
[UNILOG][INFO] Graph name: deploy, with op num: 822
[UNILOG][INFO] Begin to compile...
module_infer = 0, counter_m[module_infer] = 10, counter_p[module_infer] = 10
module_idx = 0, counter_m[module_idx] = 10, counter_p[module_idx] = 10
module_idx = 1, counter_m[module_idx] = 0, counter_p[module_idx] = 0
module_idx = 2, counter_m[module_idx] = 8, counter_p[module_idx] = 8
module_idx = 3, counter_m[module_idx] = 0, counter_p[module_idx] = 0
[UNILOG][FATAL][XCOM_PM_FAIL][The compiler occurs an error when generating instructions, please contact us.] 
*** Check failure stack trace: ***
Aborted (core dumped)

To give more detail about what I’ve done: I roughly followed this tutorial (part 3), replacing the VOC dataset with my own, and then trained a model using Darknet.

I then converted the model to Caffe using the following command:

python /opt/vitis_ai/conda/envs/vitis-ai-caffe/bin/convert.py yolov4.cfg yolov4_last.weights yolov4.prototxt yolov4.caffemodel

I changed the prototxt as specified in the tutorial and ran quantization as follows:

vai_q_caffe quantize -model yolov4.prototxt -calib_iter 100 -weights yolov4.caffemodel -sigmoided_layers layer135-conv,layer146-conv,layer157-conv -output_dir yolov4_quantized/ -keep_fixed_neuron

For compilation, I had to remove a duplicated bottom field in one of the layers of deploy.prototxt because of this error:

AssertionError: [ERROR] Invalid prototxt file: duplicate names found in the bottom field of layer (name: layer110-concat) in prototxt file: ['layer105-conv', 'layer107-conv', 'layer105-conv', 'layer104-conv']
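
The duplicate check that the compiler reports above can be approximated before compilation. The following is a minimal sketch, not the compiler's own logic: `find_duplicate_bottoms` is a hypothetical helper that scans the prototxt text with regular expressions and assumes each layer starts with `layer {` on its own line and that the first `name:` field in a layer is the layer name. It only reports duplicates; it does not edit the file.

```python
import re
from collections import Counter


def find_duplicate_bottoms(prototxt_text):
    """Return {layer_name: [bottom names listed more than once]}.

    Hypothetical helper: a plain-text scan, not a real protobuf parser.
    """
    duplicates = {}
    # Split on "layer {" openers; each chunk then holds one layer's fields.
    chunks = re.split(r"^\s*layer\s*\{", prototxt_text, flags=re.MULTILINE)[1:]
    for chunk in chunks:
        # Assumption: the first name field in the chunk is the layer name.
        name = re.search(r'\bname\s*:\s*"([^"]+)"', chunk)
        bottoms = re.findall(r'\bbottom\s*:\s*"([^"]+)"', chunk)
        repeated = [b for b, count in Counter(bottoms).items() if count > 1]
        if name and repeated:
            duplicates[name.group(1)] = repeated
    return duplicates


# Layer rebuilt from the bottom names quoted in the error message above.
sample = """
layer {
  name: "layer110-concat"
  type: "Concat"
  bottom: "layer105-conv"
  bottom: "layer107-conv"
  bottom: "layer105-conv"
  bottom: "layer104-conv"
  top: "layer110-concat"
}
"""
print(find_duplicate_bottoms(sample))
```

To check a real file, read it first, for example `find_duplicate_bottoms(open("yolov4_quantized/deploy.prototxt").read())`.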

Finally, I ran compilation:

vai_c_caffe --prototxt yolov4_quantized/deploy.prototxt --caffemodel yolov4_quantized/deploy.caffemodel --arch /opt/vitis_ai/compiler/arch/DPUCZDX8G/ULTRA96/arch.json --output_dir ./compiled --net_name yolov4 --options "{'save_kernel':''}"

and got the error above.

FYI, I also tried compiling for a ZCU102 and got the same error.
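
For anyone repeating the comparison across boards, the compile command can be generated per target. This is a rough sketch: `build_compile_command` is a hypothetical helper, the flags are copied from the command above, and the ZCU102 arch.json path is an assumption that it sits next to the ULTRA96 one in the same directory layout.

```python
import shlex

# Directory taken from the ULTRA96 command above.
ARCH_ROOT = "/opt/vitis_ai/compiler/arch/DPUCZDX8G"


def build_compile_command(board, quantized_dir="yolov4_quantized",
                          output_dir="./compiled", net_name="yolov4"):
    """Return the vai_c_caffe argument list for one board (hypothetical helper)."""
    return [
        "vai_c_caffe",
        "--prototxt", f"{quantized_dir}/deploy.prototxt",
        "--caffemodel", f"{quantized_dir}/deploy.caffemodel",
        # Assumption: every board directory contains an arch.json file.
        "--arch", f"{ARCH_ROOT}/{board}/arch.json",
        "--output_dir", output_dir,
        "--net_name", net_name,
        "--options", "{'save_kernel':''}",
    ]


for board in ("ULTRA96", "ZCU102"):
    # shlex.join (Python 3.8+) quotes the --options value for the shell.
    print(shlex.join(build_compile_command(board)))
```

The returned list can also be passed directly to `subprocess.run` inside the Vitis-AI docker container.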

I did an additional test: I quantized without the -keep_fixed_neuron option. Compilation then completed without error, but I got zero DPU subgraphs, as shown below:

**************************************************
* VITIS_AI Compilation - Xilinx Inc.
**************************************************
[INFO] Namespace(inputs_shape=None, layout='NCHW', model_files=['yolov4_quantized/deploy.caffemodel'], model_type='caffe', out_filename='./compiled/yolov4_org.xmodel', proto='yolov4_quantized/deploy.prototxt')
[INFO] caffe model: yolov4_quantized/deploy.caffemodel
[INFO] caffe model: yolov4_quantized/deploy.prototxt
[INFO] parse raw model     :100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 253/253 [00:16<00:00, 15.02it/s]                
[INFO] infer shape (NCHW)  :100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 253/253 [00:00<00:00, 4397.95it/s]              
[INFO] infer shape (NHWC)  :100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 253/253 [00:00<00:00, 3426.66it/s]              
[INFO] generate xmodel     :100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 253/253 [00:00<00:00, 574.03it/s]               
[INFO] generate xmodel: /workspace/yolov4_atlas/compiled/yolov4_org.xmodel
[UNILOG][INFO] The compiler log will be dumped at "/tmp/vitis-ai-user/log/xcompiler-20210323-072413-7395"
[UNILOG][INFO] Target architecture: DPUCZDX8G_ISA0_B4096_MAX_BG2
[UNILOG][INFO] Compile mode: dpu
[UNILOG][INFO] Debug mode: function
[UNILOG][INFO] Target architecture: DPUCZDX8G_ISA0_B4096_MAX_BG2
[UNILOG][INFO] Graph name: deploy, with op num: 693
[UNILOG][INFO] Begin to compile...
[UNILOG][INFO] Total device subgraph number 2, DPU subgraph number 0
[UNILOG][INFO] Compile done.
[UNILOG][INFO] The meta json is saved to "/workspace/yolov4_atlas/./compiled/meta.json"
[UNILOG][INFO] The compiled xmodel is saved to "/workspace/yolov4_atlas/./compiled/yolov4.xmodel"
[UNILOG][INFO] The compiled xmodel's md5sum is 65617d0a8a96e1d0d4bffcae6fc4a574, and been saved to "/workspace/yolov4_atlas/./compiled/md5sum.txt"
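
Because the compiler still prints "Compile done." in this case, the zero-subgraph result is easy to miss. A minimal sketch of a check on the captured compiler output follows; `dpu_subgraph_counts` is a hypothetical helper, and the pattern assumes the log line is worded exactly as in the output above.

```python
import re

# Pattern copied from the log line above; other compiler versions may word it differently.
SUBGRAPH_RE = re.compile(
    r"Total device subgraph number (\d+), DPU subgraph number (\d+)"
)


def dpu_subgraph_counts(log_text):
    """Return (total_subgraphs, dpu_subgraphs), or None if the line is absent."""
    match = SUBGRAPH_RE.search(log_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


log = "[UNILOG][INFO] Total device subgraph number 2, DPU subgraph number 0"
counts = dpu_subgraph_counts(log)
if counts is not None and counts[1] == 0:
    print("Warning: compilation finished but no subgraph was mapped to the DPU.")
```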

Please find attached the following files (archived in a zip):

darknet.cfg: darknet config file used to train the yolov4 model on my own dataset
yolov4_before_quantization.prototxt: Caffe prototxt after conversion from darknet to Caffe
deploy_with_keep_fixed_neurons.protoxt: Caffe prototxt after quantization with the -keep_fixed_neuron option
deploy_without_keep_fixed_neurons.protoxt: Caffe prototxt after quantization without the -keep_fixed_neuron option

config_files.zip

About this issue

State: closed
Created 3 years ago
Comments: 25 (2 by maintainers)

Most upvoted comments

Hi @romaintha

Thank you for your quick reply.

There were several issues within this “issue”. The first compilation error I got was because I removed the SPP block from the darknet config. You might have done the same.

No, I followed the latest tutorial to modify the darknet config (the same as in https://github.com/Xilinx/Vitis-AI/issues/346#issuecomment-805730810). I noticed that the issue may be related to the docker version. I will retry with another docker version.

quyetvk on Apr 14, 2021

Hi @romaintha, I just downloaded the Ultra96v2 board image and recompiled the model to target this architecture. I was able to reproduce the hang issue and I think the cause was that I initially used the v1.3.1 GPU docker to compile the model.

In the instructions provided by Mario Bergeron below, he says to use a specific version of the docker corresponding to the v1.3.0 release.

https://www.hackster.io/AlbertaBeef/vitis-ai-1-3-flow-for-avnet-vitis-platforms-cd0c51

Specifically see the following steps:

6.1 If not done so already, pull version 1.3.411 of the docker container with the following command:

docker pull xilinx/vitis-ai:1.3.411

6.2 Launch version 1.3.411 of the Vitis-AI docker from the Vitis-AI directory:

$ cd $VITIS_AI_HOME
$ sh -x docker_run.sh xilinx/vitis-ai:1.3.411

The model compilation process in this case will produce a file that is around 178MB, whereas the 1.3.1 docker will produce a model that is 67MB. I have now verified that the 1.3.1 compiled model hangs on the target whereas the 1.3.0 model (using the xilinx/vitis-ai:1.3.411 docker for compilation) runs correctly.
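
A quick way to tell the two builds apart is to look at the size and checksum of the compiled file. The sketch below uses a hypothetical helper, `describe_xmodel`; the 178MB and 67MB figures quoted above apply to this particular model only, so treat the size as a hint rather than a rule.

```python
import hashlib
import os


def describe_xmodel(path):
    """Return (size in MB, md5 hex digest) of a compiled model file."""
    size_mb = os.path.getsize(path) / (1024 * 1024)
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in 1 MB blocks so large models are not loaded into memory at once.
        for block in iter(lambda: f.read(1 << 20), b""):
            digest.update(block)
    return size_mb, digest.hexdigest()


# Path taken from the compile log earlier in the thread; adjust as needed.
path = "./compiled/yolov4.xmodel"
if os.path.exists(path):
    size_mb, md5 = describe_xmodel(path)
    print(f"{path}: {size_mb:.1f} MB, md5 {md5}")
```

The digest can be compared with the md5sum.txt that the compiler writes next to the xmodel.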

Thanks and hope this helps!

ghost on Mar 26, 2021