CUDA-PointPillars: PostProcessCuda is very slow using my model
System: Ubuntu 20.04
OpenPCDet: latest version
GPU has cuda devices: 1
----device id: 0 info----
GPU : NVIDIA GeForce RTX 2080 with Max-Q Design
Capbility: 7.5
Global memory: 7982MB
Const memory: 64KB
SM in a block: 48KB
warp size: 32
threads in a block: 1024
block dim: (1024,1024,64)
grid dim: (2147483647,65535,65535)
Hello,
I exported my PointPillars weights, trained on custom data. The only parameter change compared to the example model is that it uses 1 class instead of 3. I had to change a few things in tools/simplifier_onnx.py for the exporter to work with a class count other than 3:
Code changes to work with 1 class
I changed the signature of simplify_postprocess(onnx_model)
to simplify_postprocess(onnx_model, num_classes)
and changed 3 other lines.
- cls_preds = gs.Variable(name="cls_preds", dtype=np.float32, shape=(1, 248, 216, 18))
- box_preds = gs.Variable(name="box_preds", dtype=np.float32, shape=(1, 248, 216, 42))
- dir_cls_preds = gs.Variable(name="dir_cls_preds", dtype=np.float32, shape=(1, 248, 216, 12))
+ cls_preds = gs.Variable(name="cls_preds", dtype=np.float32, shape=(1, 248, 216, 2 * num_classes * num_classes))
+ box_preds = gs.Variable(name="box_preds", dtype=np.float32, shape=(1, 248, 216, 14 * num_classes))
+ dir_cls_preds = gs.Variable(name="dir_cls_preds", dtype=np.float32, shape=(1, 248, 216, 4 * num_classes))
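As a quick sanity check on the shapes in the diff above, here is a minimal sketch of the channel arithmetic. The helper `head_channels` and its defaults (2 anchors per class, 7 box parameters, 2 direction bins) are assumptions chosen because they reproduce the 18/42/12 channels of the 3-class example; they are not read from the repo's config, so adjust them if your model config differs.

```python
def head_channels(num_classes, anchors_per_class=2, box_code_size=7, num_dir_bins=2):
    """Return the last-dimension size of each detection head output.

    Hypothetical helper: the default values are assumptions that match the
    3-class example model (18 / 42 / 12 channels), not values taken from the repo.
    """
    # Total anchors per feature-map location.
    num_anchors = anchors_per_class * num_classes
    return {
        # One score per (anchor, class) pair: 2 * num_classes * num_classes.
        "cls_preds": num_anchors * num_classes,
        # One box encoding per anchor: 14 * num_classes.
        "box_preds": num_anchors * box_code_size,
        # One set of direction bins per anchor: 4 * num_classes.
        "dir_cls_preds": num_anchors * num_dir_bins,
    }


if __name__ == "__main__":
    for n in (3, 1):
        print(n, head_channels(n))
```

For a single class this gives 2, 14 and 4 channels, which agrees with the formulas in the modified lines. Under the same assumptions, any code that consumes these tensors would need matching class and anchor counts.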
The exporter works, but when testing the demo with this model:
---- RUN TIME ----
load file: …/data/data_velo/000001.bin
find points num: 18630
find pillar_num: 6815
TIME: generateVoxels: 0.038048 ms.
TIME: generateFeatures: 0.053024 ms.
TIME: doinfer: 30.2525 ms.
TIME: doPostprocessCuda: 57528.1 ms.
TIME: pointpillar: 57558.6 ms.
Bndbox objs: 4158
Saved prediction in: …/eval/kitti/object/pred_velo/000001.txt
This model works perfectly fine in PyTorch.
As you can see, the post-processing step takes a very long time (about 57.5 s) and outputs thousands of bounding boxes. Issue #43 describes a similar problem that was seemingly solved by an update, but I am already using the latest version of this repo.
Do you have an idea what could cause this issue?
I can upload my .pth file or my onnx file if you want to try and reproduce this.
Best regards,
About this issue
- State: open
- Created 2 years ago
- Comments: 28
@rjwb1 Hello, thank you very much for your guidance. I changed the parameters just as you did, but the problem went from slow post-processing to a CUDA error: illegal memory access. I am also trying to use my own model, which detects only one class, and I have also added ROS. I sincerely hope you can tell me how to solve this problem; it has bothered me for a few days.