CUDA-PointPillars: PostProcessCuda is very slow using my model

System: Ubuntu 20.04
Latest version of OpenPCDet

GPU has cuda devices: 1
----device id: 0 info----
GPU : NVIDIA GeForce RTX 2080 with Max-Q Design
Capbility: 7.5
Global memory: 7982MB
Const memory: 64KB
SM in a block: 48KB
warp size: 32
threads in a block: 1024
block dim: (1024,1024,64)
grid dim: (2147483647,65535,65535)

Hello,

I exported PointPillars weights trained on custom data. The only parameter that differs from the example model is the number of classes: 1 instead of 3. I had to change a few things in tools/simplifier_onnx.py for the exporter to work with a class count other than 3:

Code changes to work with 1 class

I changed the signature of simplify_postprocess(onnx_model) to simplify_postprocess(onnx_model, num_classes) and changed 3 other lines:

-  cls_preds = gs.Variable(name="cls_preds", dtype=np.float32, shape=(1, 248, 216, 18))
-  box_preds = gs.Variable(name="box_preds", dtype=np.float32, shape=(1, 248, 216, 42))
-  dir_cls_preds = gs.Variable(name="dir_cls_preds", dtype=np.float32, shape=(1, 248, 216, 12))
+  cls_preds = gs.Variable(name="cls_preds", dtype=np.float32, shape=(1, 248, 216, 2 * num_classes * num_classes))
+  box_preds = gs.Variable(name="box_preds", dtype=np.float32, shape=(1, 248, 216, 14 * num_classes))
+  dir_cls_preds = gs.Variable(name="dir_cls_preds", dtype=np.float32, shape=(1, 248, 216, 4 * num_classes))
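
The channel counts in the diff above can be cross-checked against the original 3-class shapes (18, 42, 12). The following is a minimal sketch, assuming 2 anchors per class, 7 box-regression values per anchor and 2 direction bins per anchor; these constants are inferred from the shapes in the diff, and head_channels is a hypothetical helper, not part of the repository.

```python
# Assumptions (inferred from the 3-class shapes in the diff, not taken from the repo):
# 2 anchors per class, 7 box-regression values and 2 direction bins per anchor.
ANCHORS_PER_CLASS = 2
BOX_CODE_SIZE = 7
NUM_DIR_BINS = 2


def head_channels(num_classes):
    """Return the expected last-dimension size of each detection-head output."""
    num_anchors = ANCHORS_PER_CLASS * num_classes
    return {
        "cls_preds": num_anchors * num_classes,      # 2 * n * n
        "box_preds": num_anchors * BOX_CODE_SIZE,    # 14 * n
        "dir_cls_preds": num_anchors * NUM_DIR_BINS, # 4 * n
    }


print(head_channels(3))  # shapes used by the example 3-class model
print(head_channels(1))  # shapes expected for a 1-class model
```

If the assumptions hold, a 1-class export should produce last dimensions of 2, 14 and 4. Any code that consumes these outputs would presumably need to use the same class count.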

The exporter works, but when testing the demo with this model:

---- RUN TIME ----
load file: …/data/data_velo/000001.bin
find points num: 18630
find pillar_num: 6815
TIME: generateVoxels: 0.038048 ms.
TIME: generateFeatures: 0.053024 ms.
TIME: doinfer: 30.2525 ms.
TIME: doPostprocessCuda: 57528.1 ms.
TIME: pointpillar: 57558.6 ms.
Bndbox objs: 4158
Saved prediction in: …/eval/kitti/object/pred_velo/000001.txt

This model works perfectly fine in PyTorch.

As you can see, the post-processing step (doPostprocessCuda) takes about 57.5 seconds and outputs 4158 bounding boxes. Issue #43 references a similar problem that was seemingly solved by an update, but I am already using the latest version of this repo.

Do you have an idea what could cause this issue?

I can upload my .pth file or my ONNX file if you want to try to reproduce this.

Best regards,

Most upvoted comments

@rjwb1 Hello, thank you very much for your guidance. I changed the parameters just as you did, but the problem went from slow post-processing to a CUDA error: illegal memory access. I am also trying to use my own model, which detects only one class, and I have also added ROS. I sincerely hope you can tell me how to solve this problem; it has been bothering me for a few days.