OpenPCDet: Problems using custom data sets
I am trying to test PV-RCNN on my own lidar data instead of the KITTI data, with annotations similar to the Kaggle format. However, I get an error when I run the code. The error message is as follows:
File "***/OpenPCDet/pcdet/datasets/innovusion/innovusion_dataset.py", line 77, in __getitem__
data_dict = self.prepare_data(data_dict=input_dict)
File "***/OpenPCDet/pcdet/datasets/dataset.py", line 124, in prepare_data
'gt_boxes_mask': gt_boxes_mask
File "***/OpenPCDet/pcdet/datasets/augmentor/data_augmentor.py", line 93, in forward
data_dict = cur_augmentor(data_dict=data_dict)
File "***/OpenPCDet/pcdet/datasets/augmentor/database_sampler.py", line 179, in __call__
sampled_boxes = np.stack([x['box3d_lidar'] for x in sampled_dict], axis=0).astype(np.float32)
File "<__array_function__ internals>", line 6, in stack
File "***/anaconda3/envs/ml/lib/python3.7/site-packages/numpy/core/shape_base.py", line 423, in stack
raise ValueError('need at least one array to stack')
ValueError: need at least one array to stack
I located the code and found that it is related to data augmentation, in pcdet/datasets/augmentor/database_sampler.py:
def __call__(self, data_dict):
    """
    Args:
        data_dict:
            gt_boxes: (N, 7 + C) [x, y, z, dx, dy, dz, heading, ...]
    Returns:
    """
    gt_boxes = data_dict['gt_boxes']
    gt_names = data_dict['gt_names'].astype(str)
    existed_boxes = gt_boxes
    total_valid_sampled_dict = []
    for class_name, sample_group in self.sample_groups.items():
        if self.limit_whole_scene:
            num_gt = np.sum(class_name == gt_names)
            sample_group['sample_num'] = str(int(self.sample_class_num[class_name]) - num_gt)
        if int(sample_group['sample_num']) > 0:
            sampled_dict = self.sample_with_fixed_number(class_name, sample_group)  ### need help
            sampled_boxes = np.stack([x['box3d_lidar'] for x in sampled_dict], axis=0).astype(np.float32)
            if self.sampler_cfg.get('DATABASE_WITH_FAKELIDAR', False):
                sampled_boxes = box_utils.boxes3d_kitti_fakelidar_to_lidar(sampled_boxes)
            iou1 = iou3d_nms_utils.boxes_bev_iou_cpu(sampled_boxes[:, 0:7], existed_boxes[:, 0:7])
            iou2 = iou3d_nms_utils.boxes_bev_iou_cpu(sampled_boxes[:, 0:7], sampled_boxes[:, 0:7])
            iou2[range(sampled_boxes.shape[0]), range(sampled_boxes.shape[0])] = 0
            iou1 = iou1 if iou1.shape[1] > 0 else iou2
            valid_mask = ((iou1.max(axis=1) + iou2.max(axis=1)) == 0).nonzero()[0]
            valid_sampled_dict = [sampled_dict[x] for x in valid_mask]
            valid_sampled_boxes = sampled_boxes[valid_mask]
            existed_boxes = np.concatenate((existed_boxes, valid_sampled_boxes), axis=0)
            total_valid_sampled_dict.extend(valid_sampled_dict)
    sampled_gt_boxes = existed_boxes[gt_boxes.shape[0]:, :]
    if total_valid_sampled_dict.__len__() > 0:
        data_dict = self.add_sampled_boxes_to_scene(data_dict, sampled_gt_boxes, total_valid_sampled_dict)
    data_dict.pop('gt_boxes_mask')
    return data_dict
Then the key function is sample_with_fixed_number(self, class_name, sample_group)
def sample_with_fixed_number(self, class_name, sample_group):
    """
    Args:
        class_name:
        sample_group:
    Returns:
    """
    sample_num, pointer, indices = int(sample_group['sample_num']), sample_group['pointer'], sample_group['indices']
    if pointer >= len(self.db_infos[class_name]):
        indices = np.random.permutation(len(self.db_infos[class_name]))
        pointer = 0
    sampled_dict = [self.db_infos[class_name][idx] for idx in indices[pointer: pointer + sample_num]]
    pointer += sample_num
    sample_group['pointer'] = pointer
    sample_group['indices'] = indices
    return sampled_dict
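One plausible reading of the traceback, given the two functions above: if self.db_infos[class_name] holds no entries (for example because no ground-truth database was generated for the custom data), the slice in sample_with_fixed_number returns an empty list and np.stack raises. The sketch below reproduces that with plain NumPy. Note that stack_sampled_boxes is a hypothetical helper written for illustration, not part of OpenPCDet, and the 'Car' key is only an example.

```python
import numpy as np

def stack_sampled_boxes(sampled_dict, box_dim=7):
    """Stack 'box3d_lidar' entries; return an empty (0, box_dim) array if none were sampled."""
    if len(sampled_dict) == 0:
        # np.stack([]) raises "need at least one array to stack", so handle this case first.
        return np.zeros((0, box_dim), dtype=np.float32)
    return np.stack([x['box3d_lidar'] for x in sampled_dict], axis=0).astype(np.float32)

# An empty database for a class gives an empty slice, as in sample_with_fixed_number.
db_infos = {'Car': []}
indices = np.random.permutation(len(db_infos['Car']))
sampled = [db_infos['Car'][idx] for idx in indices[0:5]]

try:
    np.stack([x['box3d_lidar'] for x in sampled], axis=0)
    raised = False
except ValueError:
    raised = True  # same ValueError as in the traceback

empty_boxes = stack_sampled_boxes(sampled)                      # shape (0, 7)
one_box = stack_sampled_boxes([{'box3d_lidar': np.arange(7)}])  # shape (1, 7)
```

A guard like this only avoids the crash. If the database really is empty, the sampler has nothing to paste into the scene, so generating the database infos or disabling the sampling augmentation is probably the real fix.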
self.db_infos is used in the code. It is specified by sampler_cfg.DB_INFO_PATH, but my data does not have it, so I am stuck here. What do I need to do to fix this, or is there a detailed explanation that would help me understand this code?
Note: My data annotation format is:
id confidence center_x center_y center_z width length height yaw class_name
thank you all
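For readers with the same annotation layout, here is a minimal sketch of turning one such line into the [x, y, z, dx, dy, dz, heading] box described in the docstring above. It assumes the centers are already in the lidar frame, that length lies along the heading direction (dx) and width across it (dy), and that yaw uses the same convention as heading. None of this is confirmed by the thread. The function name and the example line are made up for illustration.

```python
import numpy as np

def parse_annotation_line(line):
    """Parse 'id confidence center_x center_y center_z width length height yaw class_name'."""
    fields = line.split()
    cx, cy, cz, width, length, height, yaw = (float(v) for v in fields[2:9])
    class_name = fields[9]
    # Assumed mapping: dx = length (along heading), dy = width, dz = height.
    box = np.array([cx, cy, cz, length, width, height, yaw], dtype=np.float32)
    return box, class_name

# Made-up example line, not real data.
example = "abc123 1.0 10.5 -2.0 -0.8 1.9 4.5 1.6 0.1 car"
box, name = parse_annotation_line(example)
```

If the yaw or axis conventions of your sensor differ, the boxes will look rotated or swapped when visualized, so it is worth plotting a few frames before training.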
About this issue
- State: closed
- Created 4 years ago
- Comments: 43 (3 by maintainers)
@Gltina Please make sure:
POINT_CLOUD_RANGE defines the range where space should be voxelized, in other words the space that contains the points you assume to be relevant and have annotations for. For KITTI this space is around 40 m to the sides, 70 m to the front, 3 m below and 1 m above the sensor, hence [0, -39.68, -3, 69.12, 39.68, 1].
VOXEL_SIZE defines the [length, width, height] of each voxel. Since PointPillars uses pillars instead of voxels, the height of a voxel is set to the full height of your point cloud range. For the KITTI frames the default length and width of a voxel is 16 cm, hence [0.16, 0.16, 4].
I hope this helps.
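To see how the two settings interact, the sketch below computes the number of voxels per axis, assuming the grid size is simply the range extent divided by the voxel size. The helper names are hypothetical and the check is only a sanity test for a custom config, not OpenPCDet's own code.

```python
import numpy as np

def grid_size(point_cloud_range, voxel_size):
    """Number of voxels along x, y, z for a given range and voxel size."""
    pc_range = np.asarray(point_cloud_range, dtype=np.float64)
    voxel = np.asarray(voxel_size, dtype=np.float64)
    cells = (pc_range[3:6] - pc_range[0:3]) / voxel
    return np.round(cells).astype(np.int64)

def is_whole_number_of_voxels(point_cloud_range, voxel_size, tol=1e-6):
    """True if the range extent is an integer multiple of the voxel size on every axis."""
    pc_range = np.asarray(point_cloud_range, dtype=np.float64)
    voxel = np.asarray(voxel_size, dtype=np.float64)
    cells = (pc_range[3:6] - pc_range[0:3]) / voxel
    return bool(np.all(np.abs(cells - np.round(cells)) < tol))

# KITTI PointPillars values quoted above.
kitti_grid = grid_size([0, -39.68, -3, 69.12, 39.68, 1], [0.16, 0.16, 4])
# kitti_grid is [432, 496, 1]
kitti_ok = is_whole_number_of_voxels([0, -39.68, -3, 69.12, 39.68, 1], [0.16, 0.16, 4])
```

Running the same check on a custom POINT_CLOUD_RANGE and VOXEL_SIZE shows quickly whether the range divides evenly into voxels and how large the resulting grid is.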
Hi,
Honestly, I think the author of this repo would be the best person to answer this question. What I did was simply verify which information is required for training and which is not. For now, we can confirm that if you want to train on your own dataset instead of the standard one, there are some things you need to do and a folder structure you should follow, as below:
First of all, the file structure should look like this:
Along with the file structure, the coordinate system of the point cloud you captured should follow the same conventions as KITTI, which is an important step when preparing training data. Click here for more details.
On the other hand, the label file should keep the 3D information as follows:
As it shows, "1 2 3 4" is the 2D bounding box, which we ignore by filling in meaningless data. The values that follow, such as "0.57 0.33 0.99 -0.52 1.73 6.45", are the 3D dimensions and the 3D position respectively. The final value is also important because it defines the orientation of the 3D box in the scene.
That is what we did for training. I don't think it is a perfect or complete way to prepare for training. If you have other methods for training on custom data with your own labels, you are welcome to leave a comment here so that more people can help.
@sshaoshuai @jihanyang It seems that you have not yet officially provided a pipeline for using custom data.
So I wrote a pipeline for importing custom data. It currently covers only the dataloader stage. https://github.com/Leozyc-waseda/Lidar_Openpcdet_ST3D#use-openpcdet-to-train-your-own-dataset
#771 Hi guys. Hope this helps everyone.

Hi @jihanyang ,
Thanks for your help, I will use this value to train later. Here are some questions I have:
Will (0, 0, 50, 50) as fake data affect the final training result?
How do I change voxel_size properly?
If I change the voxel size in kitti_dataset.yaml directly as below:
Following kitti_infos generation, an error occurs when running train.py:
RuntimeError: size mismatch, m1: [2048 x 3456], m2: [640 x 128] at /pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:268
In short, this question could be simplified as: how do I modify the network parameters when changing the voxel_size?
Thanks in advance!
Hello @jihanyang, can you please explain why the point cloud range has to be adjusted to the values 40 and a multiple of 16? What is the reason for these numbers?
Hello! @jihanyang @MartinHahner __getitem__ is called repeatedly when I train on a custom dataset. What's wrong with this?
File "/nas/lyp/code/tools/…/pcdet/datasets/dataset.py", line 155, in prepare_data
return self.__getitem__(new_index)
File "/nas/lyp/code/tools/…/pcdet/datasets/kitti/kitti_dataset.py", line 441, in __getitem__
data_dict = self.prepare_data(data_dict=input_dict)
[the two frames above repeat many more times]
File "/nas/lyp/code/tools/…/pcdet/datasets/dataset.py", line 127, in prepare_data
data_dict = self.data_augmentor.forward(
File "/nas/lyp/code/tools/…/pcdet/datasets/augmentor/data_augmentor.py", line 240, in forward
data_dict = cur_augmentor(data_dict=data_dict)
File "/nas/lyp/code/tools/…/pcdet/datasets/augmentor/data_augmentor.py", line 49, in random_world_flip
gt_boxes, points = getattr(augmentor_utils, 'random_flip_along_%s' % cur_axis)(
File "/nas/lyp/code/tools/…/pcdet/datasets/augmentor/augmentor_utils.py", line 15, in random_flip_along_x
enable = np.random.choice([False, True], replace=False, p=[0.5, 0.5])
File "mtrand.pyx", line 978, in numpy.random.mtrand.RandomState.choice
File "<__array_function__ internals>", line 5, in unique
File "/root/anaconda3/envs/openpcdet/lib/python3.8/site-packages/numpy/lib/arraysetops.py", line 262, in unique
ret = _unique1d(ar, return_index, return_inverse, return_counts)
File "/root/anaconda3/envs/openpcdet/lib/python3.8/site-packages/numpy/lib/arraysetops.py", line 315, in _unique1d
ar = np.asanyarray(ar).flatten()
File "/root/anaconda3/envs/openpcdet/lib/python3.8/site-packages/numpy/core/_asarray.py", line 171, in asanyarray
return array(a, dtype, copy=False, order=order, subok=True)
RecursionError: maximum recursion depth exceeded while calling a Python object
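A possible cause, assuming prepare_data retries with a random index whenever a frame ends up with no ground-truth boxes (which is what the repeated frame at line 155 suggests): if no frame keeps any box, for example because the label class names do not match the configured ones, the retries never terminate. The hypothetical check below lists frames that contain none of the configured classes. The input layout (one list of class names per frame) and the example labels are assumptions made for illustration.

```python
def frames_without_known_classes(frame_label_names, class_names):
    """Return indices of frames that contain no object from class_names."""
    wanted = set(class_names)
    return [i for i, names in enumerate(frame_label_names)
            if not wanted.intersection(names)]

# Made-up labels: frame 1 uses a lowercase name, frame 2 has no objects at all.
labels = [['Car', 'Pedestrian'], ['car'], [], ['Cyclist']]
bad = frames_without_known_classes(labels, ['Car', 'Pedestrian', 'Cyclist'])
# bad is [1, 2]
```

If most or all frames show up in this list, checking the class names in the labels against the config, and the boxes against POINT_CLOUD_RANGE, is a reasonable next step.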
@thptai Yes, just follow what @jihanyang said and adjust your POINT_CLOUD_RANGE. It works!
@Gltina Hello, you can set the 2D boxes (x1, y1, x2, y2) to (0, 0, 50, 50), since KITTI filters out boxes whose height is smaller than 20px.
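As a rough illustration of this suggestion, the sketch below writes one KITTI-style label line with the placeholder 2D box. It assumes the usual 15-field KITTI object label layout (type, truncated, occluded, alpha, 2D bbox, dimensions as height/width/length, location, rotation_y). The truncated, occluded and alpha values are placeholders, the function name is hypothetical, and the numbers in the example are made up.

```python
def kitti_label_line(cls, h, w, l, x, y, z, rotation_y, bbox=(0, 0, 50, 50)):
    """Format one KITTI-style label line using a placeholder 2D bounding box."""
    truncated, occluded, alpha = 0.0, 0, 0.0  # placeholder values (assumption)
    values = [cls, f"{truncated:.2f}", str(occluded), f"{alpha:.2f}"]
    values += [f"{v:.2f}" for v in bbox]                            # x1, y1, x2, y2
    values += [f"{v:.2f}" for v in (h, w, l, x, y, z, rotation_y)]  # 3D box fields
    return " ".join(values)

# Made-up box values for illustration only.
line = kitti_label_line("Car", 1.50, 1.60, 4.00, 2.0, 1.5, 20.0, -1.57)
fields = line.split()
```

The placeholder box is 50 px tall, which is above the height threshold mentioned above, so the object would not be dropped for being too small in the image.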
@Gltina @xixioba Regarding the 2D information of custom datasets, we are thinking about releasing an example that uses the KITTI metric on other datasets (such as nuScenes) for evaluation.