imvoxelnet: Runtime Error

I ran the following command: srun -p Ai4sci_3D --gres gpu:1 bash tools/dist_train.sh configs/imvoxelnet/imvoxelnet_kitti.py 1

But I ran into the following problem:

(imvoxel) [nanuey@SH-IDC1-10-140-24-32 imvoxelnet]$ srun -p Ai4sci_3D --gres gpu:1 bash tools/dist_train.sh configs/imvoxelnet/imvoxelnet_kitti.py 1
srun: job 2563067 queued and waiting for resources
srun: job 2563067 has been allocated resources
srun: Job 2563067 scheduled successfully!
Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
Current PHX_PRIORITY is normal

Traceback (most recent call last):
  File "tools/train.py", line 15, in <module>
    from mmdet3d.datasets import build_dataset
  File "/mnt/petrelfs/dengken_nerf_perception/imvoxelnet/mmdet3d/datasets/__init__.py", line 1, in <module>
    from mmdet.datasets.builder import build_dataloader
  File "/mnt/est/nanuey/anaconda3/envs/imvoxel/lib/python3.7/site-packages/mmdet/datasets/__init__.py", line 2, in <module>
    from .cityscapes import CityscapesDataset
  File "/mnt/est/nanuey/anaconda3/envs/imvoxel/lib/python3.7/site-packages/mmdet/datasets/cityscapes.py", line 16, in <module>
    from .coco import CocoDataset
  File "/mnt/est/nanuey/anaconda3/envs/imvoxel/lib/python3.7/site-packages/mmdet/datasets/coco.py", line 14, in <module>
    from mmdet.core import eval_recalls
  File "/mnt/est/nanuey/anaconda3/envs/imvoxel/lib/python3.7/site-packages/mmdet/core/__init__.py", line 2, in <module>
    from .bbox import *  # noqa: F401, F403
  File "/mnt/est/nanuey/anaconda3/envs/imvoxel/lib/python3.7/site-packages/mmdet/core/bbox/__init__.py", line 7, in <module>
    from .samplers import (BaseSampler, CombinedSampler,
  File "/mnt/est/nanuey/anaconda3/envs/imvoxel/lib/python3.7/site-packages/mmdet/core/bbox/samplers/__init__.py", line 9, in <module>
    from .score_hlr_sampler import ScoreHLRSampler
  File "/mnt/est/nanuey/anaconda3/envs/imvoxel/lib/python3.7/site-packages/mmdet/core/bbox/samplers/score_hlr_sampler.py", line 2, in <module>
    from mmcv.ops import nms_match
  File "/mnt/est/nanuey/anaconda3/envs/imvoxel/lib/python3.7/site-packages/mmcv/ops/__init__.py", line 1, in <module>
    from .bbox import bbox_overlaps
  File "/mnt/est/nanuey/anaconda3/envs/imvoxel/lib/python3.7/site-packages/mmcv/ops/bbox.py", line 3, in <module>
    ext_module = ext_loader.load_ext('_ext', ['bbox_overlaps'])
  File "/mnt/est/nanuey/anaconda3/envs/imvoxel/lib/python3.7/site-packages/mmcv/utils/ext_loader.py", line 11, in load_ext
    ext = importlib.import_module('mmcv.' + name)
  File "/mnt/est/nanuey/anaconda3/envs/imvoxel/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
ImportError: /mnt/est/nanuey/anaconda3/envs/imvoxel/lib/python3.7/site-packages/mmcv/_ext.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZNK2at6Tensor6deviceEv
/mnt/est/nanuey/anaconda3/envs/imvoxel/lib/python3.7/site-packages/torch/distributed/launch.py:186: FutureWarning: The module torch.distributed.launch is deprecated and will be removed in future. Use torchrun.
Note that --use_env is set by default in torchrun.
If your script expects --local_rank argument to be set, please change it to read from os.environ['LOCAL_RANK'] instead. See https://pytorch.org/docs/stable/distributed.html#launch-utility for further instructions

  FutureWarning,
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 8099) of binary: /mnt/est/nanuey/anaconda3/envs/imvoxel/bin/python3
Traceback (most recent call last):
  File "/mnt/est/nanuey/anaconda3/envs/imvoxel/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/mnt/est/nanuey/anaconda3/envs/imvoxel/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/mnt/est/nanuey/anaconda3/envs/imvoxel/lib/python3.7/site-packages/torch/distributed/launch.py", line 193, in <module>
    main()
  File "/mnt/est/nanuey/anaconda3/envs/imvoxel/lib/python3.7/site-packages/torch/distributed/launch.py", line 189, in main
    launch(args)
  File "/mnt/est/nanuey/anaconda3/envs/imvoxel/lib/python3.7/site-packages/torch/distributed/launch.py", line 174, in launch
    run(args)
  File "/mnt/est/nanuey/anaconda3/envs/imvoxel/lib/python3.7/site-packages/torch/distributed/run.py", line 713, in run
    )(*cmd_args)
  File "/mnt/est/nanuey/anaconda3/envs/imvoxel/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/mnt/est/nanuey/anaconda3/envs/imvoxel/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 261, in launch_agent
    failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

tools/train.py FAILED

Failures: <NO_OTHER_FAILURES>

Root Cause (first observed failure):
[0]:
  time      : 2023-11-06_19:17:25
  host      : SH-IDC1-10-140-24-88
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 8099)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
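The ChildFailedError raised by torch.distributed only reports that the worker process exited with code 1; the actual failure is the ImportError raised while loading mmcv._ext. A minimal sketch for reproducing just that import outside srun and tools/dist_train.sh, assuming it is run with the imvoxel environment's Python interpreter (check_mmcv_ext is a hypothetical helper, not part of mmcv):

```python
import importlib


def check_mmcv_ext() -> str:
    """Try to load mmcv's compiled ops extension and describe the outcome.

    Hypothetical diagnostic helper: it only imports modules and never starts training.
    """
    try:
        import torch  # the PyTorch build that mmcv's compiled extension must match
    except ImportError:
        return "torch is not installed in this interpreter"

    try:
        # Per the traceback, mmcv.utils.ext_loader.load_ext('_ext', ...) performs this import
        importlib.import_module("mmcv._ext")
    except ImportError as exc:
        # An 'undefined symbol' message here points at a torch / mmcv-full build mismatch
        return f"mmcv._ext failed to load with torch {torch.__version__}: {exc}"
    return f"mmcv._ext loaded with torch {torch.__version__}"


if __name__ == "__main__":
    print(check_mmcv_ext())
```

Running this directly (for example with `python check_ext.py` inside the same conda environment, where the file name is arbitrary) surfaces the same undefined-symbol message without going through the distributed launcher.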

I installed everything following your instructions and changed only the PyTorch version: I use PyTorch 1.10.0, CUDA 11.3, mmdet3d 0.8.0, mmdet 2.10, and mmcv-full 1.2.7.

"pip install torch==1.6.0+cu101 torchvision==0.7.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html

Install MMCV

pip install mmcv-full==1.2.7+torch1.6.0+cu101 -f https://openmmlab.oss-accelerate.aliyuncs.com/mmcv/dist/index.html
pip install mmdet==2.10.0

Install MMDetection

git clone https://github.com/saic-vul/imvoxelnet.git
cd imvoxelnet
pip install -r requirements/build.txt
pip install --no-cache-dir -e .

Uninstall pycocotools installed by nuscenes-devkit and reinstall mmpycocotools

pip uninstall pycocotools --no-cache-dir -y
pip install mmpycocotools==12.0.3 --no-cache-dir --force --no-deps

Install differentiable IoU

cd …
git clone https://github.com/lilanxiao/Rotated_IoU
cp -r Rotated_IoU/cuda_op imvoxelnet/mmdet3d/ops/rotated_iou
cd imvoxelnet/mmdet3d/ops/rotated_iou/cuda_op
python setup.py install"
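The undefined symbol _ZNK2at6Tensor6deviceEv demangles to at::Tensor::device() const, a PyTorch C++ symbol. When mmcv's _ext .so cannot resolve a PyTorch symbol, the usual cause is that the mmcv-full binary was compiled against a different PyTorch release than the one installed. The quoted instructions pin mmcv-full==1.2.7+torch1.6.0+cu101, while the environment described in this report runs PyTorch 1.10.0 with CUDA 11.3. A minimal sketch of that comparison, assuming the +torchX.Y.Z+cuNNN suffix used in the pin above and a torch.__version__ string such as 1.10.0+cu113 (if the string has no +cuNNN suffix, only the PyTorch versions are compared); all function names are hypothetical:

```python
import re
from typing import List, Optional, Tuple


def parse_mmcv_pin(spec: str) -> Tuple[str, str]:
    """Extract (torch_version, cuda_tag) from a pin such as 'mmcv-full==1.2.7+torch1.6.0+cu101'."""
    match = re.search(r"\+torch(\d+\.\d+\.\d+)\+(cu\d+|cpu)", spec)
    if match is None:
        raise ValueError(f"no '+torchX.Y.Z+cuNNN' suffix in {spec!r}")
    return match.group(1), match.group(2)


def parse_torch_version(version: str) -> Tuple[str, Optional[str]]:
    """Split a torch.__version__ string such as '1.10.0+cu113' into ('1.10.0', 'cu113')."""
    base, _, local = version.partition("+")
    return base, (local or None)


def find_mismatches(mmcv_spec: str, torch_version: str) -> List[str]:
    """List the differences between the wheel's build target and the installed PyTorch."""
    wheel_torch, wheel_cuda = parse_mmcv_pin(mmcv_spec)
    installed_torch, installed_cuda = parse_torch_version(torch_version)
    problems = []
    if wheel_torch != installed_torch:
        problems.append(f"mmcv-full built for torch {wheel_torch}, installed torch is {installed_torch}")
    # Only compare CUDA tags when the installed version string carries one
    if installed_cuda is not None and wheel_cuda != installed_cuda:
        problems.append(f"mmcv-full built for {wheel_cuda}, installed torch uses {installed_cuda}")
    return problems


if __name__ == "__main__":
    # Values from this report: the pinned wheel vs. an assumed '1.10.0+cu113' version string
    for line in find_mismatches("mmcv-full==1.2.7+torch1.6.0+cu101", "1.10.0+cu113"):
        print(line)
```

In the real environment the second argument would be torch.__version__. This is a strict string comparison, so it also flags patch-level differences that may be compatible in practice.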

But I still get the same error: ImportError: /mnt/est/nanuey/anaconda3/envs/imvoxel/lib/python3.7/site-packages/mmcv/_ext.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZNK2at6Tensor6deviceEv.

Could you please help me?

Best wishes~

About this issue

  • State: closed
  • Created 8 months ago
  • Comments: 19 (5 by maintainers)

Most upvoted comments

I succeeded using torch 1.8.1, CUDA 10.1, mmcv 1.3, and mmdet 2.11. Thanks a lot!
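The working combination keeps PyTorch, CUDA and mmcv-full mutually consistent. mmcv's installation docs describe per-build wheel indexes of the form https://download.openmmlab.com/mmcv/dist/{cu_version}/{torch_version}/index.html, and note that mmcv-full is built for PyTorch 1.x.0 and usually also works with 1.x.1. A minimal sketch that derives that index URL from the installed versions, assuming that layout (mmcv_find_links is a hypothetical helper; the inputs would normally be torch.__version__ and torch.version.cuda):

```python
from typing import Optional


def mmcv_find_links(torch_version: str, cuda_version: Optional[str]) -> str:
    """Build an mmcv-full '-f' index URL from torch.__version__ and torch.version.cuda strings.

    Assumes the {cu_version}/{torch_version} layout from mmcv's installation docs, with wheels
    indexed under the 1.x.0 release of each PyTorch minor version.
    """
    base = torch_version.split("+")[0]      # '1.8.1+cu101' -> '1.8.1'
    major, minor = base.split(".")[:2]
    torch_tag = f"torch{major}.{minor}.0"   # 1.8.1 -> torch1.8.0
    # torch.version.cuda is e.g. '10.1' for CUDA builds and None for CPU-only builds
    cu_tag = ("cu" + cuda_version.replace(".", "")) if cuda_version else "cpu"
    return f"https://download.openmmlab.com/mmcv/dist/{cu_tag}/{torch_tag}/index.html"


if __name__ == "__main__":
    # The combination reported to work in this thread: torch 1.8.1 with CUDA 10.1
    print(mmcv_find_links("1.8.1", "10.1"))
```

The resulting URL goes after -f in a pip install mmcv-full==<version> -f <url> command; whether a particular mmcv-full release is listed there depends on which builds OpenMMLab published for that PyTorch/CUDA pair.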