detectron2: RuntimeError: CUDA error: invalid device function ROIAlign_forward_cuda
Running inference with the Panoptic FPN model results in a CUDA error.
To Reproduce
Attempting to run a predictor using the model panoptic_fpn_R_101_dconv_cascade_gn_3x.yaml.
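For reference, a minimal sketch of such a predictor setup (the exact script is not shown in the report, and the image path below is a placeholder):

```python
# Hypothetical repro sketch, not the reporter's exact script.
import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "Misc/panoptic_fpn_R_101_dconv_cascade_gn_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "Misc/panoptic_fpn_R_101_dconv_cascade_gn_3x.yaml")

predictor = DefaultPredictor(cfg)
# Fails on this setup with "invalid device function" inside ROIAlign.
outputs = predictor(cv2.imread("input.jpg"))  # "input.jpg" is a placeholder
```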
The following error is produced:
error in deformable_im2col: invalid device function
... < repeated ~30 times> ...
File "/opt/conda/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 49, in decorate_no_grad
return func(*args, **kwargs)
File "/app/detectron2/detectron2/engine/defaults.py", line 176, in __call__
predictions = self.model([inputs])[0]
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/app/detectron2/detectron2/modeling/meta_arch/panoptic_fpn.py", line 98, in forward
images, features, proposals, gt_instances
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/app/detectron2/detectron2/modeling/roi_heads/cascade_rcnn.py", line 97, in forward
pred_instances = self._forward_box(features_list, proposals)
File "/app/detectron2/detectron2/modeling/roi_heads/cascade_rcnn.py", line 112, in _forward_box
head_outputs.append(self._run_stage(features, proposals, k))
File "/app/detectron2/detectron2/modeling/roi_heads/cascade_rcnn.py", line 203, in _run_stage
box_features = self.box_pooler(features, [x.proposal_boxes for x in proposals])
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/app/detectron2/detectron2/modeling/poolers.py", line 192, in forward
output[inds] = pooler(x_level, pooler_fmt_boxes_level)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/app/detectron2/detectron2/layers/roi_align.py", line 95, in forward
input, rois, self.output_size, self.spatial_scale, self.sampling_ratio, self.aligned
File "/app/detectron2/detectron2/layers/roi_align.py", line 20, in forward
input, roi, spatial_scale, output_size[0], output_size[1], sampling_ratio, aligned
RuntimeError: CUDA error: invalid device function (ROIAlign_forward_cuda at /app/detectron2/detectron2/layers/csrc/ROIAlign/ROIAlign_cuda.cu:359)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x47 (0x7ffa5c402687 in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: ROIAlign_forward_cuda(at::Tensor const&, at::Tensor const&, float, int, int, int, bool) + 0xa37 (0x7ffa0653b6f5 in /app/detectron2/detectron2/_C.cpython-36m-x86_64-linux-gnu.so)
frame #2: ROIAlign_forward(at::Tensor const&, at::Tensor const&, float, int, int, int, bool) + 0xbc (0x7ffa064c9fdc in /app/detectron2/detectron2/_C.cpython-36m-x86_64-linux-gnu.so)
frame #3: <unknown function> + 0x5961a (0x7ffa064db61a in /app/detectron2/detectron2/_C.cpython-36m-x86_64-linux-gnu.so)
frame #4: <unknown function> + 0x5971e (0x7ffa064db71e in /app/detectron2/detectron2/_C.cpython-36m-x86_64-linux-gnu.so)
frame #5: <unknown function> + 0x53ca0 (0x7ffa064d5ca0 in /app/detectron2/detectron2/_C.cpython-36m-x86_64-linux-gnu.so)
<omitting python frames>
frame #12: THPFunction_apply(_object*, _object*) + 0x9ff (0x7ffa5d63dacf in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
Environment
--------------------- -------------------------------------------------------------------
Python 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34) [GCC 7.3.0]
Detectron2 Compiler GCC 5.4
DETECTRON2_ENV_MODULE <not set>
PyTorch 1.3.0
PyTorch Debug Build False
CUDA available True
GPU 0,1 GeForce GTX 1080 Ti
Pillow 6.2.0
cv2 3.4.4
--------------------- -------------------------------------------------------------------
PyTorch built with:
- GCC 7.3
- Intel(R) Math Kernel Library Version 2019.0.4 Product Build 20190411 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v0.20.5 (Git Hash 0125f28c61c1f822fd48570b4c1066f96fcb9b2e)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- NNPACK is enabled
- CUDA Runtime 10.1
- NVCC architecture flags: -gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_50,code=compute_50
- CuDNN 7.6.3
- Magma 2.5.1
- Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=True, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF
I was able to reproduce the same error when I used the wrong version of CUDA.
What I did: I installed PyTorch with `conda install pytorch torchvision cudatoolkit=10.1 -c pytorch`; however, my local CUDA runtime and nvcc are 10.0. In this case I observe the same error. Please check whether your CUDA version is correct.
It seems that a mismatched NVCC vs. CUDA runtime version is the root cause. Closing, but feel free to reopen if this does not solve your issue.
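A quick way to compare the CUDA runtime PyTorch was built against with the GPU that is actually present (the nvcc used to compile detectron2 still has to be checked separately, e.g. with nvcc --version):

```python
# Sanity-check sketch: the major.minor printed by torch.version.cuda should
# match the nvcc that compiled detectron2's CUDA extensions.
import torch

print("PyTorch:", torch.__version__)
print("CUDA runtime PyTorch was built with:", torch.version.cuda)  # e.g. "10.1"
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
    # (6, 1) for a GTX 1080 Ti, which is covered by the sm_61 flag listed above
    print("Compute capability:", torch.cuda.get_device_capability(0))
```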
Yes, that's the key to solving my problem.
Problem
First, a brief introduction to my problem: I'm new to Detectron2 and have only one GPU (GeForce GTX 1080 Ti). I chose to build Detectron2 from source.
Everything went fine and Detectron2 installed successfully,
but the error appeared when I tried to train.
Solution
I checked my CUDA version.
Before this I had installed cudatoolkit=10.2, but now I chose an earlier version to match my local CUDA.
After rebuilding Detectron2, the problem was solved!
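To verify such a rebuild, one option is to exercise the failing layer directly on a small CUDA tensor; the boxes below are arbitrary and this is only a sketch, not an official test:

```python
# Sketch: call detectron2's ROIAlign CUDA kernel in isolation. A build for the
# wrong GPU architecture raises the same "invalid device function" error here;
# a correct rebuild returns a pooled tensor.
import torch
from detectron2.layers import ROIAlign

x = torch.randn(1, 3, 32, 32, device="cuda")
# Each ROI is (batch_index, x1, y1, x2, y2) in input-image coordinates.
rois = torch.tensor([[0.0, 4.0, 4.0, 20.0, 20.0]], device="cuda")
pool = ROIAlign(output_size=(7, 7), spatial_scale=1.0, sampling_ratio=2, aligned=True)
print(pool(x, rois).shape)  # expected: torch.Size([1, 3, 7, 7])
```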
Most likely the solution to your problem is already in https://detectron2.readthedocs.io/tutorials/install.html#common-installation-issues. If you need help to solve an unexpected issue you observed, please include details following the issue template.
No, it's unrelated to the model zoo. It's likely because CUDA 9.2 is just what your computer is using.
As a follow-up from #78: I installed a new env with CUDA 9.2 and this solved my issue. Could the problem be that, as stated in https://github.com/facebookresearch/detectron2/blob/master/MODEL_ZOO.md, all models are trained with CUDA 9.2?