vision: ops.deform_conv2d causes CUDA illegal memory access
🐛 Bug
I tried to test the speed of deformable conv2d, but I always run into a CUDA memory error.
To Reproduce
$ ipython
Python 3.8.5 (default, Jul 27 2020, 08:42:51)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.17.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import torch
   ...: import torchvision as tv
   ...: weight = torch.randn(9, 9, 3, 3).cuda()
   ...: weight.requires_grad = True
   ...: img = torch.randn(8, 9, 1000, 110).cuda()
   ...: def test():
   ...:     offset = torch.randn(8, 18, 1000, 110).cuda()
   ...:     out = tv.ops.deform_conv2d(img, offset, weight, padding=1)
   ...:     out.mean().backward()
   ...:
In [2]: import os
In [3]: os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
In [4]: timeit test()
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-4-a1086d7a4706> in <module>
----> 1 get_ipython().run_line_magic('timeit', 'test()')
/usr/lib/python3.8/site-packages/IPython/core/interactiveshell.py in run_line_magic(self, magic_name, line, _stack_depth)
2324 kwargs['local_ns'] = self.get_local_scope(stack_depth)
2325 with self.builtin_trap:
-> 2326 result = fn(*args, **kwargs)
2327 return result
2328
<decorator-gen-60> in timeit(self, line, cell, local_ns)
/usr/lib/python3.8/site-packages/IPython/core/magic.py in <lambda>(f, *a, **k)
185 # but it's overkill for just that one bit of state.
186 def magic_deco(arg):
--> 187 call = lambda f, *a, **k: f(*a, **k)
188
189 if callable(arg):
/usr/lib/python3.8/site-packages/IPython/core/magics/execution.py in timeit(self, line, cell, local_ns)
1171 break
1172
-> 1173 all_runs = timer.repeat(repeat, number)
1174 best = min(all_runs) / number
1175 worst = max(all_runs) / number
/usr/lib/python3.8/timeit.py in repeat(self, repeat, number)
203 r = []
204 for i in range(repeat):
--> 205 t = self.timeit(number)
206 r.append(t)
207 return r
/usr/lib/python3.8/site-packages/IPython/core/magics/execution.py in timeit(self, number)
167 gc.disable()
168 try:
--> 169 timing = self.inner(it, self.timer)
170 finally:
171 if gcold:
<magic-timeit> in inner(_it, _timer)
<ipython-input-1-a97200bb984a> in test()
5 img = torch.randn(8, 9, 1000, 110).cuda()
6 def test():
----> 7 offset = torch.randn(8,18,1000,110).cuda()
8 out = tv.ops.deform_conv2d(img, offset, weight, padding=1)
9 out.mean().backward()
RuntimeError: CUDA error: an illegal memory access was encountered
In [5]:
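Until a fixed build is available, one possible workaround is to feed the op smaller batch slices so each underlying kernel launch processes fewer elements. This is only a sketch under that assumption; deform_conv2d_chunked and the default chunk size are made up here, and the chunk size that avoids the crash depends on the tensor shapes:

import torch
import torchvision as tv

def deform_conv2d_chunked(img, offset, weight, chunk=2, **kwargs):
    # Hypothetical helper: run deform_conv2d on batch slices and concatenate.
    # Gradients still flow through torch.cat and the slices.
    outs = [
        tv.ops.deform_conv2d(img[i:i + chunk], offset[i:i + chunk], weight, **kwargs)
        for i in range(0, img.shape[0], chunk)
    ]
    return torch.cat(outs, dim=0)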
Environment
PyTorch version: 1.6.0
Is debug build: False
CUDA used to build PyTorch: 11.0

OS: Arch Linux (x86_64)
GCC version: (GCC) 10.1.0
Clang version: 10.0.1
CMake version: version 3.18.1

Python version: 3.8 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: 11.0.2
GPU models and configuration: GPU 0: GeForce GTX 1050 Ti
Nvidia driver version: 450.57
cuDNN version: Probably one of the following:
/usr/lib/libcudnn.so.8.0.2
/usr/lib/libcudnn_adv_infer.so.8.0.2
/usr/lib/libcudnn_adv_train.so.8.0.2
/usr/lib/libcudnn_cnn_infer.so.8.0.2
/usr/lib/libcudnn_cnn_train.so.8.0.2
/usr/lib/libcudnn_ops_infer.so.8.0.2
/usr/lib/libcudnn_ops_train.so.8.0.2

Versions of relevant libraries:
[pip3] numpy==1.19.1
[pip3] torch==1.6.0
[pip3] torch-cluster==1.4.5
[pip3] torch-geometric==1.3.2
[pip3] torch-scatter==1.4.0
[pip3] torch-sparse==0.4.3
[pip3] torchvision==0.7.0a0
[conda] Could not collect
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Reactions: 1
- Comments: 40 (6 by maintainers)
Commits related to this issue
- [WIP] Fixes #2598 - Adjusted num_kernels and batch_size according to kMaxGridNum * CUDA_NUM_THREADS - Tests to add — committed to Quansight/vision by vfdev-5 4 years ago
- Fixes crash in deformable convolutions (2598) (#2604) * [WIP] Fixes #2598 - Adjusted num_kernels and batch_size according to kMaxGridNum * CUDA_NUM_THREADS - Tests to add * - Redefined kMaxGridN... — committed to pytorch/vision by vfdev-5 4 years ago
- Fixes crash in deformable convolutions (2598) (#2604) * [WIP] Fixes #2598 - Adjusted num_kernels and batch_size according to kMaxGridNum * CUDA_NUM_THREADS - Tests to add * - Redefined kMaxGridN... — committed to bryant1410/vision-1 by vfdev-5 4 years ago
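The shapes in the repro above are consistent with the launch-bound explanation in those commit messages: with CUDA_NUM_THREADS = 1024 and a grid capped at kMaxGridNum = 65535, one launch can cover at most about 67.1M elements. A back-of-the-envelope check (the per-element workload formula below is my reading of the commit messages, not the literal kernel source):

CUDA_NUM_THREADS = 1024  # threads per block used by the torchvision kernels
kMaxGridNum = 65535      # the hardcoded grid-size cap questioned below

bs, ch, h, w, kh, kw = 8, 9, 1000, 110, 3, 3  # shapes from the repro
num_kernels = bs * ch * kh * kw * h * w       # assumed col2im-style workload
max_covered = kMaxGridNum * CUDA_NUM_THREADS

print(num_kernels, max_covered, num_kernels > max_covered)
# 71280000 67107840 True -> one capped launch cannot reach every element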
The following script produces an error, but not the exact error from above (although it appears related):
torchvision_bug.py
To reproduce, run:
CUDA_LAUNCH_BLOCKING=1 python torchvision_bug.py
The error message this time is:
[error output not captured here]
Note the variable bs: on my test machine, bs=22 works fine and bs=23 (or higher) reproduces the error. I am using a Tesla V100 for testing, so this may be related to how much memory the card has, which is why I'm printing the max allocated memory as reported by PyTorch. Note that in the working case (bs=22) it's only ~38% on my card, so there should be plenty of headroom, although I don't know for sure how accurate that number is.
pytorch version: 1.9.0
torchvision version: 0.10.0
cuda version (as reported by nvidia-smi; let me know if that's not right): 11.2
Why is kMaxGridDim set to 65K? The maximum gridDim.x is 2**31 - 1 on all recent GPUs, and it can also be queried from the device properties rather than hardcoded.
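For reference, a short sketch of querying that limit at runtime. The use of pycuda here is my own choice for illustration (any CUDA binding that exposes device attributes would do; the underlying runtime call is cudaDeviceGetAttribute with cudaDevAttrMaxGridDimX):

import pycuda.driver as drv

drv.init()
dev = drv.Device(0)
# On every compute-capability >= 3.0 GPU this reports 2**31 - 1, far above 65535.
max_grid_x = dev.get_attribute(drv.device_attribute.MAX_GRID_DIM_X)
print(max_grid_x)  # 2147483647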