vision: torchvision.ops.batched_nms() crashes with pytorch 1.9.0 and torchvision 0.10.0

🐛 Bug

With the just-released PyTorch 1.9.0 and torchvision 0.10.0, torchvision.ops.batched_nms() crashes on my machine with the following error:

RuntimeError: Couldn't load custom C++ ops. This can happen if your PyTorch and torchvision versions are incompatible, or if you had errors while compiling torchvision from source. For further information on the compatible versions, check https://github.com/pytorch/vision#installation for the compatibility matrix. Please check your PyTorch version with torch.__version__ and your torchvision version with torchvision.__version__ and verify if they are compatible, and if not please reinstall torchvision so that it matches your PyTorch install.

Since both are the current releases, I would expect them to be compatible (they are not yet listed in the compatibility matrix).
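The version check that the error message asks for can be sketched as below. This is only an illustration and not part of the original report: `KNOWN_PAIRS`, `base_version`, `is_known_pair`, and `check_installed` are hypothetical names, and the table is a small hand-copied excerpt of the compatibility matrix, so a release that is not listed is reported as unknown rather than as incompatible.

```python
# Hypothetical excerpt of the PyTorch/torchvision compatibility matrix
# (PyTorch release -> matching torchvision release). Not exhaustive.
KNOWN_PAIRS = {
    "1.9.0": "0.10.0",
    "1.8.1": "0.9.1",
    "1.8.0": "0.9.0",
    "1.7.1": "0.8.2",
}


def base_version(version):
    """Strip a local build suffix such as '+cu102' from a version string."""
    return version.split("+")[0]


def is_known_pair(torch_version, torchvision_version):
    """Return True or False for a listed PyTorch release, None if it is not listed."""
    expected = KNOWN_PAIRS.get(base_version(torch_version))
    if expected is None:
        return None
    return expected == base_version(torchvision_version)


def check_installed():
    """Apply is_known_pair to whatever is installed in the current environment."""
    import torch
    import torchvision

    return is_known_pair(torch.__version__, torchvision.__version__)
```

For the versions in this report, `is_known_pair("1.9.0", "0.10.0")` returns `True`, which suggests that the version numbers themselves match and the problem lies in how the installed binaries were built.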

To Reproduce

Steps to reproduce the behavior:

This example code shows the behavior on my machine:

import torch as th
import torchvision as tv

boxes = th.zeros(1000, 4)
scores = th.zeros(1000)
idxs = th.zeros(1000)

tv.ops.batched_nms(boxes, scores, idxs, 0.5)
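Before calling batched_nms() it may help to check whether the compiled nms operator was registered at all, since that is what the error above complains about. The following is a minimal sketch under that assumption; `compiled_nms_available` is a hypothetical helper name, and the two exception types are caught because the lookup of a missing operator can, as far as I know, raise either one depending on the PyTorch version.

```python
def compiled_nms_available():
    """Return True if torchvision's compiled nms op is registered with PyTorch."""
    try:
        import torch
        import torchvision  # noqa: F401  (importing it is what registers the ops)
    except (ImportError, OSError):
        # Either package is missing or its shared library could not be loaded.
        return False
    try:
        # Looking up the operator fails if the C++ extension was not loaded.
        torch.ops.torchvision.nms
    except (AttributeError, RuntimeError):
        return False
    return True
```

If this returns `False` in an environment where both packages import cleanly, the installed torchvision binary most likely does not match the installed PyTorch build.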

Expected behavior

This should not result in an error.

Environment

Collecting environment information...
PyTorch version: 1.9.0
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.27

Python version: 3.9 (64-bit runtime)
Python platform: Linux-4.15.0-144-generic-x86_64-with-glibc2.27
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: Tesla V100-SXM2-32GB
GPU 1: Tesla V100-SXM2-32GB

Nvidia driver version: 460.32.03
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.20.2
[pip3] torch==1.9.0
[pip3] torchaudio==0.9.0a0+33b2469
[pip3] torchvision==0.10.0
[conda] blas 1.0 mkl
[conda] cudatoolkit 10.2.89 h8f6ccaa_8 conda-forge
[conda] ffmpeg 4.3 hf484d3e_0 pytorch
[conda] mkl 2021.2.0 h726a3e6_389 conda-forge
[conda] mkl-service 2.4.0 py39h3811e60_0 conda-forge
[conda] mkl_fft 1.3.0 py39h42c9631_2
[conda] mkl_random 1.2.2 py39hde0f152_0 conda-forge
[conda] numpy 1.20.2 py39h2d18471_0
[conda] numpy-base 1.20.2 py39hfae3a4d_0
[conda] pytorch 1.9.0 py3.9_cuda10.2_cudnn7.6.5_0 pytorch
[conda] torchaudio 0.9.0 py39 pytorch
[conda] torchvision 0.10.0 py39_cu102 pytorch

Additional context

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 3
  • Comments: 19 (11 by maintainers)

Most upvoted comments

With the just-released PyTorch 1.9.0 and torchvision 0.10.0, torchvision.ops.batched_nms() crashes on my machine with the following error:

RuntimeError: Couldn't load custom C++ ops. This can happen if your PyTorch and torchvision versions are incompatible, or if you had errors while compiling torchvision from source. For further information on the compatible versions, check https://github.com/pytorch/vision#installation for the compatibility matrix. Please check your PyTorch version with torch.__version__ and your torchvision version with torchvision.__version__ and verify if they are compatible, and if not please reinstall torchvision so that it matches your PyTorch install.

How can I solve this, please?

I think the difference can be explained by the presence or absence of conda-forge in one's .condarc. I got the repro after removing the conda-forge dependency, but then fixed it by enabling it in the install command as follows:

conda create -n test python=3.9 pytorch torchvision cudatoolkit=11.1 -c pytorch -c nvidia -c conda-forge

Since @fmassa pointed to https://anaconda.org/pytorch-test/torchvision/files, I just installed from there and the sample works.

@NicolasHug mmhh, this seems to be a different issue. When I run the line you posted on my machine, everything is fine and 0.10.0 is being installed

torchvision in the https://anaconda.org/pytorch channel was built against https://github.com/pytorch/vision/commit/9d5561b1f1224426a34ac13391f9c62f03c75b2f, whereas the one in https://anaconda.org/pytorch was built against https://github.com/pytorch/vision/commit/ae9963fd077619c7d2a134813e35551943e87458. I guess promoting the package from one channel to the other should resolve the issue.
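To see which of the two commits an installed package was built from, something like the following sketch could be used. It assumes that the installed torchvision exposes its build commit as `git_version` in the `torchvision.version` module, which I believe release builds do but have not verified for every package; `COMMITS`, `match_commit`, and `installed_commit` are hypothetical names introduced for illustration.

```python
# The two commits mentioned in the comment above.
COMMITS = (
    "9d5561b1f1224426a34ac13391f9c62f03c75b2f",
    "ae9963fd077619c7d2a134813e35551943e87458",
)


def match_commit(git_version, commits=COMMITS):
    """Return the commit that git_version refers to, or None if there is no match."""
    if not git_version or len(git_version) < 7:
        # Too short (or missing) to identify a commit reliably.
        return None
    for commit in commits:
        # Accept both full hashes and abbreviated ones of at least 7 characters.
        if commit.startswith(git_version):
            return commit
    return None


def installed_commit():
    """Return the build commit of the installed torchvision, or None if unavailable."""
    try:
        from torchvision import version
    except ImportError:
        return None
    # Assumption: the generated version module defines git_version.
    return getattr(version, "git_version", None)
```

Running `match_commit(installed_commit())` in both a working and a failing environment would show whether the two installs really come from different commits.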

I can also report a REGRESSION. A similar issue occurred for me when running torch.jit.script: code that worked with PyTorch 1.8.0 and torchvision 0.9.1 now fails after updating to PyTorch 1.9.0 and torchvision 0.10.0 with:

RuntimeError: 
object has no attribute nms:
  File "C:\tools\Anaconda3\lib\site-packages\torchvision\ops\boxes.py", line 35
    """
    _assert_has_ops()
    return torch.ops.torchvision.nms(boxes, scores, iou_threshold)
           ~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
'nms' is being compiled since it was called from '_batched_nms_vanilla'
  File "C:\tools\Anaconda3\lib\site-packages\torchvision\ops\boxes.py", line 102
    for class_id in torch.unique(idxs):
        curr_indices = torch.where(idxs == class_id)[0]
        curr_keep_indices = nms(boxes[curr_indices], scores[curr_indices], iou_threshold)
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
        keep_mask[curr_indices[curr_keep_indices]] = True
    keep_indices = torch.where(keep_mask)[0]
'_batched_nms_vanilla' is being compiled since it was called from 'batched_nms'
  File "C:\tools\Anaconda3\lib\site-packages\torchvision\ops\boxes.py", line 66
    # Ideally for GPU we'd use a higher threshold
    if boxes.numel() > 4_000 and not torchvision._is_tracing():
        return _batched_nms_vanilla(boxes, scores, idxs, iou_threshold)
               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    else:
        return _batched_nms_coordinate_trick(boxes, scores, idxs, iou_threshold)
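The comment above does not include the code that triggered this traceback. A minimal sketch that exercises the same path, assuming the failure can be reached by scripting batched_nms directly, might look like this; `try_script_batched_nms` is a hypothetical helper name, and the original code may well have scripted a larger model instead.

```python
def try_script_batched_nms():
    """Try to compile torchvision.ops.batched_nms with TorchScript and report the outcome."""
    try:
        import torch
        from torchvision.ops import batched_nms
    except ImportError as exc:
        return "import failed: {}".format(exc)
    try:
        # Scripting compiles batched_nms and the functions it calls, including nms.
        torch.jit.script(batched_nms)
    except RuntimeError as exc:
        # Keep only the first non-empty line of the (long) TorchScript error.
        lines = [line for line in str(exc).splitlines() if line.strip()]
        return "scripting failed: {}".format(lines[0] if lines else "unknown error")
    return "ok"
```

In an environment affected by this issue I would expect the result to start with "scripting failed", because the compiled nms operator is missing when TorchScript tries to resolve it.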