SuperPoint-SuperGlue-TensorRT: crash in SuperPoint::build()

Hello! I tried to configure this project and ran into some errors; I hope for suggestions. I am a PhD student at Shanghai Jiao Tong University.

My environment is as follows (I have not installed cuda-11.6, but I did update TensorRT to TensorRT-8.4.1.5):

  • cuda-10.2
  • TensorRT-8.4.1.5
  • cuDNN: v8.2.0

With this setup, I can build the project successfully.

CMakeLists.txt

I modified the CMakeLists.txt as follows:

cmake_minimum_required(VERSION 3.5)
project(superpointglue)

set(CMAKE_CXX_STANDARD 14)
set(CMAKE_BUILD_TYPE "release")
add_definitions(-w)

set(ENABLE_BACKWARD true)
if(ENABLE_BACKWARD)
    add_definitions(-D USE_BACKWARD)
endif()

add_subdirectory(${PROJECT_SOURCE_DIR}/3rdparty/tensorrtbuffer)

set(TENSORRT_ROOT $ENV{HOME}/3rdParty/TensorRT-8.4.1.5)
SET(CUDA_TOOLKIT_ROOT_DIR "/usr/local/cuda")
set(Torch_DIR "$ENV{HOME}/3rdParty/libtorch/share/cmake/Torch") # zph desktop
# find_package(OpenCV 4.2 REQUIRED) # origin
find_package(OpenCV 3.4.10 REQUIRED) # wzy
find_package(Eigen3 REQUIRED)
find_package(Torch REQUIRED) # wzy
find_package(CUDA REQUIRED)
find_package(yaml-cpp REQUIRED)


include_directories(
  ${PROJECT_SOURCE_DIR}
  ${PROJECT_SOURCE_DIR}/include
  ${OpenCV_INCLUDE_DIRS}
  ${EIGEN3_INCLUDE_DIRS}
  ${CUDA_INCLUDE_DIRS}
  ${YAML_CPP_INCLUDE_DIR}
  ${TORCH_INCLUDE_DIRS} # wzy
  ${TENSORRT_ROOT}/include # wzy
)

add_library(${PROJECT_NAME}_lib SHARED
  src/super_point.cpp
  src/super_glue.cpp
)

target_link_libraries(${PROJECT_NAME}_lib
  nvinfer
  nvonnxparser
  ${OpenCV_LIBRARIES}
  ${CUDA_LIBRARIES}
  yaml-cpp
  tensorrtbuffer
#  ${TENSORRT_ROOT}/lib # wzy
#  ${TORCH_LIBRARIES} # wzy
)

add_executable(${PROJECT_NAME}_image inference_image.cpp)
add_executable(${PROJECT_NAME}_sequence inference_sequence.cpp)

target_link_libraries(${PROJECT_NAME}_image ${PROJECT_NAME}_lib)
target_link_libraries(${PROJECT_NAME}_sequence  ${PROJECT_NAME}_lib)

if (ENABLE_BACKWARD)
    target_link_libraries(${PROJECT_NAME}_image dw)
endif()
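A note on the two lines I commented out in target_link_libraries: as far as I understand, target_link_libraries expects library names or full paths, not a directory, so passing ${TENSORRT_ROOT}/lib there would not work anyway. If the linker cannot resolve nvinfer/nvonnxparser on its own, something like the following should work instead (an untested sketch for my directory layout):

```cmake
# Point the linker at the TensorRT libraries explicitly.
# (Place this before the add_library/add_executable calls.)
link_directories(${TENSORRT_ROOT}/lib)

# Alternatively, link the full paths directly:
# target_link_libraries(${PROJECT_NAME}_lib
#   ${TENSORRT_ROOT}/lib/libnvinfer.so
#   ${TENSORRT_ROOT}/lib/libnvonnxparser.so
# )
```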

Configuring with this CMakeLists.txt produces the following output:

-- Found CUDA: /usr/local/cuda-10.2 (found version "10.2") 
-- Found CUDA: /usr/local/cuda (found suitable exact version "10.2") 
-- Found CUDA: /usr/local/cuda (found version "10.2") 
-- Caffe2: CUDA detected: 10.2
-- Caffe2: CUDA nvcc is: /usr/local/cuda/bin/nvcc
-- Caffe2: CUDA toolkit directory: /usr/local/cuda
-- Caffe2: Header version is: 10.2
-- Found cuDNN: v8.2.0  (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/libcudnn.so)
-- /usr/local/cuda/lib64/libnvrtc.so shorthash is 08c4863f
-- Autodetected CUDA architecture(s):  7.5
-- Added CUDA NVCC flags for: -gencode;arch=compute_75,code=sm_75
-- Configuring done
-- Generating done

Reproducing the problems

Problem 1

Running ./superpointglue_image ../config/config.yaml ../weights/ ${PWD}/../image/image0.png ${PWD}/../image/image1.png gives the following error:

Config file is ../config/config.yaml
First image size: 320x240
Second image size: 320x240
Building inference engine......
SuperPoint and SuperGlue inference engine build success.
---------------------------------------------------------
[SuperPoint::infer]
Stack trace (most recent call last):
#4    Object "[0xffffffffffffffff]", at 0xffffffffffffffff, in 
#3    Object "./superpointglue_image", at 0x5631db9fecd9, in 
#2    Object "/lib/x86_64-linux-gnu/libc.so.6", at 0x7fd36e921c86, in __libc_start_main
#1    Object "./superpointglue_image", at 0x5631db9fe8b8, in 
#0    Object "/home/zph/projects/SuperPoint-SuperGlue-TensorRT/build/libsuperpointglue_lib.so", at 0x7fd372a505ce, in SuperPoint::infer(cv::Mat const&, Eigen::Matrix<double, 259, -1, 0, 259, -1>&)
Segmentation fault (Signal sent by the kernel [(nil)])
[1]    26575 segmentation fault (core dumped)  ./superpointglue_image ../config/config.yaml ../weights/ 

Problem 2

When I delete the .engine files, set a 640×480 image size in the config, and rerun ./superpointglue_image ../config/config.yaml ../weights/ ${PWD}/../image/image0.png ${PWD}/../image/image1.png, I get the following error:

First image size: 640x480
Second image size: 640x480
Building inference engine......
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Stack trace (most recent call last):
#12   Object "[0xffffffffffffffff]", at 0xffffffffffffffff, in 
#11   Object "./superpointglue_image", at 0x56016a9fecd9, in 
#10   Object "/lib/x86_64-linux-gnu/libc.so.6", at 0x7f85ea947c86, in __libc_start_main
#9    Object "./superpointglue_image", at 0x56016a9fe763, in 
#8    Object "/home/zph/projects/SuperPoint-SuperGlue-TensorRT/build/libsuperpointglue_lib.so", at 0x7f85eea7393b, in SuperPoint::build()
#7    Object "/home/zph/projects/SuperPoint-SuperGlue-TensorRT/build/libsuperpointglue_lib.so", at 0x7f85eea73421, in SuperPoint::deserialize_engine()
#6    Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6", at 0x7f85eafc22db, in operator new(unsigned long)
#5    Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6", at 0x7f85eafc1d53, in __cxa_throw
#4    Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6", at 0x7f85eafc1b20, in std::terminate()
#3    Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6", at 0x7f85eafc1ae5, in 
#2    Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6", at 0x7f85eafbb956, in 
#1    Object "/lib/x86_64-linux-gnu/libc.so.6", at 0x7f85ea9667f0, in abort
#0    Object "/lib/x86_64-linux-gnu/libc.so.6", at 0x7f85ea964e87, in gsignal
Aborted (Signal sent by tkill() 27576 1000)
[1]    27576 abort (core dumped)  ./superpointglue_image ../config/config_new.yaml ../weights/

Try on Docker

I tried the docker command and got:

docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: requirement error: unsatisfied condition: cuda>=11.6, please update your driver to a newer version, or use an earlier cuda container: unknown.
ERRO[0000] error waiting for container: 

Question

  • First question: is cuda-11.6 not included in the docker image?
  • Second question: is cuda-11.6 really a prerequisite?

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Comments: 15 (4 by maintainers)

Most upvoted comments

@myboyhood Hi, thank you for your interest in my project. For the first question, the docker image already has a cuda environment, but you need to make sure that the Nvidia driver on the host matches the cuda version in docker. The message "unsatisfied condition: cuda>=11.6, please update your driver to a newer version" is telling you to update the Nvidia driver on the host. For the second question, since I have only verified on tensorrt 8.4.1.5, you just need to install a cuda version that matches tensorrt 8.4.1.5; version matching information is available on the Nvidia website.

@myboyhood Hi, I am sorry that I have been busy recently and have had no time to focus on the issue list. In fact, the model supports dynamically sized image inputs, so changing the image size or the maximum number of feature points can be done without regenerating the engine. Whether larger-resolution images lead to better features and matches, I am not sure; I have not tried that.

But the onnx file should be generic.

@myboyhood I’m glad you’ve fixed the problem, and yes, if you switch to a new platform, you should regenerate the engine files.