Vitis-AI: training with Caffe on the Vitis-AI GPU Docker fails

Hi,

I get the following error when training cf_refinedet_coco_360_480_0.96_5.08G_2.0:

(vitis-ai-caffe) Vitis-AI /workspace/models/AI-Model-Zoo/cf_refinedet_coco_360_480_0.96_5.08G_2.0/code/train > bash train.sh 
../../../caffe-xilinx/build/tools/caffe.bin does not exist, try use path in pre-build docker
F0303 10:14:08.370003   394 gpu_memory.cpp:171] Check failed: error == cudaSuccess (10 vs. 0)  invalid device ordinal
*** Check failure stack trace: ***
    @     0x7ff0e4aaf4dd  google::LogMessage::Fail()
    @     0x7ff0e4ab7071  google::LogMessage::SendToLog()
    @     0x7ff0e4aaeecd  google::LogMessage::Flush()
    @     0x7ff0e4ab076a  google::LogMessageFatal::~LogMessageFatal()
    @     0x7ff0e3760145  caffe::GPUMemory::Manager::update_dev_info()
    @     0x7ff0e37606bf  caffe::GPUMemory::Manager::init()
    @     0x55a72c9920ed  train()
    @     0x55a72c98ba59  main
    @     0x7ff0e1ceac87  __libc_start_main
    @     0x55a72c98c6a8  (unknown)
train.sh: line 37:   394 Aborted                 (core dumped) $exec_path "$@"

Here is the output of nvidia-smi:

mhanuel@mhanuel-MSI:~$ nvidia-smi
Thu Mar  3 10:15:15 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0  On |                  N/A |
|  0%   36C    P8    24W / 170W |    386MiB / 12288MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      7372      G   /usr/lib/xorg/Xorg                 35MiB |
|    0   N/A  N/A      7851      G   /usr/lib/xorg/Xorg                235MiB |
|    0   N/A  N/A      7976      G   /usr/bin/gnome-shell               40MiB |
|    0   N/A  N/A      8471      G   ...520405909793494209,131072       23MiB |
|    0   N/A  N/A    180023      G   ...AAAAAAAAA= --shared-files       39MiB |
+-----------------------------------------------------------------------------+

What could I be missing?

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 21 (8 by maintainers)

Most upvoted comments

Hi @mhanuel26 ,

I noticed that you are using a GeForce RTX 3060. The RTX 3060 uses the Ampere architecture, which requires at least CUDA 11.0. Unfortunately, Caffe can only be built with CUDA 10.0 and is not compatible with CUDA 11.0.

Is there a chance that you can try with another NVIDIA GPU?
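
If it helps when picking another card, here is a rough sketch of the compatibility check described above. The table of minimum CUDA toolkit versions is an assumption written from memory of NVIDIA's release notes, and the helper name `toolkit_supports` is made up for illustration, so please verify the numbers against the official documentation before relying on them.

```python
# Minimum CUDA toolkit version that supports each compute capability.
# ASSUMPTION: values are illustrative and should be checked against
# NVIDIA's CUDA release notes; the table is not exhaustive.
MIN_TOOLKIT = {
    (6, 1): (8, 0),   # Pascal, e.g. GeForce GTX 10xx
    (7, 0): (9, 0),   # Volta
    (7, 5): (10, 0),  # Turing, e.g. GeForce RTX 20xx
    (8, 0): (11, 0),  # Ampere (data center parts)
    (8, 6): (11, 1),  # Ampere (GeForce RTX 30xx, including RTX 3060)
}


def toolkit_supports(compute_capability, toolkit_version):
    """Return True if the given CUDA toolkit version is new enough for the GPU.

    Both arguments are (major, minor) tuples. The function name and the
    table it uses are hypothetical helpers, not part of Vitis-AI or Caffe.
    """
    required = MIN_TOOLKIT.get(compute_capability)
    if required is None:
        raise KeyError(f"compute capability {compute_capability} not in table")
    # Tuples compare element by element, so (10, 0) < (11, 1).
    return toolkit_version >= required


if __name__ == "__main__":
    caffe_toolkit = (10, 0)  # the CUDA version Caffe is built with here
    for cc in sorted(MIN_TOOLKIT):
        status = "ok" if toolkit_supports(cc, caffe_toolkit) else "too new for CUDA 10.0"
        print(f"compute capability {cc[0]}.{cc[1]}: {status}")
```

With this table, a Turing or older card passes the check for a CUDA 10.0 build, while any Ampere card does not.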