autoware: CUDA environment is broken when I run a docker container with rocker

Checklist

  • I’ve read the contribution guidelines.
  • I’ve searched other issues and no duplicate issues were found.
  • I’m convinced that this is not my fault but a bug.

Description

When I run a Docker container built from this repository with rocker, nvidia-smi and the CUDA packages of autoware.universe do not work.

Expected behavior

$ rocker --nvidia --x11 --user ghcr.io/autowarefoundation/autoware-universe:humble-latest nvidia-smi

should return the same result as running nvidia-smi in a container started with

docker run --rm -it --gpus all -e DISPLAY -e TERM -e QT_X11_NO_MITSHM=1 -v /tmp/.X11-unix:/tmp/.X11-unix -v /etc/localtime:/etc/localtime:ro ghcr.io/autowarefoundation/autoware-universe:humble-latest

Actual behavior

$ rocker --nvidia --x11 --user --volume $PWD:$HOME/autoware -- ghcr.io/autowarefoundation/autoware-universe:humble-latest nvidia-smi
...
bash: nvidia-smi: command not found

or

# in the docker container
$ ros2 launch lidar_centerpoint lidar_centerpoint.launch.xml
[INFO] [launch]: All log files can be found below /home/yusuke/.ros/log/2022-05-27-18-24-13-889706-yusuke-desktop-14888
[INFO] [launch]: Default logging verbosity is set to INFO
[INFO] [lidar_centerpoint_node-1]: process started with pid [14889]
[lidar_centerpoint_node-1] terminate called after throwing an instance of 'thrust::system::detail::bad_alloc'
[lidar_centerpoint_node-1]   what():  std::bad_alloc: cudaErrorInsufficientDriver: CUDA driver version is insufficient for CUDA runtime version
[ERROR] [lidar_centerpoint_node-1]: process has died [pid 14889, exit code -6, cmd '/home/yusuke/autoware/install/lidar_centerpoint/lib/lidar_centerpoint/lidar_centerpoint_node --ros-args -r __node:=lidar_centerpoint --params-file /tmp/launch_params__7bviznb --params-file /tmp/launch_params_xhbpo0pj --params-file /tmp/launch_params_3cva1bu7 --params-file /tmp/launch_params_591wukuf --params-file /tmp/launch_params_binyjp_2 --params-file /tmp/launch_params_d_ubfmz8 --params-file /tmp/launch_params_3ciwtkcg --params-file /tmp/launch_params_cy1qmkld --params-file /home/yusuke/autoware/install/lidar_centerpoint/share/lidar_centerpoint/config/default.param.yaml -r ~/input/pointcloud:=/sensing/lidar/pointcloud -r ~/output/objects:=objects'].

Steps to reproduce

  1. Build a Docker image in the docker directory.
  2. Run a Docker container with rocker.
  3. Run nvidia-smi in the container.

Versions

No response

Possible causes

No response

Additional context

No response

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 20 (16 by maintainers)

Most upvoted comments

Sent a PR: https://github.com/osrf/rocker/pull/182

If it’s not accepted, I’ll add the following block to the Dockerfile.


# Set environment variables for nvidia-container-runtime
ENV NVIDIA_VISIBLE_DEVICES=all
ENV NVIDIA_DRIVER_CAPABILITIES=compute,utility
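
One way to confirm that these two variables actually reach the container is a small check run from a shell inside it. The following is only a sketch: check_nvidia_env is a hypothetical helper name, not part of Autoware or rocker, and it assumes printenv is available in the image (it is part of GNU coreutils).

```shell
# Hypothetical helper (not part of Autoware or rocker).
# Reports whether the two variables read by nvidia-container-runtime
# are present in the current environment.
check_nvidia_env() {
    status=0
    for name in NVIDIA_VISIBLE_DEVICES NVIDIA_DRIVER_CAPABILITIES; do
        # printenv prints the value, or exits non-zero if the variable
        # is not in the environment.
        if value=$(printenv "$name"); then
            echo "$name=$value"
        else
            echo "$name is not set"
            status=1
        fi
    done
    # Non-zero when at least one variable is missing.
    return "$status"
}
```

Calling check_nvidia_env in the container started by rocker and again in the one started by plain docker run shows whether the two environments differ. The check only covers the environment variables; it says nothing about whether the driver libraries themselves were mounted.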

@kenji-miyake I’m sorry for the late reply, and thank you for identifying the cause of this issue! I will close this issue after creating a PR that adds your suggestion to the documentation.

Yes, indeed, my latest image is older than galactic-latest:

ghcr.io/autowarefoundation/autoware-universe               galactic-latest                                                          ea6b0e03f964   5 weeks ago     5.74GB
ghcr.io/autowarefoundation/autoware-universe               latest                                                                   35694ec40e8f   8 weeks ago     16.2GB