apollo: Perception module won't start with RTX 2080 Ti

Per @lemketron and @storypku’s discussion here, I’m opening a new issue about the Perception module not working on my machine, which has an RTX 2080 Ti. Please let me know if there’s any additional information I can provide to help debug the problem.

Describe the bug: The Perception module will not start. When I turn it on in Dreamview, the slider immediately moves back to “off”, and no perception.INFO file is produced at all.

To reproduce, here are the steps I took:

  1. Start and enter Docker container.
  2. ./scripts/bootstrap_lgsvl.sh
  3. ./scripts/bridge.sh
  4. Start LGSVL Simulator (outside the container) and begin simulation (I used Borregas Ave.).
  5. In Dreamview, select mode, map, and vehicle.
  6. In Dreamview, turn on the Localization and Transform modules (the map then appeared correctly in Dreamview).
  7. Turn on the Perception module (it did not start, as described above).

System setup:

  • OS: Ubuntu 18.04.5 LTS
  • GPU: NVIDIA RTX 2080 Ti
  • Apollo version: built from master branch, commit dbc9f1ba87eb389f3358edb843956d80bd478383

About this issue

  • State: open
  • Created 4 years ago
  • Reactions: 2
  • Comments: 45 (38 by maintainers)

Most upvoted comments

I got the same errors when running LiDAR perception. I found that either of the following two changes makes it work:

  1. Increase the TF buffer size (https://github.com/ApolloAuto/apollo/blob/master/modules/perception/production/conf/perception/perception_common.flag#L69) from 0.01 to 0.1

This (mostly) worked for me: perception no longer crashes immediately on launch. However, I noticed that LiDAR perception alone takes 4 GB of GPU memory, and we don’t even have camera perception (traffic lights) running yet…

I hope camera perception works again soon, and that it still uses less GPU memory than the LiDAR module. In 5.0, perception uses 1165 MB and traffic-light perception adds another 1088 MB, for a total of 2253 MB, which is just over half of what LiDAR perception alone consumes on master now.
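The memory comparison above can be checked with a quick calculation. This is a minimal sketch that assumes “4GB” means roughly 4 × 1024 MB; the exact reading for master was not posted, so only the 5.0 figures are exact.

```python
# Back-of-the-envelope check of the GPU memory figures quoted above.
# Assumption: "4GB" for LiDAR perception on master is about 4 * 1024 MB.
PERCEPTION_5_0_MB = 1165       # perception module in Apollo 5.0
TRAFFIC_LIGHT_5_0_MB = 1088    # traffic-light perception added on top in 5.0
LIDAR_MASTER_MB = 4 * 1024     # LiDAR perception alone on master (approximate)

total_5_0_mb = PERCEPTION_5_0_MB + TRAFFIC_LIGHT_5_0_MB
ratio = total_5_0_mb / LIDAR_MASTER_MB

print(f"Apollo 5.0 total: {total_5_0_mb} MB")                 # Apollo 5.0 total: 2253 MB
print(f"Fraction of master LiDAR-only usage: {ratio:.2f}")   # ... 0.55
```

A ratio of about 0.55 is what “just over half” refers to.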

In any case, I’m happy to report that Apollo master is actually able to drive (on small maps, without traffic-light perception) in LGSVL Simulator on a Razer Blade i7 laptop with an 8 GB RTX 2070 Max-Q. 😃

Thanks @dfremont @rongguodong. The team is looking into this issue: @storypku @jeroldchen @lfcarol

@storypku It is simple to reproduce this locally: just follow our instructions here up to the step “Open the Module Controller tab”. Then turn on the “localization” and “transform” modules. Next, use “cyber_launch start modules/perception/production/launch/lidar_perception.launch” to start the LiDAR-based perception module.

You should now see many timestamp-related errors in the console (as @dfremont posted above).

Note that this may be an old issue. We do not see this error in our fork of Apollo 5.0 because we applied the second change I posted above (i.e., changing query_time from “timestamp” to “0”). But this change seems hacky; I would like to understand why it works and find a better solution.
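The two workarounds in this thread (raising the TF buffer value from 0.01 to 0.1, and querying at time 0 instead of the point-cloud timestamp) can be illustrated with a toy model. This is a sketch under assumptions this thread does not confirm: that the 0.01 value is how long, in seconds, perception waits for a transform at the point-cloud timestamp, and that localization poses reach the buffer slightly after their stamps. `ToyTransformBuffer` and `can_transform` here are hypothetical, not Apollo’s API; the only borrowed convention is tf2’s use of time 0 to mean “latest available”.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyTransformBuffer:
    """Hypothetical tf2-style buffer used only to illustrate the two workarounds.

    Poses are stamped with the values in `stamps`, but each one only becomes
    visible `latency_s` seconds after its stamp (assumed transport delay).
    """
    latency_s: float
    stamps: List[float] = field(default_factory=list)

    def _arrived_by(self, wall_time_s: float) -> List[float]:
        # Stamps of poses that have reached the buffer by `wall_time_s`.
        return [s for s in self.stamps if s + self.latency_s <= wall_time_s]

    def can_transform(self, query_time_s: float, now_s: float, timeout_s: float) -> bool:
        """Return True if a usable pose shows up within `timeout_s` of `now_s`.

        query_time_s == 0 follows the tf2 convention of "latest available";
        any other value needs a pose stamped at or after the query time so the
        buffer can interpolate up to it.
        """
        arrived = self._arrived_by(now_s + timeout_s)
        if query_time_s == 0:
            return len(arrived) > 0
        return any(s >= query_time_s for s in arrived)


# Poses at 100 Hz for one second, each arriving 50 ms late (assumed numbers).
buf = ToyTransformBuffer(latency_s=0.05, stamps=[i / 100 for i in range(101)])
cloud_stamp = 0.50  # point cloud processed as soon as it is stamped (assumption)

print(buf.can_transform(cloud_stamp, now_s=cloud_stamp, timeout_s=0.01))  # False
print(buf.can_transform(cloud_stamp, now_s=cloud_stamp, timeout_s=0.1))   # True
print(buf.can_transform(0, now_s=cloud_stamp, timeout_s=0.01))            # True
```

In this model, a longer wait lets the pose for the cloud’s own timestamp arrive, while querying at time 0 succeeds by pairing the cloud with whatever pose arrived last, which is stamped several tens of milliseconds before the cloud. That mismatch is one reason the query_time change could reasonably be considered hacky.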

Thanks, Guodong, for the updates. We will check it ASAP.

BTW: The latest master seems to be missing an environment setting. I have to run “source cyber/setup.bash” before I can use any Cyber commands (e.g. cyber_monitor, cyber_launch). Is this a new bug?