ros2cli: Nodes missing from `ros2 node list` after relaunch
Bug report
Required Info:
- Operating System: Ubuntu 20.04
- Installation type: Foxy binaries
- Version or commit hash: ros-foxy-navigation2 0.4.5-1focal.20201210.084248
- DDS implementation: Fast-RTPS (default)
- Client library (if applicable): n/a
Steps to reproduce issue
1
From the workspace root, launch (e.g.) a TurtleBot3 simulation:
export TURTLEBOT3_MODEL=burger
export GAZEBO_MODEL_PATH=$GAZEBO_MODEL_PATH:$(pwd)/src/turtlebot3/turtlebot3_simulations/turtlebot3_gazebo/models
ros2 launch turtlebot3_gazebo turtlebot3_world.launch.py
Then, in a second terminal, launch the navigation:
export TURTLEBOT3_MODEL=burger
ros2 launch turtlebot3_navigation2 navigation2.launch.py use_sim_time:=true
Print the node list:
ros2 node list
Close (ctrl-c) the navigation and the simulation.
2
Relaunch from the same respective terminals, the simulation:
ros2 launch turtlebot3_gazebo turtlebot3_world.launch.py
and the navigation:
ros2 launch turtlebot3_navigation2 navigation2.launch.py use_sim_time:=true
Print the node list again (2nd time):
ros2 node list
Close (ctrl-c) the navigation and the simulation. Stop the ros2 daemon:
ros2 daemon stop
3
Relaunch from the same respective terminals, the simulation:
ros2 launch turtlebot3_gazebo turtlebot3_world.launch.py
and the navigation:
ros2 launch turtlebot3_navigation2 navigation2.launch.py use_sim_time:=true
Print the node list again (3rd time):
ros2 node list
Expected behavior
The node list should be the same all three times (up to some hash in the /transform_listener_impl_... nodes).
Actual behavior
The second time, the following nodes are missing (the remainder is practically the same):
/controller_server
/controller_server_rclcpp_node
/global_costmap/global_costmap
/global_costmap/global_costmap_rclcpp_node
/global_costmap_client
/local_costmap/local_costmap
/local_costmap/local_costmap_rclcpp_node
/local_costmap_client
/planner_server
/planner_server_rclcpp_node
The third time, after stopping the daemon, it works as expected again.
Note that everything else works fine; in the navigation use case above, the missing nodes are fully functional.
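To compare the outputs of two runs, a small helper can do the diff. This is a minimal sketch, not part of ros2cli: the function names are hypothetical, and it assumes the varying part of the `/transform_listener_impl_...` names is a trailing hexadecimal suffix.

```python
import re
from typing import List, Set

# Assumption: the per-run "hash" is a trailing hex suffix on these node names.
_HASH_SUFFIX = re.compile(r"^(/.*transform_listener_impl)_[0-9a-fA-F]+$")


def normalize(node_list_output: str) -> Set[str]:
    """Turn raw `ros2 node list` output into a set of comparable node names."""
    names = set()
    for line in node_list_output.splitlines():
        name = line.strip()
        if not name.startswith("/"):
            continue  # skip blank lines and any warning text
        # Replace the varying suffix so the same node compares equal across runs.
        names.add(_HASH_SUFFIX.sub(r"\g<1>_<hash>", name))
    return names


def missing_nodes(first_run: str, second_run: str) -> List[str]:
    """Nodes that were listed in the first run but not in the second."""
    return sorted(normalize(first_run) - normalize(second_run))
```

Saving the output of each `ros2 node list` call to a file and passing the file contents to `missing_nodes` would give the list of nodes that disappeared between runs.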
Additional information
This issue was raised here: ros-planning/navigation2#2145.
About this issue
- State: closed
- Created 3 years ago
- Reactions: 5
- Comments: 36 (11 by maintainers)
I’m seeing something similar with gazebo + ros2_control as well.
The interesting thing is that if I do `ros2 node list`, I get 0 nodes. If I do `ros2 node list --no-daemon`, I get the list of nodes. Restarting the daemon with `ros2 daemon stop; ros2 daemon start` also shows all nodes.
I think that this is expected behavior for ros2 daemon; it is well described in what-is-ros2-daemon.
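The check described above (daemon view versus `--no-daemon` view) can be scripted. The following is a rough sketch under the assumption that `ros2` is on the `PATH`; the helper names are made up for illustration, and the `run` parameter exists only so the command runner can be swapped out.

```python
import subprocess
from typing import Callable, List, Set


def _run(cmd: List[str]) -> str:
    """Run a command and return its stdout (raises if the command fails)."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def list_nodes(use_daemon: bool, run: Callable[[List[str]], str] = _run) -> Set[str]:
    """Return node names from `ros2 node list`, with or without the daemon."""
    cmd = ["ros2", "node", "list"]
    if not use_daemon:
        cmd.append("--no-daemon")
    return {
        line.strip()
        for line in run(cmd).splitlines()
        if line.strip().startswith("/")  # ignore warnings and blank lines
    }


def daemon_view_matches(run: Callable[[List[str]], str] = _run) -> bool:
    """True if the daemon's cached node list equals the directly discovered one."""
    return list_nodes(True, run) == list_nodes(False, run)
```

If `daemon_view_matches()` returns `False` while the nodes are known to be running, the daemon's cache is likely out of date.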
I’m seeing this bug on a project with five nodes, FastRTPS, native Ubuntu install.
I’m using ros2 launch files. Everything comes up nicely the first couple of times, but eventually `ros2 node list` stops seeing all of the nodes (which are definitely running). At the same time, `ros2 param` stops being able to interact with the hidden nodes, and `ros2 topic list` stops showing all of the topics. rqt is a bit weird: there were a few times when it seemed able to find a different collection of topics and nodes than the CLI tools.
`ros2 daemon stop; ros2 daemon start` has saved my day.
I hope you guys can reproduce this issue on your machine, otherwise, nobody can help confirm even if I have a workaround patch 😄 .
I can’t use `rmw_cyclonedds_cpp` to reproduce this issue.
For `rmw_fastrtps_cpp`, since pressing Ctrl+C on `ros2 launch nav2_bringup tb3_simulation_launch.py headless:=False` can’t make all processes exit normally, the shared-memory files used by Fast-DDS are not cleaned up successfully. I don’t know if that is the root cause that makes the `ros2 daemon` not update the `node_listener->rmw_dds_common::GraphCache::update_participant_entities` anymore.
Some information about `ros2 daemon`:
To find out that thread `3648025` is `Id 8`:
The backtrace for thread Id 8:
https://github.com/eProsima/Fast-DDS/blob/7e12e8fe2cebf27c621263fa544f94b099504808/src/cpp/rtps/transport/shared_mem/SharedMemChannelResource.hpp#L128-L136
failed to `Receive` by `pop` the message, as `find_segment` throws an exception inside.
I don’t know whether it’s a bug or not, because I can’t reproduce this issue the first time after clearing the related shm files `/dev/shm/*fastrtps*`.
This issue is not easy to reproduce. But it must still be there, because I can reproduce it with rolling a few times (the reproduction steps are similar to https://github.com/ros2/ros2cli/issues/582#issue-784108824). After stopping the ros2 daemon in step 2 of https://github.com/ros2/ros2cli/issues/582#issue-784108824, we can immediately get the correct result of the node list.
Notice that the navigation demo runs well even if the `ros2 node list` output is incorrect.
@iuhilnehc-ynos can you evaluate the 2 PRs introduced by https://github.com/ros2/rmw_fastrtps/issues/699#issuecomment-1653795722 with the reproducible procedure in this issue?
The downside could be discovery time for any other nodes running on that host system. The daemon caches and advertises the ROS 2 network graph, so if the daemon is running, other ROS 2 nodes on the same host can query the daemon for connectivity without waiting for the entire discovery.
We can use this option to wait for the ROS 2 network graph to be updated until a specific timeout expires. But this option is only valid when the daemon is not running or the `--no-daemon` option is specified.
I have tested it on ros:rolling (docker), building turtlebot3 and navigation2 from source (ros:rolling does not provide nav2 packages). After testing many times, it works well.
@iuhilnehc-ynos great news! thanks for checking.
Currently having this problem as well, but `--spin-time` does not work for me. The only workaround that works is using the `--no-daemon` option. Other commands such as `ros2 param list` also do not work. I’m running only a single node on humble, Ubuntu 22.04 (LTS).
Restarting the daemon also does not seem to solve the problem.
No idea if it helps, but here is the output of `ros2 doctor --report` while my node is running:
Again, not sure if helpful, but when I installed ROS2, I added the following lines to ~/.bashrc:
Pressing `ctrl+c` for `ros2 launch nav2_bringup tb3_simulation_launch.py headless:=False` has different behavior each time, but most errors are from `rviz2` and `component_container_isolated`, which might be killed by `ros2 launch`.
It shows a random node list, but if the issue happens, the node list is almost the same as the prior one while running the `tb3_simulation_launch.py` again, except that some node names with new IDs are refreshed, such as the launch node `/launch_ros_{a_new_pid}`.
No, I tried using `fastdds shm clean`, but it is not enough because shared memory files for data communication are used in the node of `ros2 daemon`. I must stop `ros2 daemon`.
BTW: I think it’s not difficult to reproduce this issue. Please don’t be polite to the `tb3_simulation_launch.py` (press ctrl+c any time you can to stop it and rerun it immediately). I have confirmed this issue with both `humble` and `rolling`.
I have not noticed this bug in Galactic, but I encountered it immediately again when I used Humble.
I have seen https://github.com/ZhenshengLee/ros2_jetson/issues/10 in galactic
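To see whether an unclean shutdown left Fast DDS shared-memory files behind, the `/dev/shm/*fastrtps*` pattern mentioned earlier in the thread can be listed from a script. This is only an inspection sketch with a hypothetical function name. It does not delete anything, since according to the comments above cleaning the files is not enough while `ros2 daemon` is still using them.

```python
import glob
import os
from typing import List


def leftover_fastdds_shm(shm_dir: str = "/dev/shm") -> List[str]:
    """List files matching the *fastrtps* pattern under the given directory."""
    return sorted(glob.glob(os.path.join(shm_dir, "*fastrtps*")))


if __name__ == "__main__":
    for path in leftover_fastdds_shm():
        print(path)
```

Running it once after stopping all launch files and once after `ros2 daemon stop` would show which files are still held at each point.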
discovery protocol is implemented in RMW implementation, so changing rmw would solve the problem.
no i do not think so, related to previous comment, discovery depends on underneath rmw implementation.
i cannot reproduce this issue with my local environment and rolling branch.
Exactly, I’ve seen both issues.
problem-1: Cache (daemon) retaining nodes killed long ago.
problem-2: Cache (daemon) not adding new nodes.
I’m trying to find reproducible examples, currently I can make it happen 100% of the time, but on a complex setup involving ros2_control with 2 controllers and launching and stopping navigation2.
There may also be underlying rmw issues causing problem-2, since I’ve seen that rviz2 would not list the topics from the newly spawned nodes, and even though I haven’t looked in depth, I believe rviz2 has 0 relation with ros2cli.
Ah, i see. you are saying
problem-1: old cache can be seen, and will not be cleaned?
problem-2: cache does not get updated?
Am I understanding correctly?
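The two problems can be told apart with a set difference between what the daemon reports and what direct discovery reports. This is a sketch with hypothetical names. It assumes the `ros2 node list --no-daemon` result is treated as ground truth, which may not hold if discovery itself is incomplete.

```python
from typing import Dict, List, Set


def classify_cache_errors(
    daemon_nodes: Set[str], direct_nodes: Set[str]
) -> Dict[str, List[str]]:
    """Split daemon/direct disagreements into the two problems from the thread.

    daemon_nodes: names reported by `ros2 node list` (daemon cache).
    direct_nodes: names reported by `ros2 node list --no-daemon`
                  (assumed here to be the ground truth).
    """
    return {
        # problem-1: the cache still lists nodes that no longer exist
        "stale_in_cache": sorted(daemon_nodes - direct_nodes),
        # problem-2: the cache never picked up nodes that are running
        "missing_from_cache": sorted(direct_nodes - daemon_nodes),
    }
```

For example, a daemon view of `{"/old_node", "/planner_server"}` against a direct view of `{"/planner_server", "/controller_server"}` would report `/old_node` as stale and `/controller_server` as missing.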