rmw_fastrtps: FastRTPS 1.8.0 causes hangs in Navigation2

Bug report

Required Info:

  • Operating System:
    • Ubuntu 18.04
  • Installation type:
    • Source
  • Version or commit hash:
    • 0.7.2
  • DDS implementation:
    • Fast-RTPS master branch (1.8.0)
  • Client library (if applicable):
    • rclcpp

Steps to reproduce issue

Currently I have to run our Navigation2 system test to reproduce this; I’m trying to find a simpler example. What I see is that when I run our system test with the latest versions (master branches) of rmw_fastrtps (0.7.2) and Fast-RTPS (1.8.0), the test hangs and times out. When I run with the previous versions (0.7.1 and 1.7.2, respectively), things work fine. Things also work fine if I run with RMW_IMPLEMENTATION=rmw_opensplice_cpp.
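The comparison described above can be repeated by running one command under each RMW implementation in turn. This is only a minimal sketch: `run_with_rmw` is a hypothetical helper name, and `rmw_fastrtps_cpp` is assumed to be the identifier that selects the Fast-RTPS implementation. `RMW_IMPLEMENTATION` and `rmw_opensplice_cpp` come from the report itself.

```shell
#!/bin/sh
# Hypothetical helper: run a command with RMW_IMPLEMENTATION set for that
# command only, leaving the calling shell's environment untouched.
run_with_rmw() {
    rmw="$1"
    shift
    env RMW_IMPLEMENTATION="$rmw" "$@"
}

# Example (assumes a sourced workspace that contains the Nav2 system test;
# rmw_fastrtps_cpp is an assumed identifier):
#   run_with_rmw rmw_fastrtps_cpp   ctest -V -R test_localization
#   run_with_rmw rmw_opensplice_cpp ctest -V -R test_localization
```

Setting the variable per command rather than exporting it keeps the two runs from leaking into each other.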

I haven’t been able to isolate the problem. I can provide instructions for how to reproduce using the Nav2 system test if desired.

I see AMCL is stuck waiting for data on the /scan topic, but when I run ros2 topic hz /scan I can see that the scan topic is being published correctly by Gazebo. So it looks like the AMCL callback is not being executed. I’m not sure how to debug that, but I’m pretty sure the problem is in this rmw layer.

If anyone can offer help or suggestions on what to look at to debug this, beyond confirming that the topics are being published, I’d appreciate it.

This is high priority as it is blocking our CI. We won’t be able to release for Dashing in this state.

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 29 (18 by maintainers)

Most upvoted comments

@MiguelCompany - I tested the change you suggested above and it definitely helps. I submitted a PR for it. Thanks for helping with that. I think the changes you made above helped also (eProsima/Fast-RTPS#541).

Let me see if between those changes and this PR we get our CI to pass again and I’ll close this ticket.

We found the issue. It was related to a change necessary for the implementation of the lifespan QoS. A fix is on the way in eProsima/Fast-RTPS#541, a new blackbox test is being added in eProsima/Fast-RTPS#542, and a new unit test is under development.

I just tried the following and can confirm the hang with FastRTPS:

  • Shell A:
    • install the Dashing prerelease Debian packages (ros-dashing-*)
    • source /opt/ros/dashing/setup.bash
    • clone ros-planning/navigation2, ros-simulation/gazebo_ros_pkgs@ros2 (will be released soon) and BehaviorTree/BehaviorTree.CPP@ros2 (since its release failed to build) into a workspace ws
    • call colcon build in that workspace
  • Shell B:
    • source ws/install/setup.bash
    • ctest -V -R test_localization hangs…
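The Shell B step can be wrapped in a time limit so that a hang is reported instead of blocking the terminal. This is a minimal sketch, assuming GNU coreutils `timeout` is available. `run_or_report_hang` and the 600-second limit in the example are hypothetical choices, not taken from the report.

```shell
#!/bin/sh
# Hypothetical helper: run a command under a time limit (in seconds) and
# report whether it finished, failed, or was killed for exceeding the limit.
# GNU `timeout` exits with status 124 when the limit is reached.
run_or_report_hang() {
    limit="$1"
    shift
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "HANG: '$*' did not finish within ${limit}s"
    elif [ "$status" -ne 0 ]; then
        echo "FAILED: '$*' exited with status $status"
    else
        echo "OK: '$*' finished"
    fi
    return "$status"
}

# Example for Shell B (assumes ws/install/setup.bash has been sourced):
#   run_or_report_hang 600 ctest -V -R test_localization
```

The helper returns the original exit status, so it can also be used in a CI script that needs to fail on a hang.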

@richiprosima Your insight might be helpful on this ticket. This is using the latest commit from the master branch of FastRTPS.