rmw_fastrtps: Cannot discover the endpoint after multiple nodes created and exited
Bug report
Required Info:
- Operating System:
- ubuntu 22.04
- Installation type:
- binaries, source build
- Version or commit hash:
- Humble
- DDS implementation:
- Fast-DDS
- Client library (if applicable):
- rclcpp, rclpy
Steps to reproduce issue
This reproduces only on localhost, with the shared memory transport enabled.
- Start the subscription.
# ros2 run examples_rclpy_minimal_subscriber subscriber_member_function
- Start multiple publishers and kill all of them after about 10 seconds.
# cat > pub_nodes.launch.py << EOS
from launch import LaunchDescription
from launch_ros.actions import Node

node_num_ = 100

def generate_launch_description():
    ld = LaunchDescription()
    for i in range(node_num_):
        ld.add_action(
            Node(
                package='examples_rclpy_minimal_publisher',
                executable='publisher_member_function',
                name='pub_node_{}'.format(i),
            )
        )
    return ld
EOS
# ros2 launch ./pub_nodes.launch.py
...<snip>
### Send Ctrl-C after a short while (about 10 seconds) to kill all publishers
- Start a single publisher in another terminal
# ros2 run examples_rclpy_minimal_publisher publisher_member_function
- Check in the subscription terminal whether it receives the messages from the publisher started in step 3.
Expected behavior
The subscription receives the messages in step 4.
Actual behavior
The subscription does not receive the messages in step 4.
Additional information
- This problem cannot be observed with rmw_cyclonedds.
- This problem can only be observed with localhost communication. (Via network interfaces, there is no such problem.)
- When this issue occurs, ros2 node list --no-daemon does not show the subscriber running in the first terminal.
- When SHM is disabled for Fast DDS and the UDP transport is enabled, this issue does not reproduce.
<?xml version="1.0" encoding="UTF-8" ?>
<profiles xmlns="http://www.eprosima.com/XMLSchemas/fastRTPS_Profiles" >
    <transport_descriptors>
        <transport_descriptor>
            <transport_id>CustomUdpTransport</transport_id>
            <type>UDPv4</type>
        </transport_descriptor>
    </transport_descriptors>
    <participant profile_name="participant_profile" is_default_profile="true">
        <rtps>
            <userTransports>
                <transport_id>CustomUdpTransport</transport_id>
            </userTransports>
            <useBuiltinTransports>false</useBuiltinTransports>
        </rtps>
    </participant>
</profiles>
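One way to apply a profile like the one above is the FASTRTPS_DEFAULT_PROFILES_FILE environment variable, which Fast DDS reads when a participant is created. The following is a minimal sketch rather than the exact procedure used in this report: the file name udp_only_profile.xml and the helper names are assumptions made for illustration.

```python
import os
import subprocess

# Hypothetical path: wherever the XML profile above was saved.
PROFILE_PATH = "udp_only_profile.xml"


def env_with_profile(profile_path, base_env=None):
    """Return a copy of the environment with the Fast DDS profile file set."""
    env = dict(os.environ if base_env is None else base_env)
    env["FASTRTPS_DEFAULT_PROFILES_FILE"] = os.path.abspath(profile_path)
    return env


def run_subscriber(profile_path=PROFILE_PATH):
    """Start the subscriber from the reproduction steps with the profile applied."""
    cmd = [
        "ros2", "run",
        "examples_rclpy_minimal_subscriber", "subscriber_member_function",
    ]
    return subprocess.run(cmd, env=env_with_profile(profile_path), check=False)
```

Calling run_subscriber() in the first terminal, and starting the publishers with the same environment, should keep every participant on UDP only, so the comparison with the default SHM setup is like for like.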
About this issue
- State: closed
- Created a year ago
- Reactions: 1
- Comments: 19 (15 by maintainers)
@Barry-Xu-2018 I made some bugfixes in eProsima/Fast-DDS#3759, and changed the metatraffic to be transmitted over UDP by default in eProsima/Fast-DDS#3753.
With those two on a ROS 2 rolling workspace, I could not reproduce this issue anymore.
@MiguelCompany Yeah, right. I will go ahead and close this one.
@fujitatomoya Patches to Fast DDS were merged, backported, and released. Do you think we can close this issue?
@fujitatomoya There’s no ABI break. We will backport both changes to the iron and humble branches
@MiguelCompany
Thank you for the further correction on this issue.
In my environment, I also cannot reproduce this issue with these 2 patches.
BTW, those 2 patches will be backported to the version of Fast DDS used by Humble, right?
@Barry-Xu-2018 Correct. Metatraffic is basically discovery traffic.
@Barry-Xu-2018 Thank you very much for the investigation. I was getting almost to the same place on my debugging session.
I’ve been testing some changes that solve the 100% CPU usage, but they do not always solve the main issue (i.e. step 4 sometimes succeeds, and sometimes fails).
We implemented a workaround in https://github.com/eProsima/Fast-DDS/commit/687104a269eb58491d7d9498390dd99543ff86e9 which fixes this issue. We are considering whether to add that commit to the supported branches, but we first need to evaluate the impact of incorporating those changes.
@MiguelCompany CC: @fujitatomoya
When the issue occurs, one thread in subscriber_member_function consumes almost 100% of the CPU.
[image: abnormal CPU usage]
Checking the backtrace of thread 581173:
[image: backtrace]
After debugging, we found that find_segment keeps trying to open a segment named fastrtps_fed84726591b50b4, but the file fastrtps_fed84726591b50b4 does not exist in /dev/shm. In this scenario, the subscriber keeps trying to open a deleted shared memory file.
BTW, we think this problem is related to another issue: https://github.com/ros2/ros2cli/issues/582#issuecomment-1321799828.
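To check for the missing segment file described in this comment, something like the following could be used. It is only a sketch: it assumes the segment files sit directly under /dev/shm with a fastrtps_ prefix, as in the name quoted above, and the helper names are made up for illustration.

```python
from pathlib import Path


def list_fastdds_segments(shm_dir="/dev/shm"):
    """Return the sorted names of entries in shm_dir that start with 'fastrtps_'."""
    root = Path(shm_dir)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if p.name.startswith("fastrtps_"))


def segment_exists(name, shm_dir="/dev/shm"):
    """Check whether an entry with exactly this name exists in shm_dir."""
    return (Path(shm_dir) / name).exists()
```

For example, segment_exists("fastrtps_fed84726591b50b4") returning False while the subscriber thread is still spinning would match the behavior reported here.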