rmw_cyclonedds: program termination with Ctrl+C hangs partway through shutdown
Bug report
- CPU: Intel® Core™ i5-3340M CPU @ 2.70GHz
- Operating System: Ubuntu 20.04
- Installation type: from source
- Version or commit hash:
- DDS implementation: cyclonedds only; rmw_fastrtps_cpp does not reproduce the issue
- Client library (if applicable): rclcpp and rclpy
Steps to reproduce issue
This issue occurred in both rclcpp and rclpy.
$ ros2 run demo_nodes_cpp talker --ros-args --log-level debug
# send Ctrl+C
$ ros2 run demo_nodes_py talker --ros-args --log-level debug
# send Ctrl+C
Expected behavior
The process terminates cleanly.
Actual behavior
The process hangs partway through finalization; node finalization appears to have a problem.
$ ros2 run demo_nodes_cpp talker --ros-args --log-level debug
^C[INFO] [1611636125.400290516] [rclcpp]: signal_handler(signal_value=2)
[DEBUG] [1611636125.400375072] [rclcpp]: signal_handler(): SIGINT received, notifying deferred signal handler
[DEBUG] [1611636125.400436017] [rclcpp]: deferred_signal_handler(): woken up due to SIGINT or uninstall
[DEBUG] [1611636125.400486808] [rclcpp]: deferred_signal_handler(): SIGINT received, shutting down
[DEBUG] [1611636125.400651593] [rclcpp]: deferred_signal_handler(): shutting down rclcpp::Context @ 0x55d466e8e830, because it had shutdown_on_sigint == true
[DEBUG] [1611636125.400687545] [rcl]: Shutting down ROS client library, for context at address: 0x55d466e8eab0
[DEBUG] [1611636125.400782572] [rcl]: Guard condition in wait set is ready
[DEBUG] [1611636125.400830210] [rcl]: Finalizing publisher
[DEBUG] [1611636125.401084234] [rcl]: Timer canceled
[DEBUG] [1611636125.401109610] [rcl]: Timer canceled
[DEBUG] [1611636125.401343531] [rcl]: Publisher finalized
[DEBUG] [1611636125.401495158] [rclcpp]: deferred_signal_handler(): waiting for SIGINT or uninstall
[DEBUG] [1611636125.401520824] [rcl]: Finalizing event
[DEBUG] [1611636125.401541437] [rcl]: Event finalized
[DEBUG] [1611636125.401579052] [rcl]: Finalizing publisher
[DEBUG] [1611636125.401832104] [rcl]: Publisher finalized
[DEBUG] [1611636125.402000585] [rcl]: Finalizing event
[DEBUG] [1611636125.402013763] [rcl]: Event finalized
[DEBUG] [1611636125.402026546] [rcl]: Finalizing subscription
[DEBUG] [1611636125.402194399] [rcl]: Subscription finalized
[DEBUG] [1611636125.402357121] [rcl]: Finalizing service
[DEBUG] [1611636125.404965197] [rcl]: Service finalized
[DEBUG] [1611636125.405047754] [rcl]: Finalizing service
[DEBUG] [1611636125.405435803] [rcl]: Service finalized
[DEBUG] [1611636125.405483505] [rcl]: Finalizing service
[DEBUG] [1611636125.405815296] [rcl]: Service finalized
[DEBUG] [1611636125.405853974] [rcl]: Finalizing service
[DEBUG] [1611636125.406083684] [rcl]: Service finalized
[DEBUG] [1611636125.406120386] [rcl]: Finalizing service
[DEBUG] [1611636125.406316801] [rcl]: Service finalized
[DEBUG] [1611636125.406351774] [rcl]: Finalizing service
[DEBUG] [1611636125.406559456] [rcl]: Service finalized
[DEBUG] [1611636125.406635210] [rcl]: Finalizing event
[DEBUG] [1611636125.406648754] [rcl]: Event finalized
[DEBUG] [1611636125.406664502] [rcl]: Finalizing publisher
[DEBUG] [1611636125.406853173] [rcl]: Publisher finalized
[DEBUG] [1611636125.407209588] [rcl]: Finalizing node
$ ros2 run demo_nodes_py talker --ros-args --log-level debug
^C[DEBUG] [1611636186.002509447] [rcl]: Guard condition in wait set is ready
[DEBUG] [1611636186.002854722] [rcl]: Finalizing event
[DEBUG] [1611636186.002893463] [rcl]: Event finalized
[DEBUG] [1611636186.002922664] [rcl]: Finalizing publisher
[DEBUG] [1611636186.003449009] [rcl]: Publisher finalized
[DEBUG] [1611636186.003530384] [rcl]: Finalizing event
[DEBUG] [1611636186.003552872] [rcl]: Event finalized
[DEBUG] [1611636186.003578862] [rcl]: Finalizing publisher
[DEBUG] [1611636186.004997261] [rcl]: Publisher finalized
[DEBUG] [1611636186.005102081] [rcl]: Finalizing service
[DEBUG] [1611636186.006543735] [rcl]: Service finalized
[DEBUG] [1611636186.006641986] [rcl]: Finalizing service
[DEBUG] [1611636186.006956990] [rcl]: Service finalized
[DEBUG] [1611636186.007011764] [rcl]: Finalizing service
[DEBUG] [1611636186.008383436] [rcl]: Service finalized
[DEBUG] [1611636186.008476722] [rcl]: Finalizing service
[DEBUG] [1611636186.009434476] [rcl]: Service finalized
[DEBUG] [1611636186.009525846] [rcl]: Finalizing service
[DEBUG] [1611636186.010021857] [rcl]: Service finalized
[DEBUG] [1611636186.010073573] [rcl]: Finalizing service
[DEBUG] [1611636186.011492397] [rcl]: Service finalized
[DEBUG] [1611636186.011578826] [rcl]: Timer canceled
[DEBUG] [1611636186.011635771] [rcl]: Finalizing node
[DEBUG] [1611636186.011656284] [rcl]: Finalizing publisher
[DEBUG] [1611636186.011833613] [rcl]: Publisher finalized
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 18 (9 by maintainers)
@ivanpauno @hsgwa The way I read it, the problem was solved by changing the firewall settings, and it is not really a bug in Cyclone if it can't receive data because of a firewall. Hanging during shutdown is a consequence of that.
I do want everything to work fine within a machine as long as at least the loopback interface works, so I am working on support for using multiple network interfaces simultaneously.
The interesting options as I see it are:
- Internal/MultipleReceiveThreads=false to prevent the hang.

Thanks! I reviewed the network settings and found that the firewall was the cause. The issue can be reproduced with the following steps (a sketch of the steps follows the list):
1. Select the Ethernet interface
2. Prepare the XML configuration
3. Set CYCLONEDDS_URI
4. Enable the firewall
5. Run the demo
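A minimal sketch of those steps, assuming the interface was pinned via General/NetworkInterfaceAddress and a ufw-based firewall (the interface name eth0, the ufw tooling, and the exact XML are assumptions; the reporter's actual configuration is not shown):
$ cat cyclonedds.xml
<CycloneDDS>
  <Domain id="any">
    <General>
      <!-- assumed: pin Cyclone DDS to the Ethernet interface -->
      <NetworkInterfaceAddress>eth0</NetworkInterfaceAddress>
    </General>
  </Domain>
</CycloneDDS>
$ export CYCLONEDDS_URI=file://$PWD/cyclonedds.xml
$ sudo ufw enable
$ ros2 run demo_nodes_cpp talker --ros-args --log-level debug
# send Ctrl+C -> shutdown hangs as in the logs above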
I tried changing the Cyclone DDS configuration, and these are the results.
I apologize in advance if this interaction with firewall settings has already been documented.
@iuhilnehc-ynos yes that's the path, at least for the initial "trigger" message. https://github.com/eclipse-cyclonedds/cyclonedds/blob/81666fd987cbb2a7a61f24c96be923c0cd1b2995/src/core/ddsi/src/q_init.c#L778 schedules a periodic event to try again (https://github.com/eclipse-cyclonedds/cyclonedds/blob/81666fd987cbb2a7a61f24c96be923c0cd1b2995/src/core/ddsi/src/q_init.c#L784). The purpose of the packet is to unblock the recvmsg call; the content of the packet doesn't matter, it just needs to make recvmsg return. UDP/IP is unreliable and so it retries until the threads terminate, but as far as I know, the loopback interface is effectively reliable provided the socket receive buffer has space. If there is something present in the receive buffer, recvmsg should return it (and no matter what the contents are, it is sufficient to make the thread terminate); if there's nothing present, surely this packet should be delivered.
I've seen it hang in this way before, even on machines doing absolutely nothing else (so no chance of the socket buffer being full and the IP stack dropping the packet for that reason), but because of some routing issue that made it impossible for a process to send a packet to its own sockets. I have no clue why such a routing issue would exist …
It is hanging because it is not succeeding in waking up a receive thread by sending a datagram to it: in principle, Cyclone can have any number of threads for receiving data from the network, and each thread handles either a single socket or multiple sockets. If it is the former, it enters a blocking wait, and then termination requires receiving a datagram to unblock the recvmsg() call. For those threads it sends itself a minimalistic datagram and retries every second.
From the stack traces, you can tell it is one of those threads, and you can also see its timed-event handling in a state consistent with those retries.
On Linux, by default, it creates 3 threads: one that handles all discovery traffic (multicast+unicast, so 2 sockets and exempt from this termination procedure), one that handles unicast application data (single socket), and one that handles multicast application data (also single socket). Chances are it is the multicast one here, but I can’t be quite certain.
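To make that mechanism concrete, here is a minimal, self-contained C sketch of the wake-up technique described above (illustrative only, not Cyclone DDS's actual code): one thread blocks in recvmsg() on a single UDP socket, and the main thread unblocks it by sending a one-byte datagram to that socket's own loopback address. If that self-addressed datagram never arrives, e.g. because a firewall drops it, the join blocks forever, which matches the shutdown hang seen in this issue. Compile with cc -pthread.
#include <arpa/inet.h>
#include <netinet/in.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

static int sock;

static void *receive_thread(void *arg)
{
    (void)arg;
    char buf[64];
    struct iovec iov = { .iov_base = buf, .iov_len = sizeof buf };
    struct msghdr msg;
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    /* Blocks until any datagram arrives; the content is irrelevant,
       the call just has to return so the thread can terminate. */
    ssize_t n = recvmsg(sock, &msg, 0);
    printf("receive thread woken (%zd bytes), terminating\n", n);
    return NULL;
}

int main(void)
{
    sock = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                 /* let the kernel pick a free port */
    bind(sock, (struct sockaddr *)&addr, sizeof addr);
    socklen_t len = sizeof addr;
    getsockname(sock, (struct sockaddr *)&addr, &len);  /* learn the port */

    pthread_t tid;
    pthread_create(&tid, NULL, receive_thread, NULL);
    sleep(1);                          /* let the thread block in recvmsg */

    /* Wake the receiver by sending a minimal datagram to our own socket.
       Cyclone retries this every second because UDP gives no delivery
       guarantee; if a firewall drops the packet, the join below hangs. */
    char wake = 0;
    sendto(sock, &wake, 1, 0, (struct sockaddr *)&addr, sizeof addr);
    pthread_join(tid, NULL);
    close(sock);
    return 0;
}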
Not being able to send datagrams to oneself is probably a network configuration issue, but I don't know which bit of it is causing the problem. So it is probably sensible to avoid having to do this by not having any receive thread handle a single socket. For that, set Internal/MultipleReceiveThreads to false.
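For reference, a minimal sketch of applying that setting through CYCLONEDDS_URI, passing the XML inline (the exact element nesting is an assumption based on the usual Cyclone DDS config layout, and the demo command is just an example):
$ export CYCLONEDDS_URI='<CycloneDDS><Domain id="any"><Internal><MultipleReceiveThreads>false</MultipleReceiveThreads></Internal></Domain></CycloneDDS>'
$ ros2 run demo_nodes_cpp talker
With this setting a single receive thread handles all sockets, so shutdown no longer depends on delivering a self-addressed wake-up datagram.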
@ivanpauno here it is. I added version info of the related packages to the original post.