rclcpp: Executor quick destruction after spin hangs
Bug report
Required Info:
- Operating System: Ubuntu 20.04
- Installation type: Binaries
- Version or commit hash: 5.0.0-1focal.20201007.211227
- DDS implementation: Fast DDS
- Client library (if applicable): rclcpp
Steps to reproduce issue
TEST(Foo, bar) {
auto node = std::make_shared<rclcpp::Node>("test");
auto executor = std::make_shared<rclcpp::executors::SingleThreadedExecutor>();
executor->add_node(node);
auto executor_spin_future = std::async(
std::launch::async, [&executor]() -> void {
executor->spin();
});
// std::this_thread::sleep_for(50ms);
executor->cancel();
}
Expected behavior
That the test ends.
Actual behavior
Very frequently, the test hangs on the future destruction, waiting for thread join();
Because the thread is stuck on executor->spin() (wait_for_work exactly)
Additional information
Uncommenting the sleep_for fixes the issue on my machine, so there seems to be a race condition.
In our use case, we have test than end faster than that time and trigger this issue, we can add the sleep as a workaround but it’s ugly: https://github.com/ros-controls/ros2_control/pull/234/files/833b1661209cc12a9d6f9ef45c87f4072e641aa7#diff-28c9aa0318c48cffb66c294bbf17394b6f9295bb8b9fb83d2cac539699f7e354R130
About this issue
- Original URL
- State: open
- Created 4 years ago
- Comments: 18 (16 by maintainers)
I’m not pretty sure what we want to do here, so resolving the issue involves first discussing what’s desired. After that is resolved, implementing it shouldn’t be a problem, but the
good first issuelabel doesn’t feel accurate at the moment. If you have any opinion of how it should work feel free to chime in.I will try to reread the comments again in a few days and post something.
I also feel the discussion somewhat overlaps with how
rclcpp::shutdown()should work (https://github.com/ros2/rclcpp/issues/1139), though it’s not exactly the same issue.The semantics of
cancelis strange:spinis already running, it will return. But you can spin again afterwards, and those will continue blocking.cancelthat happens while the executor is not spinning doesn’t have any effect (the firstwaitof the executor will be awaken immediately because a guard condition was triggered bycancel, butspinwon’t return and will continue doing work).rclpyExecutor doesn’t have acancelmethod, but rather a shutdown one.spin()calls after shutdown happened will not do work.I’m not sure what is the intended semantics of
cancel, but its current behavior doesn’t sound quite useful (it will always be racy).Maybe instead of that, the first
spinafter a cancel should not do work. The other option is to make it behave likerclpy.Executor.shutdown.@wjwwood any ideas?