rclcpp: Asynchronously shutting down while using `rclcpp::spin` `rclcpp:spin_some` in another thread throws an exception
This nightly failure is an example of the problem:
What is happening is that one thread is creating an executor, while other is trying to shutdown.
If shutdown is called before the creation of the Executor, an exception is thrown:
About this issue
- Original URL
- State: open
- Created 4 years ago
- Reactions: 1
- Comments: 36 (31 by maintainers)
Quoting myself, I think this might be the right path to take. Just make it so that shutdown does not affect the function of anything, it simply changes the state of
rclcpp::Context::ok()/rclcpp::ok().This runs the risk of the user not noticing shutdown, but we can’t keep them from all pitfalls.
That’s exactly what I don’t like of how
shutdownworks right now, you only cantry/catcha genericRCLError. There should be arclcpp::exceptions::Shutdownerror (or similar).Yeap, I will check in tomorrow’s meeting if everybody agrees.
Doing this change will involve coordinated work in
rcl,rclpyandrclcppto make sure that shutdown doesn’t do more than triggering guard conditions once, invalidating the context, and calling shutdown callbacks. The only place where we should check if a context is valid or not is inrclcpp::ok(),rclpy.ok()(e.g. publish should continue working, etc).Getting the refactor right might be a bit tricky, if you’re planning to do I can help with the reviews.
Sorry, but up to now I’m the only one in favor of that change, we need to settle in what we want first. I would like to hear other opinions before starting a change that may be rejected.
@wjwwood @hidmic @jacobperron Does a custom return code in rcl and a custom exception in rclcpp for shutdown sound like a good idea to solve the issue? Do you have any other alternative?
Sorry, I think my comment wasn’t clear.
Those are two names suggestions for the same new return code.
RCL_RET_CONTEXT_INVALIDis named similarly to the other return codes.RCL_RET_IS_SHUTDOWNis a more clear name for this case IMO.We already have in
rclreturn values likeRCL_RET_NODE_INVALID,RCL_RET_PUBLISHER_INVALID, etc.After shutdown, you usually get one of those errors (e.g. https://github.com/ros2/rcl/blob/069d1f07290019b62a8297c5f95a80b58e63682a/rcl/src/rcl/publisher.c#L286). Can we add a
RCL_RET_IS_SHUTDOWNorRCL_RET_CONTEXT_INVALIDinrcland return that when appropriate?We could then add in rclcpp a
ShutdownErrorderived from RCLError and make sure we throw that one in throw_from_rcl_error. In rclpy, we should make sure to always raise KeybordInterrupt in those cases.I think that with those changes, we will be able to handle an asynchronous shutdown gracefully.
This approach requires careful reviewing of all functions in
rcllayer, to make sure they are returning the appropriate error (and careful handling of those errors in the upper client layer).Note: It’s impossible to recognize an already shutdown context of one that was never init, that should be clear in error messages.