rclcpp: Possible memory leak in spin_some()
Bug report
When using rclcpp::spin_some(node), the memory usage of the process increases forever.
Required Info:
- Operating System:
- Ubuntu 20.04
- Installation type:
- binaries
- Version or commit hash:
- Rolling
- DDS implementation:
- Fast-RTPS or CycloneDDS
- Client library (if applicable):
- rclcpp
Steps to reproduce issue
Compile and execute this simple node.
#include <rclcpp/rclcpp.hpp>

int main(int argc, char * argv[])
{
  rclcpp::init(argc, argv);
  auto node = std::make_shared<rclcpp::Node>("foo");
  while (rclcpp::ok())
  {
    rclcpp::spin_some(node);
  }
  rclcpp::shutdown();
  return 0;
}
Using htop, you will see the memory usage of the process increase.
Expected behavior
The node's memory usage should be stable.
Actual behavior
The node's memory usage increases until the OS kills the node because it runs out of memory:
Out of memory: Killed process
Additional information
It can easily be worked around by creating an executor outside the loop and spinning it inside the loop.
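The workaround could look roughly like this (a sketch using the standard rclcpp executor API; not compiled here):

```
#include <rclcpp/rclcpp.hpp>

int main(int argc, char * argv[])
{
  rclcpp::init(argc, argv);
  auto node = std::make_shared<rclcpp::Node>("foo");
  // Create the executor once, outside the loop, instead of letting
  // rclcpp::spin_some(node) construct and destroy one on every iteration.
  rclcpp::executors::SingleThreadedExecutor executor;
  executor.add_node(node);
  while (rclcpp::ok())
  {
    executor.spin_some();
  }
  rclcpp::shutdown();
  return 0;
}
```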
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 28 (16 by maintainers)
Commits related to this issue
- Possible memory leak in spin_some() https://github.com/ros2/rclcpp/issues/1638 Signed-off-by: Tomoya Fujita <Tomoya.Fujita@sony.com> — committed to fujitatomoya/ros2_test_prover by fujitatomoya 3 years ago
Yes, that’s a good point.
The problem is that we’re storing a std::vector of std::function, and all non-nullptr std::function objects compare non-equal, i.e. we currently can’t remove an on_shutdown_callback from the vector. We could solve this by returning an identifier from on_shutdown_callback() (instead of returning the callback that was passed), and adding a remove_shutdown_callback(IdentifierT identifier) method. There are several ways to achieve that; e.g. we could instead store the callbacks in an unordered_set and return an iterator to the added element.
The cause is related to the code below:
https://github.com/ros2/rclcpp/blob/b8df9347a10bff383fdc8890c5b1bfdf16430d8f/rclcpp/src/rclcpp/executor.cpp#L59-L65
In the constructor of Executor, one callback function is added to the on_shutdown_callbacks_ of the Context. But in the destructor of Executor, this callback isn’t removed from on_shutdown_callbacks_. In the reproduction code above, an Executor is created continuously (one per spin_some call), so a callback is added continuously to the on_shutdown_callbacks_ of the Context.
About the solution, below is one way.
Change it so that, in the destructor of Executor, we remove the callback based on the pointer of the Executor (e.g. store the callbacks in a map keyed by that pointer). But this way leads to many changes, such as:
https://github.com/ros2/rclcpp/blob/b8df9347a10bff383fdc8890c5b1bfdf16430d8f/rclcpp/src/rclcpp/context.cpp#L344-L361
In get_on_shutdown_callbacks, we would have to build a vector from the map. Does anybody have another good idea for the solution?
@wjwwood @clalancette @ivanpauno @fujitatomoya
Thanks for sharing your thoughts.
I will make this fix.
Ah, ok, then we need the handle, something like you described or return a shared_ptr to them (not ideal) like here:
https://github.com/ros2/rclcpp/blob/b8df9347a10bff383fdc8890c5b1bfdf16430d8f/rclcpp/include/rclcpp/parameter_event_handler.hpp#L202-L203
We could also just use a static uint64_t as a running counter, and store that value in a struct with the pointer and call that a “handle”. That has the (unlikely) rollover issue, though.
Yes, it’s certainly not something to do for Galactic. I only mentioned it as an improvement for the future; removing the shutdown callbacks in the destructor as suggested should fix the memory leak.
Certainly that is more efficient; you won’t have to allocate and initialize an Executor each time it is called. Given where we are in Galactic, I’m somewhat hesitant to make a change that big. As you say, it subtly changes the semantics of rclcpp::spin_some, so it might be a surprise to existing users. And I think we can fix this particular issue without doing that, so I’ll suggest we switch to the map for Galactic, and then further discuss our options for rclcpp::spin_some in H-Turtle.
As a general comment, it is still fine to break ABI for bugfixes at this time. So there is no trouble with doing that.
I need to look at this in more detail to see if I have any other ideas for solving it.
Agreed, this needs to be done. And shutdown_guard_condition_ is shared via weak_ptr, so if the Executor is already gone, this callback does not do anything via context shutdown.
I don’t have another idea either (instead of the suggested approach).