rclcpp: Possible memory leak in spin_some()

Bug report

When using rclcpp::spin_some(node);, the node resources increase forever.

Required Info:

  • Operating System:
    • Ubuntu 20.04
  • Installation type:
    • binaries
  • Version or commit hash:
    • Rolling
  • DDS implementation:
    • Fast-RTPS or CycloneDDS
  • Client library (if applicable):
    • rclcpp

Steps to reproduce issue

Compile and execute this simple node.

#include <rclcpp/rclcpp.hpp>
int main(int argc, char * argv[])
{
  rclcpp::init(argc, argv);
  auto node = std::make_shared<rclcpp::Node>("foo");
  while(rclcpp::ok())
  {
    rclcpp::spin_some(node);
  }

  rclcpp::shutdown();
  return 0;
}

Using htop you will see the resources of the process increase.

Expected behavior

The node resources should be stable.

Actual behavior

The node resources increase until the OS kills the node because of memory storage. Out of memory: Killed process

Additional information

It can be easily bypassed by creating an executor out of the loop and spinning it inside.

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Comments: 28 (16 by maintainers)

Commits related to this issue

Most upvoted comments

I do not think it is a good idea to have Executor be known to the context…

Yes, that’s a good point.

Instead we could use get_on_shutdown_callbacks() to get the vector and remove the callback in the destructor of the executor (it returns a non-const vector).

The problem is that we’re storing a std::vector of std::function, and all non nullptr std::function compare non-equal. i.e. we currently can’t remove a on_shutdown_callback from the vector.

We could solve this by returning an identifier in on_shutdown_callback() (instead of returning the callback that was passed), and add a remove_shutdown_callback(IdentifierT identifier) method. There are several ways to achieve that, e.g. we could instead store the callbacks in an unordered_set() and return an iterator to the added element.

The cause is related to below codes

https://github.com/ros2/rclcpp/blob/b8df9347a10bff383fdc8890c5b1bfdf16430d8f/rclcpp/src/rclcpp/executor.cpp#L59-L65

In the construct of Executor, one callback function is added to on_shutdown_callbacks_ of Context
But in the destructor of Executor, this callback isn’t removed from on_shutdown_callbacks_.

Test codes is

  while(rclcpp::ok())
  {
    rclcpp::spin_some(node);
  }

Executor is created continuously, so callback is added continuously to on_shutdown_callbacks_ of Context.

About the solution, below is one way.

std::vector<OnShutdownCallback> on_shutdown_callbacks_;

Change to

std::map<Executor *, OnShutdownCallback> on_shutdown_callbacks_;

In the destructor of Executor, we remove callback based on the pointer of Executor. But this way leads to many changes, such as

https://github.com/ros2/rclcpp/blob/b8df9347a10bff383fdc8890c5b1bfdf16430d8f/rclcpp/src/rclcpp/context.cpp#L344-L361

In get_on_shutdown_callbacks, we have to make a vector based on the map.

Does anybody have another good idea on the solution ?

@wjwwood @clalancette @ivanpauno @fujitatomoya

Thanks for sharing your thought.
I will make this fix.

The problem is that we’re storing a std::vector of std::function, and all non nullptr std::function compare non-equal. i.e. we currently can’t remove a on_shutdown_callback from the vector.

Ah, ok, then we need the handle, something like you described or return a shared_ptr to them (not ideal) like here:

https://github.com/ros2/rclcpp/blob/b8df9347a10bff383fdc8890c5b1bfdf16430d8f/rclcpp/include/rclcpp/parameter_event_handler.hpp#L202-L203

We could also just use a static uint64_t to be a running counter, and store that value in a struct with the pointer and call that a “handle”. That has the (unlikely) rollover issue though.

Certainly that is more efficient; you won’t have to allocate and initialize an Executor each time it is called. Given where we are in Galactic, I’m somewhat hesitant to make a change that big

Yes, it’s certainly not something to do for Galactic. I only mentioned it as an improvement for the future, removing the shutdown callbacks in the destructor as suggested should fix the memory leak.

I think rclcpp::spin_some() and alike should use a global executor instance instead of creating and destroying an executor each time (caveat: you cannot call rclcpp::spin_some() concurrently in that case).

Certainly that is more efficient; you won’t have to allocate and initialize an Executor each time it is called. Given where we are in Galactic, I’m somewhat hesitant to make a change that big. As you say, it subtly changes the semantics of rclcpp::spin_some, so it might be a surprise to existing users. And I think we can fix this particular issue without doing that, so I’ll suggest we switch to the map for Galactic, and then further discuss our options for rclcpp::spin_some in H-Turtle.

As a general comment, it is still fine to break ABI for bugfixes at this time. So there is no trouble about doing that.

I need to look at this in more detail to see if I have any other ideas for solving it.

In the destructor of Executor, we remove callback based on the pointer of Executor.

agree, this needs to be done. and shutdown_guard_condition_ is shared with weak_ptr, so if Executor is already gone, this callbacks does not do anything via context shutdown.

Does anybody have another good idea on the solution ?

i don’t have it either. (instead of suggested approach.)