rclcpp: Publishing is slow in Docker with MutliThreadedExecutor

I am using ros2 foxy from the official ros2 docker images to build ros2_control and test their new implementations. I have noticed that when starting a joint state publisher that is set to publish at 200hz, monitoring the topic with ros2 topic hz /joint_states gives at best 30hz.

I know this has nothing to do specifically with ros2_control because I had a similar issue when trying my own lifecycle publisher nodes. Basically, attaching the node to a MultiThreadedExecutor produces a similar behavior where the topic is published at a much slower rate than expected, sometime by a factor of 10. Changing to a SingleThreadedExecutor solves the issue. Problem is ros2_control relies on this MultiThreadedExecutor.

I do believe this comes from the combination of Docker and MultiThreadedExecutor. I tested it on multiple computers and got similar behavior. I haven’t been able to test on a non docker installation as it requires Ubuntu 20.04 which I don’t have. But I will try it just in case.

Steps to reproduce:

  • Use a Docker foxy image
  • Create a LifecycleNode with a publisher and attach it to a MultiThreadedExecutor
  • Monitor the topic with hz

Alternatively, follow the ros2_control_demo installed on a Docker foxy image and monitor /joint_states.

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Comments: 22 (5 by maintainers)

Commits related to this issue

Most upvoted comments

It seems https://github.com/ros2/rclcpp/pull/1168 already did the same thing, so I will not create a new PR for it, <del>maybe we could push https://github.com/ros2/rclcpp/pull/1168 to be merged ASAP.</del>

Updated: It seems scheduled_timers_ and scheduled_timers_mutex_ have a long history. 👀 https://github.com/ros2/rclcpp/issues/1374#issue-714841851

  • PR that introduced the “scheduled timers” mechanism: #383
  • PR trying to improve the “scheduled timers” mechanism to avoid spurious wake ups: #621
  • PR fixing hangs caused by the “scheduler timers” mechanism: #836
  • PR (opened) to avoid threads being locked #1168 (almost reverting #621).

I have no idea of how to fix this, so I will unassign myself (cc @clalancette).

Well I have ideas of things to try, but the scope of this is much bigger of what I would expect from a randomly assigned issue. Maybe there’s an obvious fix I’m not seeing, but the whole executor code in rclcpp needs some months of love really (not particularly to fix this issue, but there’s a lot of performance and thread safety issues too). I think that @iuhilnehc-ynos comment is a good summary of what have happened here

I will try to explain how I understand things to work and why that’s an issue:

  • Each of the N threads can get the “next ready executable” in a mutually exclusive fashion. If there’s nothing ready, one of the threads will wait for more work (the others will be waiting in the mutex, as all this section is mutually exclusive).
  • Once a thread has a “ready executable”, they will try to execute it. The issue is that “executables” that are “about to be executed” are still “ready” and can be taken by another worker. https://github.com/ros2/rclcpp/pull/1241 fixed that in some cases by “pre-taking” data, but that doesn’t fix the issue for timers. To avoid the issue of timers being executed twice in a row, https://github.com/ros2/rclcpp/pull/383 was introduced. The problem there is that now a lot of the code being executed by the thread pool is mutually exclusive, and then the code introduced in #383 ends up confusing a valid timer execution with a duplicated timer execution. https://github.com/ros2/rclcpp/pull/1168 tries to relax that problem by using two mutexes, but that reintroduces the race described in https://github.com/ros2/rclcpp/pull/621.

My idea would be to limit how “executables” can be scheduled, so that when one worker has scheduled that “executable” for execution no other worker can take it as “ready”. That completely forbids the case of wanting to execute a callback of the same executable in parallel (e.g. two messages of same subscription), but I guess forbidding that is fine as that can potentially led to “out of order” execution. Anyway, that idea is also not simple to implement and I’m not completely sure that would solve the issue.


The single threaded executor also has its own problems:

@fujitatomoya

It definitely helped. Thanks. I am still a bit confused.

thread-A already has the Timer to be executed, thread-B comes in here with the same Timer and go on.

get_next_executable protected by a mutex wait_mutex_ make sure any_exec getting from two threads are different.

std::lock_guard<std::mutex> wait_lock(wait_mutex_);
get_next_executable
    get_next_ready_executable
         get_next_ready_executable_from_map
               MemoryStrategy::get_next_timer
                     set `any_exec`
                     timer_handles_.erase(it);

Anyway, I’ll see related issues later.