rclpy: Exception never retrieved when using MultiThreadedExecutor and Timers

Bug report

Required Info:

  • Operating System: Ubuntu 20.04.4
  • Installation type: binaries
  • Version or commit hash: foxy
  • DDS implementation: default
  • Client library (if applicable): rclpy

Steps to reproduce issue

I am encountering an issue with exceptions being thrown in timer callbacks using MultiThreadedExecutor going silent.

The sample shows that when a an exception is generated in a timer callback using the SingleThreadExecutor, implicitly I believe, the exception rises if uncaught, as expected.

When using the MultiThreadedExecutor the exception gets lost and silently fails, sometimes causing head scratching.

#!/usr/bin/env python3
import rclpy
from rclpy.executors import MultiThreadedExecutor
from rclpy.node import Node


class TestNode(Node):
    def __init__(self):
        super().__init__('zf9p')
        self.tmr = self.create_timer(1, self.tmr_callback)

    def tmr_callback(self):
        1/0
        print('Timer callback fired')


def single_executor():
    test_node = TestNode()

    while rclpy.ok() is True:
        rclpy.spin(test_node)

    test_node.destroy_timer(test_node.tmr)


def multi_executor():
    test_node = TestNode()

    # Moved to multi threaded executor so none of Actions block publishing of msgs
    executor = MultiThreadedExecutor(num_threads=1)
    executor.add_node(test_node)

    while rclpy.ok() is True:
        executor.spin()

    test_node.destroy_timer(test_node.tmr)


if __name__ == '__main__':
    rclpy.init()

    # single_executor()  # un comment to verify proper operation 
    multi_executor()

    rclpy.shutdown()

Expected behavior

Unhandled exception to rise to the top

$ python3 timer_example
Traceback (most recent call last):
  ...
  File "/opt/ros/foxy/lib/python3.8/site-packages/rclpy/executors.py", line 343, in _execute_timer
    await await_or_execute(tmr.callback)
  File "/opt/ros/foxy/lib/python3.8/site-packages/rclpy/executors.py", line 118, in await_or_execute
    return callback(*args)
  File "timer_example", line 13, in tmr_callback
    1/0
ZeroDivisionError: division by zero

Actual behavior

Exception seems to silently fail not handled, but after 10 or 30 seconds the following message does show.

$ python3 timer_example
The following exception was never retrieved: division by zero
The following exception was never retrieved: division by zero
The following exception was never retrieved: division by zero
The following exception was never retrieved: division by zero
The following exception was never retrieved: division by zero
The following exception was never retrieved: division by zero
The following exception was never retrieved: division by zero
The following exception was never retrieved: division by zero

Additional information

This looks to be similar to https://github.com/ros2/rclpy/issues/296, but not for certain.

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 26 (6 by maintainers)

Commits related to this issue

Most upvoted comments

As of #1073 this is mostly fixed but there’s still a problem, which is that the exception is only raised if another callback is ready to execute. In the case of a node that only provides services, this means an exception in a service won’t be raised until someone makes another service call, which may not happen for a while.

This is mostly because wait_for_ready_callbacks blocks indefinitely and so the code to check future statuses doesn’t get to run. But also you can’t get around it by calling spin_once in a loop because the future checking doesn’t run if a TimeoutException gets raised either

@tsender @Achllle foxy backport, https://github.com/ros2/rclpy/pull/1116

if you would verify and review, that would be appreciated!

@Achllle thank you for the contribution, everything has been done including backport humble.

do you think this issue is still valid?

yes, i think this is a bug. uncaught exception should be raised to user application from executor. as i mentioned https://github.com/ros2/rclpy/issues/983#issuecomment-1271782711, this is doable, but cannot allocate bandwidth for that.

this isn’t lost as this is pretty critical functionality.

this is not. thanks for pinging, please consider the contribution if possible.

@fujitatomoya or @mr337 want to check that this isn’t lost as this is pretty critical functionality.

I am not for sure I fully understand the problem :S But I can try to attempt a PR. 😃 I should have some available cycles next week to work on this.

I need to reopen this because of https://github.com/ros2/rclpy/pull/1017

With #984 now merged, I’m going to go ahead and close this. If you feel that this was in error, please feel free to reopen.