rclpy: Exception never retrieved when using MultiThreadedExecutor and Timers
Bug report
Required Info:
- Operating System: Ubuntu 20.04.4
- Installation type: binaries
- Version or commit hash: foxy
- DDS implementation: default
- Client library (if applicable): rclpy
Steps to reproduce issue
I am encountering an issue with exceptions being thrown in timer callbacks using MultiThreadedExecutor going silent.
The sample shows that when a an exception is generated in a timer callback using the SingleThreadExecutor, implicitly I believe, the exception rises if uncaught, as expected.
When using the MultiThreadedExecutor the exception gets lost and silently fails, sometimes causing head scratching.
#!/usr/bin/env python3
import rclpy
from rclpy.executors import MultiThreadedExecutor
from rclpy.node import Node
class TestNode(Node):
def __init__(self):
super().__init__('zf9p')
self.tmr = self.create_timer(1, self.tmr_callback)
def tmr_callback(self):
1/0
print('Timer callback fired')
def single_executor():
test_node = TestNode()
while rclpy.ok() is True:
rclpy.spin(test_node)
test_node.destroy_timer(test_node.tmr)
def multi_executor():
test_node = TestNode()
# Moved to multi threaded executor so none of Actions block publishing of msgs
executor = MultiThreadedExecutor(num_threads=1)
executor.add_node(test_node)
while rclpy.ok() is True:
executor.spin()
test_node.destroy_timer(test_node.tmr)
if __name__ == '__main__':
rclpy.init()
# single_executor() # un comment to verify proper operation
multi_executor()
rclpy.shutdown()
Expected behavior
Unhandled exception to rise to the top
$ python3 timer_example
Traceback (most recent call last):
...
File "/opt/ros/foxy/lib/python3.8/site-packages/rclpy/executors.py", line 343, in _execute_timer
await await_or_execute(tmr.callback)
File "/opt/ros/foxy/lib/python3.8/site-packages/rclpy/executors.py", line 118, in await_or_execute
return callback(*args)
File "timer_example", line 13, in tmr_callback
1/0
ZeroDivisionError: division by zero
Actual behavior
Exception seems to silently fail not handled, but after 10 or 30 seconds the following message does show.
$ python3 timer_example
The following exception was never retrieved: division by zero
The following exception was never retrieved: division by zero
The following exception was never retrieved: division by zero
The following exception was never retrieved: division by zero
The following exception was never retrieved: division by zero
The following exception was never retrieved: division by zero
The following exception was never retrieved: division by zero
The following exception was never retrieved: division by zero
Additional information
This looks to be similar to https://github.com/ros2/rclpy/issues/296, but not for certain.
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Reactions: 1
- Comments: 26 (6 by maintainers)
Commits related to this issue
- Exception never retrieved when using MultiThreadedExecutor and Timers https://github.com/ros2/rclpy/issues/983 Signed-off-by: Tomoya Fujita <Tomoya.Fujita@sony.com> — committed to fujitatomoya/ros2_test_prover by fujitatomoya 2 years ago
- Fix #983 by saving future and checking for + raising any exceptions — committed to Achllle/rclpy by Achllle a year ago
- Fix #983 by saving future and checking for + raising any exceptions Signed-off-by: Achllle <achille.verheye@gmail.com> — committed to Achllle/rclpy by Achllle a year ago
- Fix #983 by saving future and checking for + raising any exceptions (#1073) * Fix #983 by saving future and checking for + raising any exceptions Signed-off-by: Achille Verheye <achille.verheye@gm... — committed to ros2/rclpy by Achllle a year ago
- Fix #983 by saving future and checking for + raising any exceptions (#1073) * Fix #983 by saving future and checking for + raising any exceptions Signed-off-by: Achille Verheye <achille.verheye@gmai... — committed to ros2/rclpy by Achllle a year ago
- Fix #983 by saving future and checking for + raising any exceptions (#1073) * Fix #983 by saving future and checking for + raising any exceptions Signed-off-by: Achille Verheye <achille.verheye@gmai... — committed to ros2/rclpy by Achllle a year ago
- Fix #983 by saving future and checking for + raising any exceptions (#1073) * Fix #983 by saving future and checking for + raising any exceptions Signed-off-by: Achille Verheye <achille.verheye@gmai... — committed to fujitatomoya/rclpy by Achllle a year ago
- Fix #983 by saving future and checking for + raising any exceptions (#1073) (#1088) * Fix #983 by saving future and checking for + raising any exceptions Signed-off-by: Achille Verheye <achille.ve... — committed to ros2/rclpy by fujitatomoya a year ago
- Fix #983 by saving future and checking for + raising any exceptions (#1073) (#1088) * Fix #983 by saving future and checking for + raising any exceptions Signed-off-by: Achille Verheye <achille.verh... — committed to iuhilnehc-ynos/rclpy by fujitatomoya a year ago
- Fix #983 by saving future and checking for + raising any exceptions (#1073) (#1088) * Fix #983 by saving future and checking for + raising any exceptions Signed-off-by: Achille Verheye <achille.verh... — committed to iuhilnehc-ynos/rclpy by fujitatomoya a year ago
- Fix #983 by saving future and checking for + raising any exceptions (#1073) (#1088) * Fix #983 by saving future and checking for + raising any exceptions Signed-off-by: Achille Verheye <achille.verh... — committed to iuhilnehc-ynos/rclpy by fujitatomoya a year ago
- Fix #983 by saving future and checking for + raising any exceptions (#1073) (#1088) * Fix #983 by saving future and checking for + raising any exceptions Signed-off-by: Achille Verheye <achille.ve... — committed to Barry-Xu-2018/rclpy by fujitatomoya a year ago
- Rebase humble/pr-1033 (#1107) * Add parallel callback test (#1044) (#1052) * Add parallel callback test Signed-off-by: Florian Vahl <florian@flova.de> (cherry picked from commit 40054baf84ca440e... — committed to ros2/rclpy by Barry-Xu-2018 a year ago
- Rebase humble/pr-1033 (#1107) * Add parallel callback test (#1044) (#1052) * Add parallel callback test Signed-off-by: Florian Vahl <florian@flova.de> (cherry picked from commit 40054baf84ca440e... — committed to ros2/rclpy by Barry-Xu-2018 a year ago
- Fix #983 by saving future and checking for + raising any exceptions (#1073) * Fix #983 by saving future and checking for + raising any exceptions Signed-off-by: Achille Verheye <achille.verheye@gmai... — committed to ros2/rclpy by Achllle a year ago
- Fix #983 by saving future and checking for + raising any exceptions (#1073) * Fix #983 by saving future and checking for + raising any exceptions Signed-off-by: Achille Verheye <achille.verheye@gmai... — committed to ros2/rclpy by Achllle a year ago
- Fix #983 by saving future and checking for + raising any exceptions (#1073) * Fix #983 by saving future and checking for + raising any exceptions Signed-off-by: Achille Verheye <achille.verheye@gmai... — committed to ros2/rclpy by Achllle a year ago
- Fix #983 by saving future and checking for + raising any exceptions (#1073) (#1116) * Fix #983 by saving future and checking for + raising any exceptions (cherry picked from commit 933ec536fd0... — committed to ros2/rclpy by mergify[bot] a year ago
As of #1073 this is mostly fixed but there’s still a problem, which is that the exception is only raised if another callback is ready to execute. In the case of a node that only provides services, this means an exception in a service won’t be raised until someone makes another service call, which may not happen for a while.
This is mostly because
wait_for_ready_callbacksblocks indefinitely and so the code to check future statuses doesn’t get to run. But also you can’t get around it by callingspin_oncein a loop because the future checking doesn’t run if aTimeoutExceptiongets raised either@tsender @Achllle foxy backport, https://github.com/ros2/rclpy/pull/1116
if you would verify and review, that would be appreciated!
@Achllle thank you for the contribution, everything has been done including backport humble.
yes, i think this is a bug. uncaught exception should be raised to user application from executor. as i mentioned https://github.com/ros2/rclpy/issues/983#issuecomment-1271782711, this is doable, but cannot allocate bandwidth for that.
this is not. thanks for pinging, please consider the contribution if possible.
@fujitatomoya or @mr337 want to check that this isn’t lost as this is pretty critical functionality.
I am not for sure I fully understand the problem :S But I can try to attempt a PR. 😃 I should have some available cycles next week to work on this.
I need to reopen this because of https://github.com/ros2/rclpy/pull/1017
With #984 now merged, I’m going to go ahead and close this. If you feel that this was in error, please feel free to reopen.