franka_ros: No error raised when controller stop working

Context

System version: 5.2.1 libfranka version: 0.10.0 franka_ros version: 0.10.2 (unreleased), on commit 2d458abad7390bb73771c91787c361c2b5cfa6ad, one commit behind the state of develop when this issue was made. controllers used: position_joint_trajectory_controller and a custom controller . We switch from one to another during operation regularly. The “controller” used to interface with the robot state is used. MoveIt related packages: moveit: at commit f94f958337d4791a13ba48de25bac9c9c60d3ce4 moveit_msgs: at commit 2b9c921f0e519704f898ebaec1663a4dd25b5203 moveit_resources: at commit 0be98e1b263e823fa41b653359da582dc585c827 geometric_shapes: at commit 0991e05d2ddc8007691b3ee907814f3f95f111f7 rviz_visual_tools: at commit f94f958337d4791a13ba48de25bac9c9c60d3ce4 moveit_visual_tools: at commit f9585abc4900b998262a4abb6b355b4d55c65dfb moveit_tutorials: at commit 71ab5ee7d4c9db9b67363b6abb385fe33cd02ddb panda_moveit_config: at commit 625be99465d1cf897297f0fde414f90349e3015f moveit_calibration: at commit ff6918e380791d82ff859b251297ac36bf1e1b2b

Problem

Sometimes the position_joint_trajectory_controller controller stops working (most of the cases after a controller switch) and, to our best knowledge, it is not reported as such. When this happens:

No error is raised
Nothing abnormal is logged to the terminal
/franka_state_controller/franka_states/robot_mode is 2
calling controller_manager/list_controllers says the position_joint_trajectory_controller is running

But whenever a new trajectory is sent to move_group (moveit node) for the exectution, the controller is unresponsive and the trajectory execution fails with the following errors:

[ WARN] [1677695839.528682019] [/move_group]: Controller 'position_joint_trajectory_controller' failed with error GOAL_TOLERANCE_VIOLATED: panda_joint2 goal error -0.160081
[ WARN] [1677695839.528764014] [/move_group]: Controller handle position_joint_trajectory_controller reports status ABORTED
[ INFO] [1677695839.528796047] [/move_group]: Completed trajectory execution with status ABORTED ...
[ INFO] [1677695839.528885006] [/move_group]: Execution completed: ABORTED
[ INFO] [1677695839.529366444] [/move_group]: Received event 'stop'

The controller starts working normally again after switching the controllers again. To do this we:

Call the controller_manager/switch_controller service to switch to our custom controller. We do this because we can’t make the position_joint_trajectory_controller run since it is “still running”.
Call the controller_manager/switch_controller service to switch to the position_joint_trajectory_controller

The problem that we would like to be addressed is not the failure of the controller itself, but the fact that it is not reported as not working since we can’t identify why the controller failed or do something about it automatically if it is not reported.

Steps to reproduce

We don’t know the exact cause of the controller to fail when it happens to us, so to reproduce the error, I just made a script that moves a little bit back with the position_joint_trajectory_controller, then moves forward a little bit with our custom controller and repeats this 200 times. I made it very rough to provoke the controller failure more easily and I tested it twice and in both cases the error happened after not too long.

@Maverobot, we would prefer to send the script and custom controller privately. Could you tell me how should I share it with you? I can make a workspace with the script in one package and the controller in another one. My mail is h.estrada@organifarms.de

The only issue is that we also use a custom planner that uses some packages with some modifications and it could be complicated to share. The robot should only plan and move to a specific and given joint configuration. I will leave an empty method in the script move_to_this_position that should do this. Would it be possible for you to plan and move it with the default MoveIt tools or however is easier for you? Thank you in advance for your time.

About this issue

Original URL
State: open
Created a year ago
Comments: 20 (8 by maintainers)

Most upvoted comments

This issue is no longer a blocker for us. We will use the follow_joint_trajectory action and a sleep after changing the controller as recommended to solve it.

The remaining thing to do would be to evaluate if the reset done by franka_control_node is necessary. If not, taking it out would avoid race conditions and we could take the sleep away, but this is not a blocker and would not result in a fatal error for us anymore.

HumbertoE on Mar 24, 2023

After getting stuck, I killed the controller switch test script and ran it again. No error was shown and the motions were executed successfully (until the next “stuck”).

This is weird because for us it stop working. I will do the experiment again using your script and sending the trajectories directly to the position_joint_trajectory_controller as you did.

It would be very helpful if you can come up with code, with which we can reproduce it.

If the trajectories are still failing for us after using your script, I will see if I can catch the error where I want the controller switch test to stop to make sure it stops for the reason we want and send it to you. As I said now I can’t run tests with the robot, but I will as soon as possible and write you back. Maybe tomorrow evening I can already test and send you my feedback. The latest, I will send it on Wednesday.

HumbertoE on Mar 20, 2023