opencv: Improper or bad rotation estimation with solvePnP in some cases
System information (version)
- OpenCV => master (ee257ff)
- Operating System / Platform => Ubuntu 16.04 / 64 bits
- Compiler => gcc
Detailed description
In some cases, the pose estimated with solvePnP() gives a rotation matrix that looks correct (det(R)=1, R.R^t=Identity) but projects the z-axis of the object frame in the opposite direction.
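For reference, a quick way to check where the object's z-axis points in the camera frame (a sketch; rvec is the output of solvePnP() and R the matrix obtained from cv::Rodrigues(rvec, R)):

```cpp
// The third column of R is the direction of the object's z-axis expressed
// in the camera frame (the camera looks along +Z). For a marker facing the
// camera, its z-axis should point toward the camera, i.e. have a negative
// Z component; a positive value indicates the mirrored / "flipped" solution.
cv::Mat R;
cv::Rodrigues(rvec, R);
cv::Vec3d zAxisInCamera(R.at<double>(0, 2), R.at<double>(1, 2), R.at<double>(2, 2));
std::cout << "z-axis in camera frame: " << zAxisInCamera << std::endl;
```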
More information here: ArUco marker z-axis mirrored.
Example image from the linked question:
My example (less obvious, with aruco::CORNER_REFINE_NONE):
The issue can appear on one frame and disappear in the next frame.
Steps to reproduce
To get a complete reproducible example with Aruco detection, see here.
A minimal reproducible example, with some optional debug code/display:
#include "opencv2/opencv.hpp"
#include "opencv2/aruco.hpp"
int main(int argc, char *argv[])
{
cv::Mat frame = cv::imread(argc > 1 ? argv[1] : "img_fail.png");
// Intrinsic camera parameters.
cv::Mat camMatrix = cv::Mat::eye(3, 3, CV_64F);
cv::Mat distortionCoeffs = cv::Mat::zeros(8, 1, CV_64F);
camMatrix.at<double>(0, 0) = 2.3396381685789738e+03;
camMatrix.at<double>(0, 2) = 960.;
camMatrix.at<double>(1, 1) = 2.3396381685789738e+03;
camMatrix.at<double>(1, 2) = 540.;
std::cout << "camMatrix:\n" << camMatrix << std::endl;
// distortionCoeffs.at<double>(0, 0) = -1.0982746232841779e-01;
// distortionCoeffs.at<double>(1, 0) = 2.2689585715220828e-01;
// distortionCoeffs.at<double>(2, 0) = 0.;
// distortionCoeffs.at<double>(3, 0) = 0.;
// distortionCoeffs.at<double>(4, 0) = -2.2112148171171589e-01;
std::cout << "distortionCoeffs: " << distortionCoeffs.t() << std::endl;
std::vector<cv::Point3d> objectPoints;
objectPoints.push_back( cv::Point3d(-2.5, 2.5, 0.0) );
objectPoints.push_back( cv::Point3d(2.5, 2.5, 0.0) );
objectPoints.push_back( cv::Point3d(2.5, -2.5, 0.0) );
objectPoints.push_back( cv::Point3d(-2.5, -2.5, 0.0) );
std::vector<cv::Point2d> imagePoints;
imagePoints.push_back( cv::Point2d(988, 512) );
imagePoints.push_back( cv::Point2d(945, 575) );
imagePoints.push_back( cv::Point2d(849, 544) );
imagePoints.push_back( cv::Point2d(893, 480) );
std::vector<int> solvePnP_methods;
solvePnP_methods.push_back(cv::SOLVEPNP_ITERATIVE);
solvePnP_methods.push_back(cv::SOLVEPNP_EPNP);
solvePnP_methods.push_back(cv::SOLVEPNP_P3P);
solvePnP_methods.push_back(cv::SOLVEPNP_AP3P);
std::map<int, std::string> solvePnP_method_names;
solvePnP_method_names[cv::SOLVEPNP_ITERATIVE] = "cv::SOLVEPNP_ITERATIVE";
solvePnP_method_names[cv::SOLVEPNP_EPNP] = "cv::SOLVEPNP_EPNP";
solvePnP_method_names[cv::SOLVEPNP_P3P] = "cv::SOLVEPNP_P3P";
solvePnP_method_names[cv::SOLVEPNP_AP3P] = "cv::SOLVEPNP_AP3P";
for (size_t idx = 0; idx < solvePnP_methods.size(); idx++)
{
cv::Mat debugFrame = frame.clone();
cv::putText(debugFrame, solvePnP_method_names[solvePnP_methods[idx]], cv::Point(20,20), cv::FONT_HERSHEY_SIMPLEX , 0.5, cv::Scalar(0,0,255));
cv::line(debugFrame, imagePoints[0], imagePoints[1], cv::Scalar(0,0,255), 2);
cv::line(debugFrame, imagePoints[1], imagePoints[2], cv::Scalar(0,255,0), 2);
cv::line(debugFrame, imagePoints[2], imagePoints[3], cv::Scalar(255,0,0), 2);
cv::Mat rvec, tvec;
cv::solvePnP(objectPoints, imagePoints, camMatrix, distortionCoeffs, rvec, tvec, false, solvePnP_methods[idx]);
cv::Mat rotation_matrix;
cv::Rodrigues(rvec, rotation_matrix);
std::cout << "\nrotation_matrix:\n" << rotation_matrix << std::endl;
std::cout << "R.R^t=" << rotation_matrix*rotation_matrix.t() << std::endl;
std::cout << "det(R)=" << cv::determinant(rotation_matrix) << std::endl << std::endl;
cv::aruco::drawAxis(debugFrame, camMatrix, distortionCoeffs, rvec, tvec, 5.0);
std::vector<cv::Point3d> axis_vec;
axis_vec.push_back( cv::Point3d(0, 0, 0) );
axis_vec.push_back( cv::Point3d(1, 0, 0) );
axis_vec.push_back( cv::Point3d(0, 1, 0) );
axis_vec.push_back( cv::Point3d(0, 0, 1) );
std::vector<cv::Point2d> axis_vec_proj;
cv::projectPoints(axis_vec, rvec, tvec, camMatrix, distortionCoeffs, axis_vec_proj);
for (size_t i = 0; i < axis_vec_proj.size(); i++)
{
std::cout << "axis_vec_proj[" << i << "]=" << axis_vec_proj[i].x << " ; " << axis_vec_proj[i].y << std::endl;
}
// Project cube.
float length = 20.0f;
std::vector<cv::Point3f> testObj3d;
testObj3d.push_back(cv::Point3f(0, 0, 0));
testObj3d.push_back(cv::Point3f(length, 0, 0));
testObj3d.push_back(cv::Point3f(0, length, 0));
testObj3d.push_back(cv::Point3f(length, length, 0));
testObj3d.push_back(cv::Point3f(0, 0, length));
testObj3d.push_back(cv::Point3f(length, 0, length));
testObj3d.push_back(cv::Point3f(0, length, length));
testObj3d.push_back(cv::Point3f(length, length, length));
std::vector<cv::Point2f> testObj2d;
cv::projectPoints(testObj3d, rvec, tvec, camMatrix, distortionCoeffs, testObj2d);
cv::line(debugFrame, testObj2d[0], testObj2d[1], cv::Scalar(0,0,0), 1);
cv::line(debugFrame, testObj2d[0], testObj2d[2], cv::Scalar(0,0,0), 1);
cv::line(debugFrame, testObj2d[1], testObj2d[3], cv::Scalar(0,0,0), 1);
cv::line(debugFrame, testObj2d[2], testObj2d[3], cv::Scalar(0,0,0), 1);
cv::line(debugFrame, testObj2d[0], testObj2d[4], cv::Scalar(0,0,0), 1);
cv::line(debugFrame, testObj2d[1], testObj2d[5], cv::Scalar(0,0,0), 1);
cv::line(debugFrame, testObj2d[2], testObj2d[6], cv::Scalar(0,0,0), 1);
cv::line(debugFrame, testObj2d[3], testObj2d[7], cv::Scalar(0,0,0), 1);
cv::line(debugFrame, testObj2d[4], testObj2d[5], cv::Scalar(0,0,0), 1);
cv::line(debugFrame, testObj2d[4], testObj2d[6], cv::Scalar(0,0,0), 1);
cv::line(debugFrame, testObj2d[5], testObj2d[7], cv::Scalar(0,0,0), 1);
cv::line(debugFrame, testObj2d[6], testObj2d[7], cv::Scalar(0,0,0), 1);
cv::imshow("debug", debugFrame);
cv::waitKey(0);
}
}
The image from which the corners have been extracted is here.
Note:
- cv::aruco::drawAxis() should be ok, as manual matrix multiplications lead to the same result.
- I don't think the issue comes from cv::Rodrigues(), as I get more or less (I suppose the correct rotation matrix should have a different sign somewhere, which is not the case?) the same rotation matrix inside the solvePnP function, just before the call to cv::Rodrigues() (solvePnP returns a rvec and does the rotation matrix -> rotation vector conversion internally).
- cv::SOLVEPNP_ITERATIVE, cv::SOLVEPNP_EPNP, cv::SOLVEPNP_P3P and cv::SOLVEPNP_AP3P all seem to fail; the estimated rotation is not the same for some of the methods.
- Using cv::aruco::CORNER_REFINE_SUBPIX can solve the issue in the original question (in my case, as the markers are much smaller, the refinement is not always correct and the issue remains).
- While I can understand that some numerical imprecision in the corner locations can lead to a badly estimated pose, I cannot understand why in this case the translation is ok and the rotation is almost ok except for a flipped z-axis (or why a left-handed rotation matrix seems to be returned).
About this issue
- Original URL
- State: open
- Created 7 years ago
- Reactions: 12
- Comments: 27 (12 by maintainers)
~~Cheap workaround: if you assume the marker always faces up relative to the camera, then a bad rotation can be detected if it violates this constraint. The pose's rotation can be "fixed" by flipping the axes and doing a "fixup" rotation:~~ Edit: this works only in special cases and shouldn't be used!
BTW, I don’t think filters on their own are enough to fix this (if it happens a lot). Bad pose solutions basically make the distribution multi-modal, so an (E)KF wouldn’t work well. You could use particle filters, cluster the particles (rather than averaging) to find modes, and pick the most likely mode.
@r2d3 Indeed, there could be multiple solutions given the measurement error.
My confusion was about the z-axis "being flipped", which would mean that the estimated rotation matrix is left-handed. The determinant of the rotation matrix was equal to 1, adding more confusion, as I was convinced I was seeing a left-handed frame. Actually, it was just an optical illusion similar to this:
Correct estimation of the pose (SOLVEPNP_P3P):

Bad estimation of the pose (rotation) (SOLVEPNP_ITERATIVE): the z-axis is pointing toward us instead of going inside the table (the x-axis is going more or less inside the table).

This is more obvious by looking at the complete cube instead of just the marker frame.
To recap, the real issue (a badly estimated pose/rotation) comes from a limitation of the PnP algorithms used (which are sensitive to noise in the intrinsic parameters and the 2D image coordinates) and possibly from the ambiguity of the projection of a planar object; see for instance:
D. Oberkampf, Daniel DeMenthon & L. S. Davis. Iterative pose estimation using coplanar feature points.
(SOP means Scaled Orthographic Projection.)
Many PnP algorithms exist, which shows that the pose estimation problem is not trivial.
The pose ambiguity problem is a known issue for PnP when the model points are coplanar. It occurs when the planar model is either small or viewed far from the camera.
Thanks to @catree, who integrated IPPE in #14362, it is now possible to get the two pose solutions (rotation and translation).
@hellochenwang
Yes, from the two poses, if the reprojection error of both poses is similar, then there is an ambiguity. A feature of #14362 is to give the reprojection error of both poses, so this can be measured.
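For illustration, a minimal sketch (assuming a recent OpenCV with the solvePnPGeneric() API and the SOLVEPNP_IPPE_SQUARE flag; objectPoints, imagePoints, camMatrix and distortionCoeffs are the variables from the reproduction code above) that retrieves both solutions and their reprojection errors:

```cpp
// SOLVEPNP_IPPE_SQUARE expects exactly 4 coplanar object points ordered
// (-L/2, L/2, 0), (L/2, L/2, 0), (L/2, -L/2, 0), (-L/2, -L/2, 0), which
// matches the objectPoints of the reproduction code.
std::vector<cv::Mat> rvecs, tvecs;
std::vector<double> reprojErrors;
int nbSolutions = cv::solvePnPGeneric(objectPoints, imagePoints, camMatrix,
                                      distortionCoeffs, rvecs, tvecs, false,
                                      cv::SOLVEPNP_IPPE_SQUARE, cv::noArray(),
                                      cv::noArray(), reprojErrors);
for (int i = 0; i < nbSolutions; i++)
{
    std::cout << "solution " << i << ": reprojection error = "
              << reprojErrors[i] << std::endl;
}
// If both reprojection errors are similar, the pose is ambiguous.
```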
So how to resolve the ambiguity? You can’t do it using pnp, and indeed no pnp algorithm will be able to resolve it, because it is a property of the problem. More information is required to constrain the marker’s pose.
There are four main options:

1. Use the effect of shading to determine the correct pose. Basically, the brightness of the marker is influenced by its orientation relative to the scene's illumination. It is possible to determine the correct pose using this information. This is nicely shown in the optical illusion posted above by @catree. However, this can be difficult in practice because it requires determining the illumination of the scene.

2. Use temporal filtering, as suggested by @catree. But in practice this is very hard to use, and it cannot solve the problem reliably. It might reduce random flipping between the two solutions, but we cannot guarantee that the smoothed trajectory of poses is correct.

3. Use additional non-coplanar markers. If you have one or more additional markers that are not coplanar to the original marker, then you can resolve the ambiguity. This is because the PnP problem no longer involves a planar model, and we can compute the pose uniquely.

4. Use additional coplanar markers. If you have a set of markers that all lie on the same plane, then it is possible to resolve the ambiguity, but there is a condition: the set of markers must span a region of space that is 'sufficiently large'. You need this condition to prevent an ambiguity where the entire set of markers is flipped. There are two ways to solve this in practice. The first is to solve the pose of each marker independently and then find the correct pose using a consensus. The correct pose should be revealed as a tight cluster within the set of all poses (2 poses per marker). If you don't know the relative positions of the markers on the plane, you can do a consensus with the estimated normals of each marker (two normals per marker). Alternatively, if you know the relative positions of the markers on the plane, things are simpler: you can simply detect all the markers in the image, then solve a single PnP problem with 4n point correspondences, where n is the number of detected markers (see the sketch after this list).
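A minimal sketch of that last variant (my own illustration; the corner containers are hypothetical placeholders for the known board layout and the per-marker detections, and camMatrix/distortionCoeffs are as in the reproduction code above):

```cpp
#include <opencv2/calib3d.hpp>
#include <array>
#include <vector>

// Solve one PnP problem over the 4n corners of n detected coplanar markers
// whose relative layout on the plane is known: the larger spatial extent of
// the point set resolves the per-marker ambiguity (provided the markers span
// a sufficiently large region).
// 'markerCornersInPlane' (known board layout, Z = 0) and 'detectedCorners'
// (per-marker image detections) are hypothetical inputs.
static void solveBoardPose(const std::vector<std::array<cv::Point3d, 4>> &markerCornersInPlane,
                           const std::vector<std::array<cv::Point2d, 4>> &detectedCorners,
                           const cv::Mat &camMatrix, const cv::Mat &distortionCoeffs,
                           cv::Mat &rvec, cv::Mat &tvec)
{
    std::vector<cv::Point3d> allObjectPoints;
    std::vector<cv::Point2d> allImagePoints;
    for (size_t m = 0; m < detectedCorners.size(); m++)
    {
        for (int c = 0; c < 4; c++)
        {
            allObjectPoints.push_back(markerCornersInPlane[m][c]);
            allImagePoints.push_back(detectedCorners[m][c]);
        }
    }
    cv::solvePnP(allObjectPoints, allImagePoints, camMatrix, distortionCoeffs, rvec, tvec);
}
```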
Another reference on the same topic.
From the ArUco documentation:
@hellochenwang
Indirectly, I think so. Instead of just detecting the marker(s), track the marker(s) and save the corresponding poses. Since the issue should come from inaccuracy in the corner locations (plus uncertainties in the camera intrinsic parameters and distortion) and should be spurious, it should be possible to detect when the orientation suddenly changes from one image frame to the next. Unfortunately, you will not easily be able to get the other solution (planar pose estimation can give two ambiguous poses, and one pose is eliminated, I think, by keeping the one that gives the minimal reprojection error).
This should be similar to what the ArUco library proposes:
Another solution could be to filter the poses, but this is not a trivial problem: e.g. 2π wrap-around issues when filtering angle representations (see the sketch below).
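As a small illustration of those issues (my own sketch, not from the ArUco library): when averaging a buffer of orientations, quaternions avoid the 2π wrap-around of angle representations, but q and -q encode the same rotation and must be sign-aligned first:

```cpp
#include <opencv2/opencv.hpp>
#include <cmath>
#include <vector>

// Convert a Rodrigues rotation vector to a unit quaternion (w, x, y, z).
static cv::Vec4d rvecToQuat(const cv::Vec3d &rvec)
{
    double theta = cv::norm(rvec);
    if (theta < 1e-12)
        return cv::Vec4d(1, 0, 0, 0);
    cv::Vec3d u = rvec / theta;
    double s = std::sin(theta / 2);
    return cv::Vec4d(std::cos(theta / 2), s*u[0], s*u[1], s*u[2]);
}

// Naive averaging of a buffer of orientations: flip every sample into the
// hemisphere of the first one (q and -q are the same rotation), accumulate,
// then renormalize to a unit quaternion.
static cv::Vec4d averageQuats(const std::vector<cv::Vec3d> &rvecs)
{
    cv::Vec4d acc(0, 0, 0, 0);
    cv::Vec4d ref = rvecToQuat(rvecs[0]);
    for (const cv::Vec3d &r : rvecs)
    {
        cv::Vec4d q = rvecToQuat(r);
        if (q.dot(ref) < 0) q = q * -1.0; // hemisphere alignment
        acc += q;
    }
    return acc / cv::norm(acc);
}
```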
Side note.
After a quick look at the ArUco library, it implements pose from homography / "solvePoseOfCentredSquare()" and some other functions that could be interesting to port to the OpenCV ArUco module. That also makes me think that we should have a generic solvePnP() function that returns all the possible solutions to the pose estimation problem (for coplanar points, two solutions; for P3P, up to four solutions). See also:
A new version of Apriltag3 has been released.
They use pose from homography, with a method (estimate_tag_pose_orthogonal_iteration) to get the two solutions of the planar pose estimation problem.
What could be done to avoid the frame axis flipping is to use temporal pose consistency. Here is a simplistic demo where I use the pose (out of the two solutions) whose rotation part is closest to the last pose. The reprojection error cannot be used since, with calibration errors and corner location errors, there is no guarantee that the correct solution is the one with the lowest reprojection error.
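A minimal sketch of such a selection (my own illustration, not the demo's actual code; rvec0/rvec1 are the two candidate solutions, e.g. from cv::solvePnPGeneric(), and rvecPrev is the previous frame's rotation):

```cpp
#include <opencv2/calib3d.hpp>

// Geodesic distance between two rotations given as Rodrigues vectors:
// the angle of the relative rotation Ra * Rb^T.
static double rotationDistance(const cv::Mat &rvecA, const cv::Mat &rvecB)
{
    cv::Mat Ra, Rb, rvecDiff;
    cv::Rodrigues(rvecA, Ra);
    cv::Rodrigues(rvecB, Rb);
    cv::Rodrigues(Ra * Rb.t(), rvecDiff); // axis-angle of the relative rotation
    return cv::norm(rvecDiff);            // angle theta in [0, pi]
}

// Keep the candidate whose rotation is closest to the previous pose:
// int best = (rotationDistance(rvec0, rvecPrev) <= rotationDistance(rvec1, rvecPrev)) ? 0 : 1;
```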
Video
Since the tag is not flat, the pose ambiguity is more prominent. There is still an issue, maybe a singularity in the implemented method, where we can observe the frames of all the tags brusquely changing orientation. This needs more tests, and maybe the IPPE planar pose estimation method should be tested as well.
There is possibly another solution. See this paper, code is here:
This can be a viable solution since low-cost stereo-cameras are available on the market (e.g. Intel RealSense, Structure sensors, …).
Some notes from my understanding of the paper.
This figure summarizes the planar pose ambiguity problem for the fiducial marker pose estimation:
So the idea is to fuse pose estimated using 3D and 2D information:
And a figure that recaps the whole method:
Some notes from my experiments implementing this method:

- Prerequisite: be able to acquire RGB images and a depth map, and to align the depth map with the RGB image (the same [u,v] coordinates in the RGB and depth images represent the same physical 3D point).
- Detect the fiducial markers in the images to get the 2D corner locations.
- Estimate the plane equation for each marker using the 3D data.
- Compute the 3D coordinates of each marker corner in the camera frame: from the [u,v] coordinates, transform to the normalized camera frame (see the undistortPoints() function). Without distortion coefficients, it is simply x = (u - cx) / fx, y = (v - cy) / fy. Then intersect the viewing ray d = [x, y, 1]^T with the plane of normal n passing through a point x0 on the plane: X = t.d with t = (n . x0) / (n . d).
- Refine the pose from the 3D-2D correspondences using the solvePnPRefineLM() or solvePnPRefineVVS() functions (the authors used a constrained optimization function).

So, the key idea is to exploit the 3D information to fit a plane and try to avoid the planar ambiguity problem by constraining the 3D coordinates to be on the plane. The pose is then refined using non-linear optimization from the 3D-2D correspondences.
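A minimal sketch of the corner back-projection step (my own illustration of the idea, assuming distortion-free coordinates and an already fitted plane given by its normal n and a point x0 in the camera frame; camMatrix is the usual 3x3 intrinsic matrix):

```cpp
#include <opencv2/core.hpp>

// Back-project an (undistorted) pixel [u, v] onto a plane expressed in the
// camera frame by its normal n and a point x0 lying on it.
static cv::Point3d backProjectOnPlane(const cv::Point2d &uv, const cv::Mat &camMatrix,
                                      const cv::Vec3d &n, const cv::Vec3d &x0)
{
    const double fx = camMatrix.at<double>(0, 0), fy = camMatrix.at<double>(1, 1);
    const double cx = camMatrix.at<double>(0, 2), cy = camMatrix.at<double>(1, 2);

    // Normalized camera coordinates (apply cv::undistortPoints() first if
    // the image points are distorted).
    cv::Vec3d d((uv.x - cx) / fx, (uv.y - cy) / fy, 1.0);

    // Ray-plane intersection: X = t*d with n.(t*d - x0) = 0.
    double t = n.dot(x0) / n.dot(d);
    cv::Vec3d X = t * d;
    return cv::Point3d(X[0], X[1], X[2]);
}
// The camera-frame corners obtained this way can be combined with the known
// marker model (e.g. via a rigid-transform fit) to build a pose hypothesis,
// then refined with cv::solvePnPRefineLM() / cv::solvePnPRefineVVS().
```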
Some issues I got:
- When computing Z from the plane equation, ensure that Z > 0 (just remove the sign).
- Sometimes det(R) = -1; in this case, normalize R, see this code (and the sketch after the video link below):

Some video results I got using an Intel RealSense D435:
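A minimal sketch of one common way to re-normalize such a matrix (projection onto the nearest proper rotation via SVD; my own illustration, not necessarily the code linked above):

```cpp
#include <opencv2/core.hpp>

// Project a noisy 3x3 matrix onto the nearest proper rotation (det = +1)
// using the SVD: R = U * V^T, with a sign fix if det(U * V^T) = -1.
static cv::Mat nearestRotation(const cv::Mat &R)
{
    cv::Mat w, u, vt;
    cv::SVDecomp(R, w, u, vt);
    cv::Mat Rn = u * vt;
    if (cv::determinant(Rn) < 0)
    {
        // Flip the sign of the last column of u (smallest singular value).
        u.col(2) *= -1;
        Rn = u * vt;
    }
    return Rn;
}
```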
I realize that it is natural and not a bug! In fact, the z-axis always points toward us and is not flipped (you can check the generated axes). What happens is shown in the picture below: because of low resolution or far distance, you see the XY plane as the same and think that the z-axis is flipped in this scenario.
Some recent fiducial marker methods:
- TopoTag: A Robust and Scalable Topological Fiducial Marker System, Guoxing Yu, Yongtao Hu, Jingwen Dai (2019)
- STag: A Stable Fiducial Marker System, Burak Benligiray, Cihan Topal, Cuneyt Akinlar (2017)
- Detection and Accurate Localization of Circular Fiducials under Highly Challenging Conditions, Lilian Calvet, Pierre Gurdjos, Carsten Griwodz, Simone Gasparini (2016) (CCTag)
- librpp for robust planar pose estimation, used in ARToolkitPlus. ARToolkitPlus was proposed in 2007.
@antithing I don’t think so. Anyway, it was something really simple:
Maybe to avoid sporadic flipping, you can try to use a buffer of N past observations and use a voting scheme / median scheme.

To compute the distance between two rotations, have a look at the literature. Something like: compute Rdiff = Rprev . Rcur^T and get the rotation angle theta from the theta.u (axis-angle) vector. It should be equivalent to theta = acos((trace(Rdiff) - 1) / 2).
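A small sketch of that computation (assuming Rprev and Rcur are the 3x3 rotation matrices of the previous and current poses):

```cpp
#include <opencv2/calib3d.hpp>
#include <algorithm>
#include <cmath>

// Angle of the relative rotation Rdiff = Rprev * Rcur^T, computed two
// equivalent ways.
cv::Mat Rdiff = Rprev * Rcur.t();

// 1) From the axis-angle (theta.u) vector returned by cv::Rodrigues().
cv::Mat rvecDiff;
cv::Rodrigues(Rdiff, rvecDiff);
double theta1 = cv::norm(rvecDiff);

// 2) From the trace: trace(R) = 1 + 2*cos(theta).
double c = (cv::trace(Rdiff)[0] - 1.0) / 2.0;
double theta2 = std::acos(std::max(-1.0, std::min(1.0, c)));
```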
You’re absolutely right. The ambiguity can also be resolved if you have a stereo rig or an RGB-D camera. This can work when the marker is in the working range of the stereo rig or RGB-D camera. But if the marker is far away, it can still be impossible to resolve.
My response was assuming you only had a monocular (single lens) camera.
Hello @catree,
As your 4 points are on a plane and on a square, there seem to be multiple solutions given the measurement error of the corners.
I think adding one or more points inside the marker will help the solution to be more stable. Adding more points will help, as you cannot use cv::aruco::CORNER_REFINE_SUBPIX.
If you have multiple markers, you could use cv::aruco::estimatePoseSingleMarkers, which will solve this instability in the solution.
Regards
David
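For reference, a minimal usage sketch of that function (assuming the opencv_contrib aruco module with its pre-4.7 API; the dictionary and the 5 cm marker side length are placeholders):

```cpp
#include <opencv2/aruco.hpp>
#include <vector>

// Detect all markers in the frame and estimate one pose per marker.
static void estimateAllMarkerPoses(const cv::Mat &frame, const cv::Mat &camMatrix,
                                   const cv::Mat &distortionCoeffs,
                                   std::vector<cv::Vec3d> &rvecs, std::vector<cv::Vec3d> &tvecs)
{
    std::vector<int> ids;
    std::vector<std::vector<cv::Point2f>> corners;
    cv::Ptr<cv::aruco::Dictionary> dictionary =
        cv::aruco::getPredefinedDictionary(cv::aruco::DICT_6X6_250);
    cv::aruco::detectMarkers(frame, dictionary, corners, ids);
    cv::aruco::estimatePoseSingleMarkers(corners, 0.05f /* marker side [m] */,
                                         camMatrix, distortionCoeffs, rvecs, tvecs);
}
```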
@antithing
(Note: @paland3 pointed out this code doesn’t work when the center of the marker isn’t at the origin. Haven’t found a fix. Further, I found cases where the 0 < R[1,1] < 1 condition doesn’t work, and haven’t found a fix for that either. Basically the “workaround” I posted is crap.)
0 < R.at<double>(1, 1) < 1
doesn’t do the same thing as in Python; you need0 < ... && ... < 1
forward
needs to be(0, 1, 0)
,R.at(1,1)
should beR.at(2,2)
(and maybe negative??), andFlipAxes
will need to change… somehow, I can’t juggle these coordinates in my head 😃tnorm
should beT / sqrt(...)
tnorm.cross(forward)
@valtron Thanks for the workaround. As far as I can tell, the "flipping" also messes up the translation (the origin of the coordinate system is predicted closer or further because of the rotation). Can you explain how your code addresses that?
Also, what does the angle * axis part produce?
EDIT: I implemented the workaround in Matlab, with some comments; it might help someone.
As it only rotates the coordinate system, it won't fix the translation error. Does anyone have a solution for that? Thanks again @valtron, you saved me a lot of time!