openMVG: Using robust_essential_spherical is not behaving as expected

Firstly, thanks for your cracking work and open sourcing this repo - I feel like I am definitely standing on the shoulders of giants!

The problem case I have is to get the relative rotation between two cameras from two spherical panoramic images; the relative Cartesian transformation is a bonus. An example pair of images I want to extract the camera poses for is copied below for reference (Theta_R0010732_2048.jpg (L) and Theta_R0010733_2048.jpg (R)).

Theta_R0010732_2048

and

Theta_R0010733_2048

The problems (and I appreciate that this is merely sample code) that I wondered if you could advise me on are:

  1. First, and it is a big one - is my use case what this example solver is built for? I am concerned that this solver is merely for “left eye - right eye” “VR-type” solves… I don’t think this is the case as the SIFT solver is obviously generic.

  2. The test images SponzaLion000.jpg and SponzaLion001.jpg are not RGB images and seem to be a strange format - can I use RGB images like my examples above or do they need a pre-processing step? The test images look like Gaussian scale-space…

  3. Ordering - the two attached images solve when I use -a "Theta_R0010732_2048.jpg" -b "Theta_R0010733_2048.jpg" (positioned in clockwise order), but if I switch the order, the final stage of the processing

if (RelativePoseFromEssential(
      xL_spherical,
      xR_spherical,
      E,
      vec_inliers,
      &relative_pose,
      &inliers_indexes,
      &inliers_X))
{
    ...
}

never manages to satisfy

// Test if the best solution is good by using the ratio of the two best solution score
std::sort(cheirality_accumulator.begin(), cheirality_accumulator.end());
const double ratio = cheirality_accumulator.rbegin()[1] / static_cast<double>(cheirality_accumulator.rbegin()[0]);
return (ratio < positive_depth_solution_ratio);

and no pose information is output. Why is the code not agnostic to image ordering?
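For my own understanding I walked through that check with a toy example (this is just my reading of the code, so apologies if I have it wrong): the accumulator holds, for each of the four candidate (R, t) decompositions of E, the number of points with positive depth in both views, and the pose is only accepted when the best decomposition clearly dominates the second best.

#include <algorithm>
#include <iostream>
#include <vector>

int main()
{
  // One positive-depth count per candidate decomposition of E (toy numbers).
  std::vector<size_t> cheirality_accumulator = {12, 3, 180, 7};
  const double positive_depth_solution_ratio = 0.7; // example threshold

  // Sorted ascending, so rbegin()[0] is the best count, rbegin()[1] the second best.
  std::sort(cheirality_accumulator.begin(), cheirality_accumulator.end());
  const double ratio =
    cheirality_accumulator.rbegin()[1] /
    static_cast<double>(cheirality_accumulator.rbegin()[0]);

  // 12 / 180 ~= 0.067 < 0.7 -> accepted. If two decompositions score similarly
  // (say 95 and 90 -> ratio ~= 0.95), the pose is rejected, which looks like
  // what I hit when I swap the image order.
  std::cout << "ratio = " << ratio
            << (ratio < positive_depth_solution_ratio ? " (accepted)" : " (rejected)")
            << "\n";
}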

I think that will do for now. Thank you in advance; your time is most appreciated.

About this issue

  • State: open
  • Created 2 years ago
  • Comments: 22 (7 by maintainers)

Most upvoted comments

Possible tips to do a better job with the OpenMVG tools (leverage the fact that you have upright cameras):

  • Use upright features
  • Use an upright relative pose solver (see the sketch below)
  • Use an a-priori threshold for outlier rejection
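To make the second point concrete: this is not the openMVG solver itself, just a rough sketch of the reduced parameterization, assuming Y is the gravity/up axis. With upright cameras the relative rotation collapses to a single yaw angle, so the essential matrix has only three unknowns (yaw plus the translation direction) instead of five, which makes the robust estimation much better conditioned.

#include <Eigen/Dense>
#include <iostream>

// Upright-camera essential matrix: E = [t]x * R_y(theta).
Eigen::Matrix3d UprightEssential(const double theta, const Eigen::Vector3d & t)
{
  const Eigen::Matrix3d R =
    Eigen::AngleAxisd(theta, Eigen::Vector3d::UnitY()).toRotationMatrix();
  Eigen::Matrix3d t_x;              // cross-product (skew-symmetric) matrix of t
  t_x <<      0, -t.z(),  t.y(),
          t.z(),      0, -t.x(),
         -t.y(),  t.x(),      0;
  return t_x * R;
}

int main()
{
  const Eigen::Matrix3d E =
    UprightEssential(0.3, Eigen::Vector3d(1, 0, 0.2).normalized());
  std::cout << E << std::endl;
}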

Just downloaded the image you shared and ran the default openMVG example:

./Linux-x86_64-RELEASE/openMVG_sample_multiview_robustEssential_spherical -a /home/pierre/Downloads/167415405-9d940270-caef-4bac-bf6f-8f6ac1f7d58c.jpg -b /home/pierre/Downloads/167415444-61b901cf-9fb1-4694-86d8-cf40f5b4f60b.jpg
Compute the relative pose between two spherical image.
Use an Acontrario robust estimation based on angular errors.
Left image SIFT count: 6464
Right image SIFT count: 9031
INFO: [robust_estimator_ACRansac.hpp:440]   nfa=-31.8113 inliers=222/223 precisionNormalized=1.81254 precision=1.34631 (iter=0 ,sample=181,31,202,186,1,216,204,54,)
INFO: [robust_estimator_ACRansac.hpp:440]   nfa=-37.5333 inliers=221/223 precisionNormalized=1.28317 precision=1.13277 (iter=3 ,sample=50,30,70,189,173,216,123,147,)
INFO: [robust_estimator_ACRansac.hpp:440]   nfa=-37.9449 inliers=223/223 precisionNormalized=1.55586 precision=1.24734 (iter=12 ,sample=51,47,32,0,150,161,178,65,)
INFO: [robust_estimator_ACRansac.hpp:440]   nfa=-38.8189 inliers=222/223 precisionNormalized=1.34062 precision=1.15785 (iter=33 ,sample=184,131,212,114,191,143,122,149,)

 Angular threshold found: 66.3401(Degree)

 #Putatives/#inliers : 223/222

INFO: [sfm_data_BA_ceres.cpp:496] 
Bundle Adjustment statistics (approximated RMSE):
 #views: 2
 #poses: 2
 #intrinsics: 1
 #tracks: 118
 #residuals: 472
 Initial RMSE: 25.7638
 Final RMSE: 15.1251
 Time (s): 0.0177093 
--
 Used motion prior: 0
Residual statistics (pixels):
	 min: 0.535163
	 mean: 34.7237
	 median: 11.8085
	 max: 658.835

As we can see, we don't have perfect outlier filtering, but some of the structure seems to be matched and identified: spherical_inliers

Tweaking the code and setting a stricter threshold for ACRANSAC (const double precision = D2R(4.0); // std::numeric_limits<double>::infinity();) does a more conservative filtering.
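For anyone else who wants to try this, the change is that single line in the sample (the commented-out value is the original one; D2R is openMVG's degrees-to-radians helper):

// Upper bound on the angular error ACRANSAC may use when classifying inliers.
// Infinity lets ACRANSAC estimate the threshold adaptively; a fixed a-priori
// value of 4 degrees gives the more conservative filtering shown below.
const double precision = D2R(4.0); // std::numeric_limits<double>::infinity();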

./Linux-x86_64-RELEASE/openMVG_sample_multiview_robustEssential_spherical -a /home/pierre/Downloads/167415405-9d940270-caef-4bac-bf6f-8f6ac1f7d58c.jpg -b /home/pierre/Downloads/167415444-61b901cf-9fb1-4694-86d8-cf40f5b4f60b.jpg
Compute the relative pose between two spherical image.
Use an Acontrario robust estimation based on angular errors.
Left image SIFT count: 6464
Right image SIFT count: 9031
INFO: [robust_estimator_ACRansac.hpp:440]   nfa=-6.3346 inliers=84/223 precisionNormalized=0.00367438 precision=0.0606166 (iter=15 ,sample=167,183,58,128,153,209,157,96,)
INFO: [robust_estimator_ACRansac.hpp:440]   nfa=-7.95209 inliers=94/223 precisionNormalized=0.00367438 precision=0.0606166 (iter=58 ,sample=159,45,65,177,181,87,51,40,)

 Angular threshold found: 3.47308(Degree)

 #Putatives/#inliers : 223/94

INFO: [sfm_data_BA_ceres.cpp:496] 
Bundle Adjustment statistics (approximated RMSE):
 #views: 2
 #poses: 2
 #intrinsics: 1
 #tracks: 71
 #residuals: 284
 Initial RMSE: 4.58919
 Final RMSE: 3.57462
 Time (s): 0.0588137 
--
 Used motion prior: 0
Residual statistics (pixels):
	 min: 0.145014
	 mean: 5.02883
	 median: 3.13558
	 max: 31.0622

spherical_scriter_threshold

and it seems to give something plausible for the 3D relative pose and scene points: spherical_scriter_threshold_3d

Hi @pmoulon, sorry to come back to you on this again; I don’t have an issue per se, so I do not want to open another thread, but would like some advice if you would be so kind…

I now have the solver above working very well for our current use case - getting the relative pose from two panoramas. The poses returned are in some arbitrary relative value units (RVU), which are crucially not the same across different solves. The use case is that I wish to plot the poses of several cameras as the photographer moves in space, and at real-world scale. How we are doing this currently for three images (0, 1 and 2), for example, is:

  • Solve for 0 and 1 to get the relative pose in RVU
  • Use an ML model to extract real-world depth from the “static” panorama (zero-indexed or “left” pano)
  • Use a confidence method to infer the real-world depth at the inlier feature points

We then do the same for panos 1 and 2; we then have both poses at real-world scale and can cascade the transforms to give all three poses relative to each other (we can do this for N cameras).
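In case it helps to make the cascading step concrete, this is roughly what we do, sketched with plain Eigen types (all names here are hypothetical; pose 0-1 is taken to map coordinates from camera 1's frame into camera 0's frame, and the per-pair scales come from the depth model):

#include <Eigen/Dense>

// Build a 4x4 rigid transform from a relative rotation and translation,
// with the translation rescaled from RVU to metres by the per-pair scale.
Eigen::Matrix4d MetricTransform(const Eigen::Matrix3d & R,
                                const Eigen::Vector3d & t_rvu,
                                const double scale)
{
  Eigen::Matrix4d T = Eigen::Matrix4d::Identity();
  T.topLeftCorner<3, 3>() = R;
  T.topRightCorner<3, 1>() = scale * t_rvu;
  return T;
}

int main()
{
  // Hypothetical outputs of the two pairwise solves (0-1 and 1-2), in RVU.
  const Eigen::Matrix3d R01 = Eigen::Matrix3d::Identity(), R12 = Eigen::Matrix3d::Identity();
  const Eigen::Vector3d t01(1, 0, 0), t12(0, 0, 1);
  const double s01 = 2.5, s12 = 3.1; // metric scales inferred from the ML depth

  // Cascade: camera 2 expressed in the frame of camera 0, in metres.
  const Eigen::Matrix4d T01 = MetricTransform(R01, t01, s01);
  const Eigen::Matrix4d T12 = MetricTransform(R12, t12, s12);
  const Eigen::Matrix4d T02 = T01 * T12;
  (void)T02;
}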

This is working relatively well and is only restricted by the accuracy of the depth coming out of our ML model and the inference of that depth at the inliers. And now my questions…

I have noticed that Agisoft can solve the relative pose positions of multiple (more than two) cameras in one batch and that the results are in the same RVU so the scaling is consistent and the relative positions are consistent.

  1. Is there a way of forcing the solves for poses between panos 0 and 1 and then panos 1 and 2 to use the same RVU (I would think 100% not)?
  2. If the answer to the above is no, do you have a feel as to how difficult it would be to extend the code to allow for solving multiple relative poses (n panoramas at one time)?

If the answer to 2 is that it is not major work, the plan would be to solve for the poses of multiple cameras in consistent RVU, but then apply a single real-world scale conversion based on the best estimate from the ML model to give a consistent scale across all images/poses.
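For what it is worth, I would expect that single scale conversion to be the easy part; something like the following sketch over an openMVG SfM_Data scene (assuming everything ends up in one scene - the function name is mine, not openMVG's):

#include "openMVG/geometry/pose3.hpp"
#include "openMVG/sfm/sfm_data.hpp"

// Rescale an already-solved scene from RVU to metric units with one global
// factor: rotations are untouched, only camera centres and 3D points scale.
void ApplyGlobalScale(openMVG::sfm::SfM_Data & scene, const double scale)
{
  for (auto & pose_it : scene.poses)
  {
    const openMVG::geometry::Pose3 & pose = pose_it.second;
    pose_it.second = openMVG::geometry::Pose3(pose.rotation(), scale * pose.center());
  }
  for (auto & landmark_it : scene.structure)
  {
    landmark_it.second.X *= scale;
  }
}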

I would also be interested to see if you are available for some paid consultancy work (my email address is nfcamus<at>gmail.com if you have any interest in this).

Hi @pmoulon, I have what I think you will agree is a fairly interesting problem. I have adjusted your example above and have this working perfectly for what we need. The problem is that although the openMVG algos are generating the correct relative poses for the two cameras, the scaling is in some arbitrary Relative Value Units (RVU). As part of our efforts, we have a cool ML model that can convert spherical panos to depth maps and provide very accurate real-world depth. So, the idea is to use the ML model and the depth maps it produces to establish the real-world relative poses between two cameras - to do this I do the following:

  1. Extract the relative pose information including the inlier feature points.
  2. Run the ML to produce a depth map.
  3. Take an inlier feature point and transform it back from spherical coordinates to pixel coordinates (u, v) to get the pixel coordinate corresponding to the feature point.
  4. Use the depth map from the machine learning model and the pixel coordinate to get the real-world depth of the feature point and calculate the “scale factor” for the solution out of openMVG (a rough sketch of steps 3 and 4 is included after this list).
  5. Apply the scale factor to the relative pose to get real-world values.
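For reference, here is the rough sketch of steps 3 and 4 (the equirectangular axis convention below is an assumption on my part and has to match openMVG's spherical intrinsic; in practice we aggregate the per-inlier scales with a confidence weighting rather than trusting a single point):

#include <cmath>

// Map a bearing vector (direction in the camera frame; y down, z forward is
// assumed here) back to pixel coordinates of a width x height equirectangular
// image.
void BearingToEquirectangular(const double x, const double y, const double z,
                              const int width, const int height,
                              double & u, double & v)
{
  const double pi  = std::acos(-1.0);
  const double lon = std::atan2(x, z);                                // [-pi, pi]
  const double lat = std::asin(y / std::sqrt(x * x + y * y + z * z)); // [-pi/2, pi/2]
  u = (lon / (2.0 * pi) + 0.5) * width;
  v = (lat / pi + 0.5) * height;
}

// Scale factor between the solver's RVU and metres for one inlier: the metric
// depth sampled from the ML depth map at (u, v), divided by the distance of
// the triangulated point from the static camera centre.
double ScaleFromInlier(const double metric_depth, const double rvu_distance)
{
  return metric_depth / rvu_distance;
}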

The approach seems to work fine - HOWEVER, I think the inlier “feature points” from the

        if (bundleAdjuster.Adjust(
            scene,
            Optimize_Options(
                Intrinsic_Parameter_Type::NONE,
                Extrinsic_Parameter_Type::ADJUST_ALL,
                Structure_Parameter_Type::ADJUST_ALL)))
        {
            // Compute the re-projection error...
            progress("Bundle adjustment complete, computing re-projection error for scene...");

            std::vector<double> residuals;
            const Pose3 poseStatic = scene.poses[scene.views[ImageTarget::STATIC]->id_pose];
            const Pose3 poseDynamic = scene.poses[scene.views[ImageTarget::DYNAMIC]->id_pose];

            for (const auto& landmarkIt : scene.GetLandmarks())
            {
                const Landmark& landmark = landmarkIt.second;
                const Observations& obs = landmark.obs;

                Observations::const_iterator iterObsI = obs.find(scene.views[ImageTarget::STATIC]->id_view);
                Observations::const_iterator iterObsJ = obs.find(scene.views[ImageTarget::DYNAMIC]->id_view);

                const Observation & ob0 = iterObsI->second;
                const Observation & ob1 = iterObsJ->second;

                const Vec2 residualI = cameraStatic.residual(poseStatic(landmark.X), ob0.x);
                const Vec2 residualJ = cameraDynamic.residual(poseDynamic(landmark.X), ob1.x);

                residuals.emplace_back(residualI.norm());
                residuals.emplace_back(residualJ.norm());

                // Here I am adding the feature points to my solution object... 
                solverSolution.inlierFeaturePoints.emplace_back(landmark.X[0], landmark.X[1], landmark.X[2]);
            }
    ....

are not what I think they are. I had assumed that these feature points were actual features in the geometry, whereas from some tests I have undertaken, they do not seem to be. Is there anything in any of the “sub-algos” that I can hook into to somehow get an estimation of a “physical” feature to use with my auto-scaling technique (described above)?

Again, thank you very much for your time, it is most appreciated.

*P.S. I have run tests and set the scale estimation algo to reverse-engineer a depth map from the corresponding PLY, so I am confident (with additional unit tests) that the inverse equirectangular projection I am doing is correct.*