
MP-SfM extends classical Structure-from-Motion (SfM) by tightly integrating monocular depth and surface normal priors into the incremental pipeline, enabling robust 3D reconstruction from sparse, unstructured image collections.
Key Highlights:
- Monocular Depth + Surface Normal Fusion – Augments traditional SfM with priors from off-the-shelf deep networks (e.g., Metric3D-v2, DSINE), eliminating the need for three-view overlap.
- Two-View Track Reconstruction – Estimates 3D points from tracks observed in as few as two views by lifting single-view features to 3D with monocular depth, overcoming the three-view requirement of conventional pipelines like COLMAP.
- Depth-Constrained Bundle Adjustment – Jointly optimizes poses, 3D points, and depth maps with robust loss functions and normal-based integration, improving accuracy even under noisy priors.
- Principled Uncertainty Propagation – Incorporates predictive uncertainty from monocular models, allowing priors to be weighted dynamically during optimization, so the pipeline benefits directly as monocular models improve.
- Depth Consistency Check – Rejects symmetry-induced misregistrations via dense geometric verification, reasoning over occlusion and free space across views.
- High Robustness in Adverse Scenarios – Outperforms COLMAP, GLOMAP, StudioSfM, and MASt3R-SfM on ETH3D, SMERF, RealEstate10k, and Tanks & Temples, particularly in low-parallax and low-overlap conditions.
- Dense and Sparse Feature Agnostic – Works with both sparse SuperPoint+LightGlue and dense RoMa/MASt3R tracks, demonstrating adaptability across matching paradigms.
- Efficient Alternating Optimization – Avoids the Schur complement by splitting joint refinement into image-wise and structure-wise blocks, balancing accuracy and scalability.
- Ablated and Benchmarked – Extensively validated through 50+ controlled experiments across triplet, minimal-overlap, and full-scene reconstructions; ablation studies confirm each module’s contribution.
- Open-Source & Modular – Code is available at github.com/cvg/mpsfm and is designed to plug into existing SfM workflows with minimal tuning.
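To make two of the highlights above more concrete (lifting a single-view feature to 3D, and weighting a depth prior by its predicted uncertainty), here is a minimal NumPy sketch. It is not MP-SfM's code: the function names are hypothetical, the camera is assumed to be an ideal pinhole with depth measured along the z-axis, and the Cauchy loss is just one common robust loss, not necessarily the one used in the paper.

```python
import numpy as np


def lift_to_3d(uv, depth, K):
    """Back-project pixel (u, v) into camera coordinates.

    Assumes a pinhole camera with intrinsics K and `depth` measured
    along the camera z-axis (an assumption made for this sketch).
    """
    u, v = uv
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])  # ray with z = 1
    return depth * ray


def weighted_depth_residual(point_cam, prior_depth, prior_sigma):
    """Depth residual normalized by the prior's standard deviation.

    A confident prior (small sigma) yields a large residual for the same
    depth error, so it pulls harder on the optimization.
    """
    return (point_cam[2] - prior_depth) / prior_sigma


def cauchy_loss(residual, scale=1.0):
    """Cauchy robust loss: grows logarithmically for large residuals."""
    return 0.5 * scale**2 * np.log1p((residual / scale) ** 2)


# Hypothetical intrinsics: focal length 500 px, principal point (320, 240).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])

# Lift a keypoint using a monocular depth of 2.0 at that pixel.
point = lift_to_3d((420.0, 240.0), 2.0, K)  # approximately [0.4, 0.0, 2.0]

# Compare the point's depth against a second prior of 2.1 with sigma 0.05.
r = weighted_depth_residual(point, prior_depth=2.1, prior_sigma=0.05)
cost = cauchy_loss(r)
print(point, r, cost)
```

In a full pipeline, residuals of this kind would sit alongside reprojection errors inside bundle adjustment. The sketch only shows how a per-pixel uncertainty turns into a weight and how a robust loss limits the influence of bad priors.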
Resources:
Paper: https://arxiv.org/abs/2504.20040
GitHub: https://github.com/cvg/mpsfm
Related Articles from LearnOpenCV:
- VGGT: Visual Geometry Grounded Transformer: https://learnopencv.com/vggt-visual-geometry-grounded-transformer-3d-reconstruction/
- Understanding Iterative Closest Point (ICP): https://learnopencv.com/iterative-closest-point-icp-explained/
- MASt3R and MASt3R-SfM Explanation: https://learnopencv.com/mast3r-sfm-grounding-image-matching-3d/