
Test3R is a novel and simple test-time learning technique that significantly improves 3D reconstruction quality. Unlike traditional pairwise methods such as DUSt3R, which often suffer from geometric inconsistencies and poor generalization, Test3R leverages image triplets and self-supervised optimization at inference to enforce cross-pair consistency. This makes it both robust and cost-efficient, requiring minimal overhead while delivering state-of-the-art performance.
Key Highlights:
- Cross-Pair Consistency: Optimizes at test time using triplets of images, ensuring geometric predictions align across pairs and reducing reconstruction errors.
- Prompt-Tuning Based Adaptation: Efficiently adapts models with visual prompts at test time, keeping the backbone frozen and avoiding heavy retraining.
- Superior Reconstruction: Outperforms DUSt3R and even strong baselines such as CUT3R and MAST3R on 3D reconstruction benchmarks like 7Scenes and NRGBD.
- Depth Estimation Gains: Achieves state-of-the-art results on multi-view depth benchmarks such as DTU and ETH3D, surpassing methods that rely on camera poses or domain-specific training.
- Universally Applicable: Easily integrates into existing models such as MAST3R and MonST3R, boosting performance with minimal test-time overhead.
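The cross-pair consistency idea above can be illustrated with a toy optimisation. This is a minimal, hypothetical sketch, not the actual Test3R code: the frozen backbone is stood in for by fixed random linear "decoders" (`W12`, `W13`, `b12`, `b13`), one per image pair, and only a small prompt vector is updated at test time so that the two pairwise predictions of the reference view's geometry agree.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a frozen pairwise reconstruction backbone.
# Each pair (reference view, other view) maps the shared prompt through
# its own fixed projection, modelling pair-dependent predictions.
D = 8                            # prompt / pointmap dimensionality (toy)
W12 = rng.normal(size=(D, D))    # "decoder" for pair (view1, view2) -- frozen
W13 = rng.normal(size=(D, D))    # "decoder" for pair (view1, view3) -- frozen
b12 = rng.normal(size=D)         # pair-specific inconsistency
b13 = rng.normal(size=D)

def predict(W, b, prompt):
    # Pointmap of the reference view as predicted within one pair.
    return W @ prompt + b

def consistency_loss(prompt):
    # Cross-pair objective: the reference view's geometry should
    # agree across the two pairs built from the triplet.
    diff = predict(W12, b12, prompt) - predict(W13, b13, prompt)
    return 0.5 * np.sum(diff ** 2)

def grad(prompt):
    # Closed-form gradient of the quadratic loss above.
    A = W12 - W13
    return A.T @ (A @ prompt + (b12 - b13))

# Test-time adaptation: optimise only the prompt; the "backbone" stays frozen.
prompt = np.zeros(D)
lr = 0.005
losses = [consistency_loss(prompt)]
for _ in range(200):
    prompt -= lr * grad(prompt)
    losses.append(consistency_loss(prompt))

print(f"consistency loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

In the real method the optimised parameters are learnable visual prompt tokens fed to a transformer backbone and the loss is computed over dense pointmaps, but the structure is the same: a self-supervised consistency objective, gradients flowing only into the prompt.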
Why It Matters:
Test3R introduces a lightweight yet powerful way to adapt models at inference, addressing the long-standing limitations of pairwise 3D reconstruction. By maximizing cross-view consistency, it enables accurate geometry recovery even in challenging, unseen environments, paving the way for scalable, reliable, and efficient 3D perception.
Explore More:
- Paper: arXiv:2506.13750
- Code: GitHub – nopQAQ/Test3R