
LongSplat is a new framework that achieves high-quality novel view synthesis from casually captured long videos, without requiring camera poses. It overcomes challenges like irregular motion, pose drift, and memory limits to deliver state-of-the-art 3D reconstructions.
Key Highlights:
- Joint Optimization: Simultaneously refines camera poses and 3D Gaussians, ensuring globally consistent reconstructions (see the first sketch after this list).
- Robust Pose Estimation: Leverages learned 3D priors for accurate camera tracking even under complex trajectories.
- Octree Anchor Formation: A density-driven adaptive strategy that reduces memory usage while preserving fine scene details (see the second sketch after this list).
- Superior Performance: Outperforms COLMAP, LocalRF, CF-3DGS, and other baselines on the Free, Hike, and Tanks & Temples datasets, avoiding the pose drift and out-of-memory (OOM) failures seen in prior methods.
- Efficiency at Scale: Real-time rendering speed (281 FPS) with a compact 101 MB model on an RTX 4090.
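
For intuition, here is a minimal, self-contained PyTorch sketch of the joint pose-and-Gaussian optimization idea: the camera pose and the scene parameters share one loss and are refined together. It is not LongSplat's implementation; the toy `project` function stands in for full Gaussian rasterization, only Gaussian centers are optimized, and every name and number is illustrative.

```python
# Illustrative sketch only: joint refinement of a camera pose and Gaussian
# centers with a single reprojection-style loss. A real 3DGS pipeline would
# rasterize full anisotropic Gaussians and use photometric losses instead.
import torch

torch.manual_seed(0)

def rodrigues(v):
    """Axis-angle vector -> 3x3 rotation matrix (differentiable)."""
    theta = v.norm() + 1e-8
    k = v / theta
    z = torch.zeros((), dtype=v.dtype)
    K = torch.stack([torch.stack([z, -k[2], k[1]]),
                     torch.stack([k[2], z, -k[0]]),
                     torch.stack([-k[1], k[0], z])])
    return torch.eye(3, dtype=v.dtype) + torch.sin(theta) * K \
        + (1 - torch.cos(theta)) * (K @ K)

def project(points, rot_vec, trans, f=500.0):
    """Toy stand-in for rendering: pinhole projection of Gaussian centers."""
    cam = points @ rodrigues(rot_vec).T + trans
    return f * cam[:, :2] / cam[:, 2:3]

# Synthetic "observations": centers seen from an unknown camera pose.
true_means = torch.randn(64, 3)
true_rot = torch.tensor([0.05, -0.02, 0.10])
true_trans = torch.tensor([0.2, -0.1, 4.0])
target_px = project(true_means, true_rot, true_trans)

# Learnable scene and pose, both refined jointly from noisy initializations.
means = (true_means + 0.05 * torch.randn_like(true_means)).requires_grad_()
rot_vec = torch.tensor([0.0, 0.0, 0.01], requires_grad=True)
trans = torch.tensor([0.0, 0.0, 3.5], requires_grad=True)

optimizer = torch.optim.Adam([
    {"params": [means], "lr": 1e-2},           # Gaussian parameters
    {"params": [rot_vec, trans], "lr": 1e-3},  # camera pose (smaller steps)
])

for step in range(500):
    optimizer.zero_grad()
    loss = (project(means, rot_vec, trans) - target_px).abs().mean()
    loss.backward()
    optimizer.step()

print(f"final reprojection error: {loss.item():.4f} px")
```

Giving the pose a smaller learning rate than the scene is one common way to keep the two sets of updates stable; in LongSplat itself, camera tracking is additionally anchored by learned 3D priors, as noted above.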
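And a rough illustration of the density-adaptive octree idea behind anchor formation: subdivide a cell only where the point cloud is dense, and emit one anchor per leaf, so detailed regions get fine anchors while sparse regions stay coarse and cheap. The `octree_anchors` helper, its thresholds, and the toy point cloud are assumptions for illustration, not LongSplat's actual scheme.

```python
# Illustrative sketch only: density-driven octree subdivision that yields
# coarse anchors in sparse space and fine anchors in dense regions.
import numpy as np

def octree_anchors(points, center, half, max_points=32, max_depth=6, depth=0):
    """Return anchor centers for the axis-aligned cube (center, half-extent)."""
    inside = np.all(np.abs(points - center) <= half, axis=1)
    pts = points[inside]
    if len(pts) == 0:
        return []                                  # empty cell: no anchor
    if len(pts) <= max_points or depth == max_depth:
        return [pts.mean(axis=0)]                  # one anchor for this leaf
    anchors = []                                   # dense cell: split into 8 children
    for dx in (-0.5, 0.5):
        for dy in (-0.5, 0.5):
            for dz in (-0.5, 0.5):
                child_center = center + half * np.array([dx, dy, dz])
                anchors += octree_anchors(pts, child_center, half / 2,
                                          max_points, max_depth, depth + 1)
    return anchors

points = np.random.randn(5000, 3)                  # toy point cloud
anchors = octree_anchors(points, center=np.zeros(3), half=np.full(3, 4.0))
print(f"{len(anchors)} anchors for {len(points)} points")
```

The intent of such a scheme is to let memory scale with scene detail rather than grow uniformly with capture length, which is what keeps long casual videos tractable.
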
Why It Matters:
Casually recorded videos from phones and action cameras are everywhere, but extracting reliable 3D scenes is extremely difficult. LongSplat shows that 3D Gaussian Splatting can be made robust and memory-efficient for long, unconstrained videos, paving the way for VR/AR, digital tourism, video editing, and navigation applications.
Explore More:
- Paper: https://arxiv.org/abs/2508.14041
- Project Page: https://linjohnss.github.io/longsplat/
- GitHub Repository: https://github.com/NVlabs/LongSplat
- LearnOpenCV Blogs:
  - MASt3R-SLAM: https://learnopencv.com/mast3r-slam-realtime-dense-slam-explained/
  - 3D Gaussian Splatting: https://learnopencv.com/3d-gaussian-splatting/