Research Papers

DINOv3 is a next-generation vision foundation model trained purely with self-supervised learning. It introduces innovations that allow robust dense feature learning at scale with models reaching 7B parameters and achieves

Genie 3 is a general-purpose world model which, given just a text prompt, generates dynamic, interactive environments in real time and rendered at 720p, 24 fps, while maintaining consistency over

PartCrafter is the first unified 3D generative model that jointly synthesizes multiple semantically meaningful and geometrically distinct 3D parts from a single RGB image without any segmentation required. Powered by

SimLingo unifies autonomous driving, vision-language understanding, and action reasoning, all from camera input only. It introduces Action Dreaming to test how well models follow instructions, and outperforms all prior methods on

SAM4D introduces a 4D foundation model for promptable segmentation across camera and LiDAR streams, addressing the limitations of frame-centric and modality-isolated approaches in autonomous driving.

VideoGameBench is a rigorous benchmark that evaluates VLMs’ real-time decision-making, perception, memory, and planning by challenging them to complete 1990s-era video games with only raw visual inputs and minimal control

LeGO-LOAM introduces a cutting-edge lidar odometry and mapping framework designed to deliver real-time, accurate 6-DOF pose estimation for ground vehicles, optimized for challenging, variable terrain environments. It significantly reduces computational

Reliable-loc introduces a resilient LiDAR-based global localization system for wearable mapping devices in complex, GNSS-denied street environments with sparse features and incomplete prior maps.

This is the world’s first SLAM dataset recorded onboard real roller coasters, offering extreme motion dynamics, perceptual challenges, and unique conditions for benchmarking SLAM algorithms under aggressive real-world trajectories.

This paper introduces a SLAM framework that achieves real-time CPU-only performance in dense, registration-error-minimization-based odometry and mapping by leveraging exact point cloud downsampling via coreset extraction, eliminating the need for

MP-SfM redefines classical Structure-from-Motion by tightly integrating monocular depth and surface normal priors into incremental SfM, enabling robust 3D reconstruction from sparse, unstructured image collections. Paper: https://arxiv.org/abs/2504.20040

NormalCrafter introduces a novel approach for surface normal estimation in videos, leveraging diffusion priors to achieve high spatial fidelity and temporal consistency over arbitrary-length sequences.