BlenderFusion is a novel framework that merges 3D graphics editing with diffusion models to enable precise, 3D-aware visual compositing. Unlike prior approaches that struggle with multi-object and camera disentanglement, BlenderFusion keeps these controls separate by carrying out object and camera edits in a 3D graphics editor (Blender) and then compositing the final image with a diffusion model.
Ever wondered how those slick background removal tools actually work? You upload a photo, click a button, and boom: the subject pops while the clutter disappears. But behind that magic sits a carefully engineered computer-vision pipeline.
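If you want to poke at the idea before reading on, here is a minimal sketch of subject extraction with the open-source rembg package; the library choice and file names are illustrative assumptions, not necessarily what the tools discussed in the post use.

```python
# Minimal background-removal sketch using the open-source `rembg` package.
# rembg wraps a pretrained salient-object segmentation model (U^2-Net by default):
# it predicts a per-pixel alpha matte for the subject and composites it onto a
# transparent background. File names here are placeholders.
from rembg import remove
from PIL import Image

with Image.open("portrait.jpg") as input_image:
    # remove() returns an RGBA image: subject kept, background made transparent.
    cutout = remove(input_image)
    cutout.save("portrait_cutout.png")
```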
The Google DeepMind team has unveiled the latest evolution in its family of open models, Gemma 3, and it's a monumental leap forward. While the AI space is crowded, Gemma 3 stands out with multimodal input, a 128K-token context window, and sizes ranging from 1B to 27B parameters.
LongSplat is a new framework that achieves high-quality novel view synthesis from casually captured long videos, without requiring camera poses. It overcomes challenges such as irregular camera motion, pose drift, and the memory demands of long sequences.
DINOv3 is a next-generation vision foundation model trained purely with self-supervised learning. It introduces innovations that enable robust dense feature learning at scale, with models reaching 7B parameters, and reports state-of-the-art results across dense prediction tasks using a frozen backbone.
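To get a feel for how such a backbone is consumed downstream, here is a hedged sketch of pulling dense patch features from a self-supervised ViT with Hugging Face transformers; the checkpoint name is an assumption for illustration and may not match the released DINOv3 weights.

```python
# Hedged sketch: extracting dense patch features from a self-supervised ViT
# backbone with Hugging Face transformers. The checkpoint identifier below is
# an assumption for illustration; substitute whichever weights you actually use.
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

checkpoint = "facebook/dinov3-vits16-pretrain-lvd1689m"  # assumed model ID

processor = AutoImageProcessor.from_pretrained(checkpoint)
model = AutoModel.from_pretrained(checkpoint).eval()

image = Image.open("example.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# One embedding per image patch (plus special tokens); these dense features are
# what segmentation, depth, and matching heads consume without fine-tuning.
patch_features = outputs.last_hidden_state
print(patch_features.shape)  # (1, num_tokens, hidden_dim)
```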
Genie 3 is a general-purpose world model which, given just a text prompt, generates dynamic, interactive environments in real time, rendered at 720p and 24 fps, while maintaining consistency over several minutes of interaction.
In the complex world of modern medicine, two forms of data reign supreme: the visual and the textual. On one side, a deluge of images such as X-rays, MRIs, and pathology slides.
In the fast-paced world of artificial intelligence, a new model is making waves for its innovative approach and impressive performance: MOLMO (Multimodal Open Language Model), developed by the Allen Institute for AI (Ai2).
PartCrafter is the first unified 3D generative model that jointly synthesizes multiple semantically meaningful and geometrically distinct 3D parts from a single RGB image, without any segmentation required. Powered by a diffusion transformer with a compositional latent space, it denoises all parts simultaneously in a single generation pass.
Hello! Let me show you an image. Can you describe what you see? Perfect! You nailed it: a bird sitting peacefully on a railing. Now, let's flip it: I'll describe a scene, and you generate the picture.
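The first half of that exchange (image in, description out) is easy to try yourself. Here is a small sketch using the Hugging Face image-to-text pipeline with an off-the-shelf BLIP captioner, which stands in for whatever model the post actually discusses; the file name is a placeholder.

```python
# Small image-captioning sketch using the Hugging Face `image-to-text` pipeline.
# The BLIP checkpoint and input file name are illustrative stand-ins.
from transformers import pipeline

captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")

# Pass a local path or URL; the pipeline handles loading and preprocessing.
result = captioner("bird_on_railing.jpg")
print(result[0]["generated_text"])  # e.g. "a bird sitting on a railing"
```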
SimLingo unifies autonomous driving, vision-language understanding, and action reasoning, all from camera input only. It introduces Action Dreaming to test how well models follow instructions, and outperforms all prior methods on public CARLA driving benchmarks.
SAM4D introduces a 4D foundation model for promptable segmentation across camera and LiDAR streams, addressing the limitations of frame-centric and modality-isolated approaches in autonomous driving.