BlenderFusion is a novel framework that merges 3D graphics editing with diffusion models to enable precise, 3D-aware visual compositing. Unlike prior approaches that struggle with multi-object and camera disentanglement, BlenderFusion leverages Blender for fine-grained control and a diffusion-based compositor for realism, bringing unprecedented flexibility to scene editing and generative compositing.
Key Highlights:
- 3D-Grounded Control: Segments and lifts objects into editable 3D entities, enabling precise manipulation of objects, camera, and background.
- Generative Compositor: Dual-stream diffusion model refines Blender renders into photorealistic outputs, correcting artifacts and enhancing realism.
- Training Strategies: Introduces source masking and simulated object jittering to improve disentangled object-camera control (see the illustrative sketch after this list).
- Superior Editing: Outperforms baselines like 3DIT and Neural Assets across multi-object editing, novel object insertion, and complex compositing tasks.
- Generalization: Demonstrates strong results on datasets (MOVi-E, Objectron, Waymo) and unseen real-world scenes, handling diverse edits such as attribute changes, deformations, and background replacement.
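To make the two training strategies above more concrete, here is a minimal, hypothetical sketch of what source masking and simulated object jittering could look like as data augmentations. The function names, tensor shapes, and jitter ranges are assumptions for illustration only, not BlenderFusion's actual implementation.

```python
# Hypothetical sketch of the two augmentations: masking objects in the source
# view and jittering object poses independently of the camera. All names and
# parameters here are illustrative assumptions, not the paper's code.
import numpy as np

def mask_source_objects(source_img: np.ndarray,
                        obj_masks: np.ndarray,
                        drop_prob: float = 0.5,
                        rng: np.random.Generator | None = None) -> np.ndarray:
    """Randomly blank out object regions in the source view so the compositor
    cannot simply copy source pixels and must rely on the edited 3D render."""
    rng = rng or np.random.default_rng()
    out = source_img.copy()
    for mask in obj_masks:                    # mask: (H, W) boolean
        if rng.random() < drop_prob:
            out[mask] = 0.0                   # erase this object's pixels
    return out

def jitter_object_pose(pose: np.ndarray,
                       trans_std: float = 0.05,
                       rot_std_deg: float = 5.0,
                       rng: np.random.Generator | None = None) -> np.ndarray:
    """Perturb a 4x4 object-to-world transform with a small random translation
    and yaw, simulating object edits that are independent of camera motion."""
    rng = rng or np.random.default_rng()
    jittered = pose.copy()
    jittered[:3, 3] += rng.normal(0.0, trans_std, size=3)    # translate
    theta = np.deg2rad(rng.normal(0.0, rot_std_deg))         # small yaw
    c, s = np.cos(theta), np.sin(theta)
    yaw = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    jittered[:3, :3] = yaw @ jittered[:3, :3]
    return jittered

# Usage: build an augmented training sample from one scene.
rng = np.random.default_rng(0)
source = np.random.rand(256, 256, 3).astype(np.float32)      # source view
masks = np.zeros((2, 256, 256), dtype=bool)
masks[0, 64:128, 64:128] = True                               # one object region
poses = np.repeat(np.eye(4)[None], 2, axis=0)                 # per-object poses

masked_source = mask_source_objects(source, masks, rng=rng)
jittered_poses = np.stack([jitter_object_pose(p, rng=rng) for p in poses])
```

The intuition is that hiding an object's source pixels forces the model to re-synthesize it from the edited 3D render, while pose jitter decorrelates object motion from camera motion, which is what the paper reports as improving disentangled control.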
Why It Matters:
BlenderFusion bridges the gap between graphics-based precision and generative synthesis, giving creators, artists, and researchers the ability to craft complex, high-fidelity visual narratives. It represents a leap toward controllable, fine-grained visual generation in both synthetic and real-world settings.
Explore More:
Paper: BlenderFusion (arXiv)
Project Page: blenderfusion.github.io
Related LearnOpenCV Blogs:
- Stable Diffusion: https://learnopencv.com/stable-diffusion-3/
- MatAnyone: https://learnopencv.com/matanyone-for-better-video-matting/