
NormalCrafter introduces a novel approach for surface normal estimation in videos, leveraging diffusion priors to achieve high spatial fidelity and temporal consistency over arbitrary-length sequences.
Key Highlights:
- Video Diffusion Model Repurposing – Adapts Stable Video Diffusion (SVD) to predict normal maps instead of generating RGB frames, while retaining the model's temporal structure.
- Semantic Feature Regularization (SFR) – Aligns intermediate diffusion features with DINO semantic embeddings, enhancing fine-grained geometric detail without inference overhead.
- Two-Stage Training Protocol – First trains the full U-Net in latent space to model long-term temporal context, then applies spatial fine-tuning in pixel space to improve high-resolution normal accuracy.
- Fine-Tuned VAE Decoder – Improves normal map reconstruction quality by adapting the VAE decoder, reducing angular errors and boosting PSNR during training.
- Zero-Shot Generalization – Achieves strong results on image benchmarks (NYUv2, iBims-1) and video benchmarks (ScanNet, Sintel) without task-specific fine-tuning.
- Superior Quantitative Results – Outperforms baselines (DSINE, StableNormal, Marigold-E2E-FT) on Sintel videos, with up to 1.6° lower mean angular error and a 3.1% higher share of pixels with angular error under 30°.
- Temporal Stability – Produces smoother y-t slices (a fixed image column plotted over time) than prior methods, suppressing flickering artifacts under large motion and in dynamic scenes.
- Efficient Semantic Enhancement – SFR operates only during training, adding no inference latency or memory cost.
- Flexible Single-Image Compatibility – Handles single-image normal estimation by setting the sequence length to one frame, while maintaining competitive accuracy on static benchmarks.
- Extensive Validation – Evaluated on DAVIS, Sora-generated videos, NYUv2, iBims-1, ScanNet, and Sintel, indicating robustness across diverse environments.
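The quantitative highlight above is stated in two metrics: mean angular error and the share of pixels whose error falls under 30°. As a rough illustration, here is a minimal NumPy sketch of how these are commonly computed for a predicted and a ground-truth normal map. The function names, the `(..., 3)` array layout, and the optional validity mask are assumptions made for this example; this is not code from the NormalCrafter repository, and the paper's exact evaluation protocol may differ.

```python
import numpy as np


def angular_error_deg(pred, gt, eps=1e-8):
    """Per-pixel angle (in degrees) between predicted and ground-truth normals.

    pred, gt: float arrays of shape (..., 3), e.g. (H, W, 3) or (T, H, W, 3).
    Vectors are re-normalized here, so inputs need not be unit length.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    # Guard against division by zero; zero-length vectors stay zero.
    pred = pred / np.maximum(np.linalg.norm(pred, axis=-1, keepdims=True), eps)
    gt = gt / np.maximum(np.linalg.norm(gt, axis=-1, keepdims=True), eps)
    # Clip so rounding noise cannot push the cosine outside arccos' domain.
    cos = np.clip(np.sum(pred * gt, axis=-1), -1.0, 1.0)
    return np.degrees(np.arccos(cos))


def normal_metrics(pred, gt, mask=None, threshold_deg=30.0):
    """Mean angular error and fraction of pixels with error below a threshold.

    mask: optional boolean array matching the leading dims of pred/gt,
    selecting pixels that have valid ground truth (hypothetical convention).
    """
    err = angular_error_deg(pred, gt)
    if mask is not None:
        err = err[np.asarray(mask, dtype=bool)]
    return {
        "mean_deg": float(err.mean()),
        "acc_under_threshold": float((err < threshold_deg).mean()),
    }


if __name__ == "__main__":
    # Toy 2x2 "normal map": every ground-truth normal points along +z.
    gt = np.tile(np.array([0.0, 0.0, 1.0]), (2, 2, 1))
    pred = np.array(
        [
            [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]],
            [[0.0, 0.0, 2.0], [1.0, 0.0, 1.0]],
        ]
    )
    print(normal_metrics(pred, gt))
```

For a video, the same functions apply to a `(T, H, W, 3)` stack, which averages the error over all frames and pixels at once.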
Project
- Project Page: https://normalcrafter.github.io/
- Paper: https://arxiv.org/abs/2504.11427
- GitHub: https://github.com/Binyr/NormalCrafter
Related articles from LearnOpenCV
- DepthPro – Monocular Metric Depth Estimation: https://learnopencv.com/depth-pro-monocular-metric-depth/
- Sapiens: Foundation for Human Vision Models: https://learnopencv.com/sapiens-human-vision-models/
- Depth Estimation: https://learnopencv.com/author/kaustubh-sadekar/
- Research Papers: https://opencv.org/blog/category/research-papers/