Document Type

Article

Publication Date

3-19-2026

Publication Title

Electronics

Abstract

Diffusion models have emerged as a powerful class of generative models, demonstrating impressive results across visual domains such as image and video synthesis. This survey provides a comprehensive taxonomy of generative models, with a particular focus on diffusion models and their applications in enhancing visual fidelity for text-to-image and text-to-video generation. We discuss the theoretical foundations of diffusion models, including their formulation through stochastic differential equations, and analyze the forward noising and reverse denoising processes that enable stable training and high-quality generation. The survey further categorizes diffusion architectures, including pixel-space and latent-space models, and examines their design choices, training strategies, and trade-offs across different resolution regimes. In addition, we review noise characteristics in real-world imaging domains and discuss their implications for diffusion-based models. Denoising strategies are analyzed by distinguishing between in-model denoising mechanisms and external denoising techniques used in preprocessing and post-processing pipelines. The survey also summarizes commonly used datasets and evaluation metrics for generative modeling, providing a practical perspective on benchmarking and model comparison. Finally, we discuss current challenges, including computational efficiency, scalability, and robustness to diverse noise distributions, and outline potential directions for future research. This survey aims to provide a structured reference for understanding diffusion models and their applications in visual generation tasks.
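For readers unfamiliar with the SDE formulation referenced in the abstract, the following is a minimal sketch of the forward (noising) and reverse (denoising) stochastic differential equations as they are commonly written in the score-based diffusion literature. The drift f, diffusion coefficient g, Wiener processes w and w̄, and the score term are standard notation from that literature, not symbols taken from this paper.

```latex
% Forward (noising) SDE: data are gradually perturbed by a drift f and a diffusion coefficient g,
% driven by a standard Wiener process w.
% Reverse (denoising) SDE: the process is run backward in time using the score of the
% time-marginal density p_t, driven by a reverse-time Wiener process \bar{w}.
\begin{align}
  \mathrm{d}\mathbf{x} &= \mathbf{f}(\mathbf{x}, t)\,\mathrm{d}t
      + g(t)\,\mathrm{d}\mathbf{w} \\
  \mathrm{d}\mathbf{x} &= \bigl[\,\mathbf{f}(\mathbf{x}, t)
      - g(t)^{2}\,\nabla_{\mathbf{x}}\log p_t(\mathbf{x})\,\bigr]\,\mathrm{d}t
      + g(t)\,\mathrm{d}\bar{\mathbf{w}}
\end{align}
```

In the usual setup, the unknown score is approximated at sampling time by a learned network trained with a denoising score-matching objective; this is stated here only as general background on the formulation named in the abstract, not as a result of the surveyed paper.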

DOI

10.3390/electronics15061293

Version

Publisher's PDF

Creative Commons License

Creative Commons Attribution 4.0 International License
This work is licensed under a Creative Commons Attribution 4.0 International License.

Volume

15

Issue

6
