Image captioning by diffusion models: A survey, By Fatemeh Daneshfar


Title	Image captioning by diffusion models: A survey
Type	JournalPaper
Keywords	Image captioning, Diffusion models, Image-to-text, Survey, Implemented artificial intelligence, Application of artificial intelligence
Year	2024
Journal	Engineering Applications of Artificial Intelligence
DOI
Researchers	Fatemeh Daneshfar, Ako Bartani, Pardis Lotfi

Abstract

Diffusion models are increasingly favored over traditional approaches such as generative adversarial networks (GANs) and autoregressive transformers because of their remarkable generative capabilities. They perform outstandingly not only in image generation and manipulation but also in text-related tasks. Despite this, existing surveys tend to focus on diffusion models for image generation alone, overlooking their potential for image captioning. To address this oversight, our paper provides an exhaustive examination of image-to-text diffusion models within the landscape of artificial intelligence (AI) and generative computing, filling a critical gap in the literature. Starting with an overview of basic diffusion model principles, we examine the enhancements brought by conditioning or guidance, as well as implemented AI. We then present a taxonomy and review of cutting-edge methods in diffusion-based image captioning. Additionally, we explore applications beyond image-to-text generation, such as image-guided creative generation, text editing, and the application of AI. We also cover existing evaluation metrics, software and libraries, and the challenges and future directions of the field.