2024 : 11 : 21
Fateme Daneshfar

Fateme Daneshfar

Academic rank: Assistant Professor
ORCID:
Education: PhD.
ScopusId: 35078447100
HIndex:
Faculty: Faculty of Engineering
Address: Department of Computer Engineering, Faculty of Engineering, University of Kurdistan
Phone:

Research

Title
Image captioning by diffusion models: A survey
Type
JournalPaper
Keywords
Image captioning, Diffusion models, Image-to-text, Survey, Implemented artificial intelligence, Application of artificial intelligence,
Year
2024
Journal Engineering Applications of Artificial Intelligence
DOI
Researchers Fateme Daneshfar ، Ako Bartani ، Pardis Lotfi

Abstract

Diffusion models are increasingly favored over traditional approaches like generative adversarial networks (GANs) and auto-regressive transformers due to their remarkable generative capabilities. They demonstrate outstanding performance not solely limited to image generation and manipulation but also in text-related tasks. Despite this, existing surveys tend to concentrate on the utilization of diffusion models solely for image generation, ignoring their potential in image captioning. To address this oversight, our paper provides an exhaustive examination of image-to-text diffusion models within the landscape of artificial intelligence (AI) and generative computing, filling a critical void in the literature. Starting with an overview of basic diffusion model principles, we explore into the enhancements brought by conditioning or guidance and the implemented AI. We then present a taxonomy and review of cutting-edge methods in diffusion-based image captioning. Additionally, we explore applications beyond image-to-text generation, such as image-guided creative generation, text editing, and the application of AI. We also cover existing evaluation metrics, software and libraries, as well as challenges and future directions in the field.