Diffusion models are increasingly favored over traditional approaches such as generative adversarial networks (GANs) and auto-regressive transformers because of their remarkable generative capabilities. They demonstrate outstanding performance not only in image generation and manipulation but also in text-related tasks. Despite this, existing surveys tend to concentrate on diffusion models for image generation alone, overlooking their potential in image captioning. To address this critical void in the literature, our paper provides a comprehensive examination of image-to-text diffusion models within the landscape of artificial intelligence (AI) and generative computing. Starting with an overview of basic diffusion model principles, we examine the enhancements brought by conditioning and guidance. We then present a taxonomy and review of state-of-the-art methods in diffusion-based image captioning. Additionally, we survey applications beyond image-to-text generation, such as image-guided creative generation and text editing. Finally, we cover existing evaluation metrics, software and libraries, and challenges and future directions in the field.
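
To make these principles concrete, the following is a minimal sketch of the standard denoising diffusion probabilistic model (DDPM) formulation on which the surveyed methods build; the notation ($x_t$, $\beta_t$, $\bar{\alpha}_t$, $\epsilon_\theta$, guidance weight $w$) follows common convention rather than any single reviewed paper. The forward process gradually corrupts data $x_0$ with Gaussian noise,
$$q(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\ \sqrt{\bar{\alpha}_t}\, x_0,\ (1 - \bar{\alpha}_t) I\right), \qquad \bar{\alpha}_t = \prod_{s=1}^{t} (1 - \beta_s),$$
and a network $\epsilon_\theta$ learns to reverse it by predicting the injected noise,
$$\mathcal{L} = \mathbb{E}_{x_0,\, \epsilon,\, t}\left[\, \lVert \epsilon - \epsilon_\theta(x_t, t) \rVert^2 \,\right].$$
Conditioning on an input $c$ (e.g., an image for captioning) replaces $\epsilon_\theta(x_t, t)$ with $\epsilon_\theta(x_t, t, c)$, and classifier-free guidance sharpens the conditional prediction at sampling time via
$$\tilde{\epsilon} = (1 + w)\, \epsilon_\theta(x_t, t, c) - w\, \epsilon_\theta(x_t, t, \varnothing).$$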