Vision–Language Models (VLMs) have emerged as a powerful paradigm that bridges computer vision and natural language processing, enabling machines to jointly understand images and text. This seminar explores three key application areas of VLMs: image retrieval, where shared multimodal embeddings enable efficient cross-modal search; fine-grained classification, where VLMs capture subtle distinctions between visually similar categories by leveraging both visual and textual cues; and parameter-efficient fine-tuning (PEFT), which offers practical strategies for adapting large-scale models to specific tasks without retraining all of their parameters. By examining recent advances and case studies, we will highlight the strengths, limitations, and future directions of VLMs in real-world applications.
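
To make the cross-modal retrieval setting concrete, the sketch below embeds a text query and a set of candidate images into a shared space and ranks the images by cosine similarity. It is only an illustrative example, assuming the Hugging Face transformers CLIP API and the openai/clip-vit-base-patch32 checkpoint; the placeholder images stand in for a real image collection and are not part of the seminar material.

```python
# Minimal sketch of CLIP-style cross-modal retrieval (illustrative assumptions:
# Hugging Face transformers, openai/clip-vit-base-patch32, dummy solid-color images).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Placeholder images; in practice these would be the image collection to search.
images = [Image.new("RGB", (224, 224), color=c) for c in ("red", "green", "blue")]
query = "a photo of a red object"

# Encode the query and the candidate images into the shared embedding space.
inputs = processor(text=[query], images=images, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Normalize embeddings and compute cosine similarity between the query and each image.
image_embeds = outputs.image_embeds / outputs.image_embeds.norm(dim=-1, keepdim=True)
text_embeds = outputs.text_embeds / outputs.text_embeds.norm(dim=-1, keepdim=True)
similarities = (text_embeds @ image_embeds.T).squeeze(0)

# Rank candidate images from most to least similar to the text query.
ranking = similarities.argsort(descending=True)
print("Images ranked by similarity to the query:", ranking.tolist())
```

In a realistic deployment the image embeddings would be precomputed and stored in an approximate nearest-neighbor index, so that only the text query needs to be embedded at search time.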