Instruction tuning has greatly improved how large language models (LLMs) respond to human-like instructions. However, fully fine-tuning these models remains computationally demanding, and many existing parameter-efficient methods fall short, particularly in uncertainty estimation and in working effectively across different modalities. To address this, we introduce UMP-Net (Uncertainty-Aware Mixture of Prompts Network), a new approach designed to enhance the instruction-following ability of LLaMA. UMP-Net combines a novel mixture of prompts (MoPs) technique with Latent Noise Prompting, KNN-based Heterogeneous Clustering, and Conformal Prediction to dynamically select the most reliable prompts while accounting for uncertainty. In addition, it features a CLIP-based multi-modal architecture to streamline vision-language integration. We evaluated UMP-Net on a range of benchmarks including ScienceQA, COCO Caption, and various zero-shot multi-modal tasks. The results show strong performance: an average accuracy of 88.41% on ScienceQA and a CIDEr score of 158.3 on COCO Caption, surpassing models such as LLaVA, LLaMA-Adapter, and LLaMA-Excitor. These findings suggest that UMP-Net offers both improved multi-modal capability and computational efficiency. Further ablations demonstrate that UMP-Net's conformal prediction module provides robust uncertainty estimates under noise and domain shifts, outperforming Bayesian alternatives in coverage guarantees with minimal overhead. The code for our proposed model is available at: https://github.com/abdulhadyabas2/UMP-NetUncertainty.
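The abstract names conformal prediction as the mechanism behind uncertainty-aware prompt selection but gives no implementation detail, so the sketch below is only an illustrative example of how split-conformal calibration could drive selection among candidate prompts. The function names (`conformal_quantile`, `select_prompt`), the nonconformity scores, and the per-prompt calibration sets are assumptions for illustration and are not taken from UMP-Net.

```python
import numpy as np


def conformal_quantile(cal_scores, alpha=0.1):
    """Split-conformal threshold: the ceil((n+1)(1-alpha))-th smallest
    calibration nonconformity score (finite-sample coverage guarantee)."""
    scores = np.sort(np.asarray(cal_scores))
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha))) - 1
    k = min(max(k, 0), n - 1)
    return scores[k]


def select_prompt(prompt_scores, cal_scores_per_prompt, alpha=0.1):
    """Pick the candidate prompt whose test-time nonconformity score lies
    deepest inside its calibrated 1-alpha conformal region.

    prompt_scores: nonconformity score of each candidate prompt on the
        current input (lower = more conforming).
    cal_scores_per_prompt: one array of calibration scores per prompt.
    Returns the index of the selected prompt, or None if every score
    exceeds its conformal threshold (an "abstain" signal).
    """
    margins = []
    for score, cal in zip(prompt_scores, cal_scores_per_prompt):
        q_hat = conformal_quantile(cal, alpha)
        margins.append(q_hat - score)  # positive => inside the region
    margins = np.asarray(margins)
    if (margins <= 0).all():
        return None
    return int(np.argmax(margins))


# Toy usage with three hypothetical candidate prompts and synthetic
# calibration scores; in practice the scores would come from the model.
rng = np.random.default_rng(0)
calibration = [rng.normal(loc=m, scale=0.2, size=200) for m in (0.4, 0.6, 0.5)]
test_scores = [0.55, 0.45, 0.70]
print(select_prompt(test_scores, calibration, alpha=0.1))
```

The design intuition, under these assumptions, is that each prompt carries its own calibrated threshold, so selection compares prompts on a common, coverage-controlled scale rather than on raw confidence scores.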