Breast cancer is one of the leading cancer diagnoses among women worldwide. Early detection and accurate classification of breast cancer from medical images are crucial because they enable timely treatment, which can significantly improve patient outcomes. Ultrasound imaging is a widely used diagnostic method in radiology for evaluating breast health. Over the past ten years, deep learning approaches, especially Convolutional Neural Networks (CNNs), have been used to build systems for recognizing image patterns. More recently, the Vision Transformer (ViT) has gained attention as a deep learning architecture whose self-attention mechanism has delivered strong performance across a wide range of image-related applications. Computer-Aided Diagnosis (CAD) systems in the medical field have increasingly adopted deep learning methods because of their superior ability to extract essential features from medical images. This study proposes a hybrid deep learning approach that integrates CNNs with ViTs to improve the accuracy of breast cancer diagnosis in ultrasound images. By combining the strong local feature extraction of CNNs with the ViT's focus on long-range dependencies and global features, the hybrid network makes fuller use of the available information, enabling a more thorough and nuanced interpretation of medical imaging data. The method was evaluated on two publicly accessible datasets and outperformed current state-of-the-art techniques, which indicates that it has the potential to generalize across datasets. The high accuracy achieved by this hybrid deep learning model suggests that it can play a significant role in improving breast cancer diagnosis.