Research Details

Title: SimPoolFormer: A two-stream vision transformer for hyperspectral image classification
Research type: Article published in a scientific journal
Keywords: Hyperspectral data, ViT, Vision transformer, MLP, Deep learning, HSI
Abstract: The ability of vision transformers (ViTs) to accurately model global dependencies has completely changed the field of vision research. However, because of their drawbacks, such as high computational costs, dependence on large labeled datasets, and limited capacity to capture essential local features, efforts are being made to create more effective alternatives. On the other hand, vision multilayer perceptron (MLP) architectures have shown excellent capability in image classification tasks, performing on par with or even better than the widely used state-of-the-art ViTs and convolutional neural networks (CNNs). Vision MLPs have linear computational complexity, require less training data, and can capture long-range dependencies through transformer-like mechanisms at much lower computational cost. Thus, in this paper, a novel deep learning architecture, SimPoolFormer, is developed to address the current shortcomings of vision transformers. SimPoolFormer is a two-stream attention-in-attention vision transformer architecture built on two computationally efficient networks. The developed architecture replaces the computationally intensive multi-headed self-attention in ViT with SimPool for efficiency, while ResMLP is adopted in a second stream to enhance hyperspectral image (HSI) classification, leveraging its linear attention-based design. Results illustrate that SimPoolFormer is significantly superior to several other deep learning models, including 1D-CNN, 2D-CNN, RNN, VGG-16, EfficientNet, ResNet-50, and ViT, on three complex HSI datasets: QUH-Tangdaowan, QUH-Qingyun, and QUH-Pingan. For example, in terms of average accuracy, SimPoolFormer improved the HSI classification accuracy over 2D-CNN, VGG-16, EfficientNet, ViT, ResNet-50, RNN, and 1D-CNN by 0.98%, 3.81%, 4.16%, 7.94%, 9.45%, 12.25%, and 13.95%, respectively, on the QUH-Qingyun dataset.
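To make the efficiency argument concrete, the core idea behind SimPool-style pooling can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: it assumes the standard SimPool formulation in which a single global-average-pooled query cross-attends over all patch tokens, and it omits the learned projection matrices and the second (ResMLP) stream entirely. Because there is one query rather than one per token, the attention cost is linear in the number of patches instead of quadratic.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def simpool_sketch(tokens):
    """Simplified SimPool-style attention pooling (illustrative only).

    tokens: (n_patches, d) array of patch embeddings.
    Returns a single (d,) pooled vector: the global-average-pooled
    query attends over all patch tokens. Learned W_q/W_k/W_v
    projections from the real method are omitted here.
    """
    n, d = tokens.shape
    q = tokens.mean(axis=0, keepdims=True)        # (1, d) GAP query
    attn = softmax(q @ tokens.T / np.sqrt(d))     # (1, n) weights, O(n) queries
    return (attn @ tokens)[0]                     # (d,) pooled representation

rng = np.random.default_rng(0)
patches = rng.normal(size=(16, 8))  # e.g. 16 patch tokens of dimension 8
pooled = simpool_sketch(patches)
print(pooled.shape)  # (8,)
```

Replacing multi-headed self-attention (n queries attending to n keys, O(n²)) with this single-query pooling (1 query attending to n keys, O(n)) is what gives the first stream its computational advantage.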
Researchers: Swalpa Kumar Roy (first author), Ali Jamali (second author), Jocelyn Chanussot (third author), Pedram Ghamisi (fourth author), Ebrahim Ghaderpour (fifth author), Himan Shahabi (sixth author onward)