Search results

1 – 3 of 3
Article
Publication date: 19 October 2023

Huaxiang Song

Classification of remote sensing images (RSI) is a challenging task in computer vision. Recently, researchers have proposed a variety of creative methods for automatic recognition…

Abstract

Purpose

Classification of remote sensing images (RSI) is a challenging task in computer vision. Recently, researchers have proposed a variety of creative methods for automatic recognition of RSI, and feature fusion is a research hotspot for its great potential to boost performance. However, RSI has a unique imaging condition and cluttered scenes with complicated backgrounds. This larger difference from nature images has made the previous feature fusion methods present insignificant performance improvements.

Design/methodology/approach

This work proposed a two-convolutional neural network (CNN) fusion method named main and branch CNN fusion network (MBC-Net) as an improved solution for classifying RSI. In detail, the MBC-Net employs an EfficientNet-B3 as its main CNN stream and an EfficientNet-B0 as a branch, named MC-B3 and BC-B0, respectively. In particular, MBC-Net includes a long-range derivation (LRD) module, which is specially designed to learn the dependence of different features. Meanwhile, MBC-Net also uses some unique ideas to tackle the problems coming from the two-CNN fusion and the inherent nature of RSI.

Findings

Extensive experiments on three RSI sets prove that MBC-Net outperforms the other 38 state-of-the-art (STOA) methods published from 2020 to 2023, with a noticeable increase in overall accuracy (OA) values. MBC-Net not only presents a 0.7% increased OA value on the most confusing NWPU set but also has 62% fewer parameters compared to the leading approach that ranks first in the literature.

Originality/value

MBC-Net is a more effective and efficient feature fusion approach compared to other STOA methods in the literature. Given the visualizations of grad class activation mapping (Grad-CAM), it reveals that MBC-Net can learn the long-range dependence of features that a single CNN cannot. Based on the tendency stochastic neighbor embedding (t-SNE) results, it demonstrates that the feature representation of MBC-Net is more effective than other methods. In addition, the ablation tests indicate that MBC-Net is effective and efficient for fusing features from two CNNs.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 17 no. 1
Type: Research Article
ISSN: 1756-378X

Keywords

Article
Publication date: 13 November 2024

Huaxiang Song, Hanjun Xia, Wenhui Wang, Yang Zhou, Wanbo Liu, Qun Liu and Jinling Liu

Vision transformers (ViT) detectors excel in processing natural images. However, when processing remote sensing images (RSIs), ViT methods generally exhibit inferior accuracy…

Abstract

Purpose

Vision transformers (ViT) detectors excel in processing natural images. However, when processing remote sensing images (RSIs), ViT methods generally exhibit inferior accuracy compared to approaches based on convolutional neural networks (CNNs). Recently, researchers have proposed various structural optimization strategies to enhance the performance of ViT detectors, but the progress has been insignificant. We contend that the frequent scarcity of RSI samples is the primary cause of this problem, and model modifications alone cannot solve it.

Design/methodology/approach

To address this, we introduce a faster RCNN-based approach, termed QAGA-Net, which significantly enhances the performance of ViT detectors in RSI recognition. Initially, we propose a novel quantitative augmentation learning (QAL) strategy to address the sparse data distribution in RSIs. This strategy is integrated as the QAL module, a plug-and-play component active exclusively during the model’s training phase. Subsequently, we enhanced the feature pyramid network (FPN) by introducing two efficient modules: a global attention (GA) module to model long-range feature dependencies and enhance multi-scale information fusion, and an efficient pooling (EP) module to optimize the model’s capability to understand both high and low frequency information. Importantly, QAGA-Net has a compact model size and achieves a balance between computational efficiency and accuracy.

Findings

We verified the performance of QAGA-Net by using two different efficient ViT models as the detector’s backbone. Extensive experiments on the NWPU-10 and DIOR20 datasets demonstrate that QAGA-Net achieves superior accuracy compared to 23 other ViT or CNN methods in the literature. Specifically, QAGA-Net shows an increase in mAP by 2.1% or 2.6% on the challenging DIOR20 dataset when compared to the top-ranked CNN or ViT detectors, respectively.

Originality/value

This paper highlights the impact of sparse data distribution on ViT detection performance. To address this, we introduce a fundamentally data-driven approach: the QAL module. Additionally, we introduced two efficient modules to enhance the performance of FPN. More importantly, our strategy has the potential to collaborate with other ViT detectors, as the proposed method does not require any structural modifications to the ViT backbone.

Details

International Journal of Intelligent Computing and Cybernetics, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 1756-378X

Keywords

Article
Publication date: 14 December 2023

Huaxiang Song, Chai Wei and Zhou Yong

The paper aims to tackle the classification of Remote Sensing Images (RSIs), which presents a significant challenge for computer algorithms due to the inherent characteristics of…

Abstract

Purpose

The paper aims to tackle the classification of Remote Sensing Images (RSIs), which presents a significant challenge for computer algorithms due to the inherent characteristics of clustered ground objects and noisy backgrounds. Recent research typically leverages larger volume models to achieve advanced performance. However, the operating environments of remote sensing commonly cannot provide unconstrained computational and storage resources. It requires lightweight algorithms with exceptional generalization capabilities.

Design/methodology/approach

This study introduces an efficient knowledge distillation (KD) method to build a lightweight yet precise convolutional neural network (CNN) classifier. This method also aims to substantially decrease the training time expenses commonly linked with traditional KD techniques. This approach entails extensive alterations to both the model training framework and the distillation process, each tailored to the unique characteristics of RSIs. In particular, this study establishes a robust ensemble teacher by independently training two CNN models using a customized, efficient training algorithm. Following this, this study modifies a KD loss function to mitigate the suppression of non-target category predictions, which are essential for capturing the inter- and intra-similarity of RSIs.

Findings

This study validated the student model, termed KD-enhanced network (KDE-Net), obtained through the KD process on three benchmark RSI data sets. The KDE-Net surpasses 42 other state-of-the-art methods in the literature published from 2020 to 2023. Compared to the top-ranked method’s performance on the challenging NWPU45 data set, KDE-Net demonstrated a noticeable 0.4% increase in overall accuracy with a significant 88% reduction in parameters. Meanwhile, this study’s reformed KD framework significantly enhances the knowledge transfer speed by at least three times.

Originality/value

This study illustrates that the logit-based KD technique can effectively develop lightweight CNN classifiers for RSI classification without substantial sacrifices in computation and storage costs. Compared to neural architecture search or other methods aiming to provide lightweight solutions, this study’s KDE-Net, based on the inherent characteristics of RSIs, is currently more efficient in constructing accurate yet lightweight classifiers for RSI classification.

Details

International Journal of Web Information Systems, vol. 20 no. 2
Type: Research Article
ISSN: 1744-0084

Keywords

1 – 3 of 3