ADC-CPANet: A Remote Sensing Image Classification Method Based on Local-Global Feature Fusion
Pages: 1-14 (2023)
Published Online: 11 April 2023
DOI: 10.11834/jrs.20232658
Wang Wei, Li Xijie, Wang Xin. XXXX. ADC-CPANet: A Remote Sensing Image Classification Method Based on Local-Global Feature Fusion. National Remote Sensing Bulletin, XX(XX): 1-14
Remote sensing images contain rich texture information and complex overall structure, so multi-scale feature extraction is crucial in scene classification tasks. On this basis, a local feature extraction module, the Aggregation Depthwise Convolution (ADC) block, and a global-local feature extraction module, the Convolution Parallel Attention (CPA) block, are designed. An asymmetric depthwise convolution group is proposed within the ADC block to enhance the model's robustness to image flipping and rotation, and a multi-group convolution head decomposition attention that enlarges the receptive field and strengthens feature extraction is proposed within the CPA block. Based on these two modules, a new remote sensing image scene classification model, ADC-CPANet, is constructed; stacking ADC and CPA blocks at each stage gives the model stronger global and local feature extraction capability. To verify the effectiveness of ADC-CPANet, the open-source RSSCN7 and SIRI-WHU datasets are used to compare the complexity and recognition ability of ADC-CPANet against other deep learning networks. Experimental results show that ADC-CPANet achieves classification accuracies of 96.43% and 96.04%, respectively, outperforming other state-of-the-art models.
The amount and variety of high-resolution remote sensing image data are growing rapidly with the development of remote sensing observation platforms such as satellites and unmanned aerial vehicles, and remote sensing information processing is entering the "era of remote sensing big data". High-resolution remote sensing images contain abundant texture detail and complex overall structure, and are of great significance for urban planning and other applications. At the same time, images of the same category can differ greatly, while images of different categories can appear similar. Therefore, multi-scale feature extraction plays a significant role in the task of remote sensing image scene classification. According to how features are represented, existing remote sensing image scene classification methods can be divided into two categories: methods based on hand-crafted features and methods based on deep learning. Hand-crafted approaches include the scale-invariant feature transform, the histogram of oriented gradients, and so on. Although these methods can achieve good results in some simple scene classification tasks, the features they extract may be incomplete or redundant, so their classification accuracy in complex scenes remains low. Methods based on deep learning have made remarkable progress in scene classification owing to their powerful feature extraction ability. Compared with traditional methods, convolutional neural networks used in visual tasks have more complex connections and more diverse convolution forms, and can extract local features more effectively. Nevertheless, CNNs perform poorly at capturing long-distance dependencies among features. The Transformer architecture has been successfully applied to computer vision in recent years; unlike traditional CNNs, its self-attention layers enable global feature extraction from images. Recent studies have shown that hybrid CNN-Transformer architectures can combine the advantages of both. This paper proposes an aggregation depthwise convolution (ADC) module and a convolution parallel attention (CPA) module. The ADC module effectively extracts local feature information and enhances the robustness of the model to image flipping and rotation. The CPA module extracts global and local features and fuses the two; a multi-group convolution head decomposition module designed within it enlarges the receptive field and strengthens the capacity for feature extraction. On the basis of these two modules, we designed the remote sensing image scene classification model ADC-CPANet, applying a strategy of stacking ADC and CPA modules at each stage so that the model possesses stronger global and local feature extraction capabilities. The RSSCN7 and Google Image (SIRI-WHU) datasets were selected to verify the effectiveness of ADC-CPANet. Experimental results demonstrate that ADC-CPANet achieves classification accuracies of 96.43% on the RSSCN7 dataset and 96.04% on the Google Image dataset, superior to other advanced models, showing that ADC-CPANet can extract both global and local features and obtain competitive scene classification accuracy.
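The asymmetric depthwise convolution group is described above only at a high level. As a rough illustration of the underlying idea, the sketch below implements a depthwise convolution with one square (3×3) branch and two asymmetric (1×3 and 3×1) branches whose outputs are summed, in the spirit of the asymmetric convolution blocks of ACNet, which this paper cites. All function names, kernel sizes, and the branch-sum structure are assumptions for illustration, not the paper's exact design.

```python
import numpy as np

def depthwise_conv2d(x, kernel):
    """Depthwise 2-D convolution, stride 1, zero-padded to 'same' output size.

    x: (C, H, W) feature map; kernel: (C, kh, kw), one filter per channel.
    """
    C, H, W = x.shape
    _, kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2                      # 'same' padding per axis
    xp = np.pad(x, ((0, 0), (ph, ph), (pw, pw)))
    out = np.zeros((C, H, W))
    for c in range(C):
        for i in range(H):
            for j in range(W):
                out[c, i, j] = np.sum(xp[c, i:i + kh, j:j + kw] * kernel[c])
    return out

def asymmetric_dwconv_group(x, k3x3, k1x3, k3x1):
    """Sum of one square and two asymmetric depthwise branches (ACNet-style)."""
    return (depthwise_conv2d(x, k3x3)
            + depthwise_conv2d(x, k1x3)
            + depthwise_conv2d(x, k3x1))
```

Because convolution is linear in the kernel, the two asymmetric branches can be folded into the middle row and middle column of the 3×3 kernel after training, so the three-branch group collapses to a single depthwise convolution at inference time, which is the usual motivation for this design.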
Keywords: remote sensing image; scene classification; convolutional neural network; Transformer; multi-group convolution head decomposition attention; ADC-CPANet model