融合CNN与Transformer的高分辨率遥感影像建筑物双流提取模型
Integration of CNN and Transformer for High-Resolution Remote Sensing Image Building Extraction: A Dual-Stream Network
2024年,页码: 1-12
网络出版日期: 2024-03-07
DOI: 10.11834/jrs.20243307
刘宇鑫,孟瑜,邓毓弸,陈静波,刘帝佑.XXXX.融合CNN与Transformer的高分辨率遥感影像建筑物双流提取模型.遥感学报,XX(XX): 1-12
LIU Yuxin,MENG Yu,DENG Yupeng,CHEN Jingbo,LIU Diyou. XXXX. Integration of CNN and Transformer for High-Resolution Remote Sensing Image Building Extraction: A Dual-Stream Network. National Remote Sensing Bulletin, XX(XX):1-12
卷积神经网络(Convolutional Neural Network,CNN)和Transformer已被广泛应用于高分辨率遥感影像的建筑物提取任务。然而,CNN在建模长距离空间依赖时仍存在挑战,导致提取的建筑物存在内部空洞问题;而Transformer在捕捉空间局部细节特征上存在不足,容易导致建筑物边缘模糊及小型建筑物的漏检。为解决上述问题,本文提出了一种新型的双流网络模型用于高分辨率遥感影像的建筑物提取,名为ILGS-Net(Network for the Integration of Local and Global Features Stream)。该模型将CNN与Transformer相结合,采用多层级的局部-全局特征融合模块,实现了对建筑物的局部细节特征与全局上下文特征的高效融合。同时,在目标函数中引入边缘损失函数约束模型训练,提高了建筑物边界的定位精度。在三个高分辨率建筑物数据集上的实验结果显示,所提出方法的交并比均高于本文所对比的最佳方法,平均提高了1%。
Convolutional Neural Networks (CNNs) and Transformers have been widely applied to building extraction from high-resolution remote sensing images. However, CNNs struggle to model long-range spatial dependencies, which often produces internal holes in the extracted buildings, while Transformers are limited in capturing local spatial detail, which tends to blur building edges and cause small buildings to be missed. To address these problems, this paper proposes a dual-stream network for building extraction from high-resolution remote sensing images, named ILGS-Net (Network for the Integration of Local and Global Features Stream). The model combines CNNs and Transformers and uses multi-level local-global feature fusion modules to efficiently merge the fine-grained local details and the global contextual features of buildings. In addition, an edge loss is incorporated into the objective function to constrain model training and improve the localization accuracy of building boundaries. Experiments on three high-resolution building datasets show that the proposed method consistently achieves a higher Intersection over Union (IoU) than the best compared method, with an average improvement of 1%. Looking ahead, similar integrative designs merit exploration in other remote sensing tasks, and extending the architecture to urban landscapes of varying scale and complexity is a natural next step, with the goal of translating these gains into practical value for urban development and disaster response.
建筑物提取;深度学习;双流网络;边缘损失;局部和全局特征融合
building extraction; deep learning; dual-stream network; edge loss; local-global feature fusion