Building extraction in pixel level from aerial imagery with a deep encoder-decoder network
2020, Vol. 24, No. 9, pp. 1134-1142
Print publication date: 2020-09-07
DOI: 10.11834/jrs.20209056
CHEN Kaiqiang, GAO Xin, YAN Menglong, ZHANG Yue and SUN Xian. 2020. Building extraction in pixel level from aerial imagery with a deep encoder-decoder network. Journal of Remote Sensing (Chinese), 24(9): 1134-1142 [DOI: 10.11834/jrs.20209056]
Building extraction plays an important role in land use analysis such as urban planning. Traditional building extraction methods are usually based on hand-crafted features and classifiers, which leads to low accuracy. Based on a Convolutional Neural Network (CNN) with an encoder-decoder structure, this paper autonomously learns multi-level, discriminative features to better distinguish buildings from background, achieving pixel-level building extraction from aerial imagery. The network consists of two parts: an encoder sub-network, which compresses the spatial resolution of the input image to extract features, and a decoder sub-network, which recovers the spatial resolution from the features to complete pixel-level building extraction. In addition, this paper uses the Field-of-View Enhancement (FoVE) method to alleviate the marginal phenomenon, namely that building extraction accuracy near patch edges is usually lower than that near the central area. Experiments on two standard building extraction datasets show that the encoder-decoder CNN effectively achieves pixel-level building extraction and that FoVE effectively improves building extraction accuracy. By varying the patch size and overlap at prediction time, we analyze their influence on the extraction results and reveal the saturation behavior of FoVE.
Building extraction plays a significant role in land use analysis such as urban planning. Classical methods based on hand-crafted features fail to produce satisfactory building extraction results due to the limited representation capacity of such features. In this paper, we achieve pixel-level building extraction based on a deep Convolutional Neural Network (CNN) with an encoder-decoder structure. In contrast to hand-crafted features, which require professional knowledge and have poor representation capacity, convolutional neural networks have a high representation capacity and can learn highly abstract and discriminative features from data. The encoder derives a spatially compressed representation of the raw input image. This compressed representation, also called a feature of the input image, is assumed to be abstract and discriminative. The decoder takes the feature as input and recovers the spatial resolution to the size of the input image. Thereby, the encoder-decoder network achieves pixel-wise building extraction in an end-to-end way, from the raw image to the building extraction result.
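The encoder-decoder shape flow described above can be sketched with plain NumPy. Here the "encoder" is simply repeated 2×2 max pooling that halves the spatial resolution, and the "decoder" is nearest-neighbour upsampling that restores it; a real network replaces both with learned (de)convolutions, so this is only an illustrative sketch of the resolution compression and recovery, not the paper's architecture.

```python
import numpy as np

def encode(x, levels=3):
    """Compress spatial resolution by 2x per level via max pooling (stand-in for the conv encoder)."""
    for _ in range(levels):
        h, w = x.shape
        x = x[:h - h % 2, :w - w % 2]  # trim odd edges so the image tiles into 2x2 blocks
        x = x.reshape(x.shape[0] // 2, 2, x.shape[1] // 2, 2).max(axis=(1, 3))
    return x

def decode(f, levels=3):
    """Recover spatial resolution by 2x per level via nearest-neighbour upsampling (stand-in for the decoder)."""
    for _ in range(levels):
        f = f.repeat(2, axis=0).repeat(2, axis=1)
    return f

image = np.random.rand(256, 256)   # a single-band input patch
feature = encode(image)            # spatially compressed representation
mask = decode(feature) > 0.5       # per-pixel building / background decision
print(feature.shape, mask.shape)   # (32, 32) (256, 256)
```

The point of the sketch is that the decoder output has exactly the spatial size of the input, so every input pixel receives a label, which is what makes the extraction pixel-wise.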
Applying the encoder-decoder network to building extraction causes a Marginal Phenomenon (MP): the prediction accuracy near the edges of a patch is usually lower than that near the central area, which reduces the overall building extraction accuracy. To alleviate this effect, we propose the Field-of-View Enhancement (FoVE) method. FoVE consists of two parts: enlarging the patch size and cropping patches with overlaps when making predictions. It therefore has two hyper-parameters: the patch size and the overlap size. Extensive experiments on two building extraction datasets analyze the impact of these two hyper-parameters through Precision-Recall Curves (PRC), and several conclusions are drawn from the analysis: (1) enlarging the input patch size at prediction time effectively improves building extraction performance, although the improvement saturates as the overlap size increases; (2) cropping patches with an overlap at prediction time improves building extraction performance, although the improvement saturates as the input patch size increases; (3) FoVE effectively improves building extraction accuracy, but the improvement has a limit; (4) the convolutional neural network itself plays the key role in building extraction, and further attention should be focused on network design.
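The overlap part of FoVE can be sketched as overlapped-tile inference: patches are cropped with a stride smaller than the patch size, each patch is predicted, and only the central region of every prediction, away from the unreliable margins, is written into the output map. The sketch below uses hypothetical names and a placeholder per-patch predictor in place of the CNN, so it illustrates the tiling scheme rather than the paper's exact implementation.

```python
import numpy as np

def predict_patch(patch):
    # Placeholder for the encoder-decoder CNN: per-pixel scores, same size as the input patch.
    return np.ones_like(patch, dtype=float)

def tile_starts(length, patch, stride):
    """Top-left offsets of overlapping tiles, with the last tile pinned to the border."""
    starts = list(range(0, length - patch + 1, stride))
    if starts[-1] != length - patch:
        starts.append(length - patch)
    return starts

def fove_inference(image, patch=128, overlap=32):
    """Tile `image` with overlapping patches, keeping only each prediction's central area."""
    h, w = image.shape
    out = np.full((h, w), np.nan)
    stride, margin = patch - overlap, overlap // 2
    for top in tile_starts(h, patch, stride):
        for left in tile_starts(w, patch, stride):
            pred = predict_patch(image[top:top + patch, left:left + patch])
            # Drop the unreliable `margin`-pixel band, except where the patch touches the border.
            t = 0 if top == 0 else margin
            l = 0 if left == 0 else margin
            b = patch if top + patch == h else patch - margin
            r = patch if left + patch == w else patch - margin
            out[top + t:top + b, left + l:left + r] = pred[t:b, l:r]
    return out

scores = fove_inference(np.random.rand(300, 300))
print(np.isnan(scores).any())  # False: every pixel got a central (reliable) prediction
```

Because the stride is smaller than the patch size, the central regions of neighbouring tiles cover the whole image, so discarding the margins loses no pixels; the cost is simply more patches to predict, which is why the benefit can saturate.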
In addition to the numerical analysis of the FoVE experimental results, we attempt to explain why FoVE works and why its benefit has a limit. We attribute both to the Field of View (FoV), which is why the method is called FoVE. The FoV plays an important role in building extraction, and a larger FoV is beneficial. First, the marginal phenomenon is caused by the lack of contextual information for the marginal pixels; FoVE improves the overall accuracy by discarding the unreliable predictions of those pixels. Second, enlarging the input patches enlarges the FoV of each pixel and thus improves accuracy. Third, the improvement from FoVE has a limit because, when the field of view is large enough, the gain from additional contextual information becomes negligible.
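The FoV of a pixel can be quantified as the receptive field of the network's output units: for a plain stack of layers, the receptive field grows by (kernel − 1) input pixels per layer, scaled by the product of the strides so far. A short sketch of this standard arithmetic follows; the layer configuration is a hypothetical example, not the paper's exact network.

```python
def receptive_field(layers):
    """Receptive field of one output unit for a stack of (kernel, stride) layers."""
    rf, jump = 1, 1  # jump = distance in input pixels between neighbouring units of this layer
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf

# e.g. four blocks of a 3x3 convolution followed by 2x2 pooling
layers = [(3, 1), (2, 2)] * 4
print(receptive_field(layers))  # 46
```

Once this receptive field already spans a building and a generous amount of its surroundings, further enlarging the input patch adds context the network barely uses, which is consistent with the saturation observed for FoVE.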
Keywords: remote sensing, building extraction, convolutional neural network, deep learning, aerial imagery