Building extraction in pixel level from aerial imagery with a deep encoder-decoder network
2020, Vol. 24, No. 9, pp. 1134-1142
Print publication date: 2020-09-07
DOI: 10.11834/jrs.20209056
CHEN Kaiqiang, GAO Xin, YAN Menglong, ZHANG Yue and SUN Xian. 2020. Building extraction in pixel level from aerial imagery with a deep encoder-decoder network. Journal of Remote Sensing (Chinese), 24(9): 1134-1142 [DOI: 10.11834/jrs.20209056]
Building extraction plays an important role in land use analysis such as urban planning. Traditional building extraction methods are usually based on hand-crafted features and classifiers, which leads to low accuracy. Based on a Convolutional Neural Network (CNN) with an encoder-decoder structure, this paper autonomously learns multi-level, discriminative features to better distinguish buildings from background, achieving pixel-level building extraction from aerial imagery. The network consists of two parts: an encoder sub-network, which compresses the spatial resolution of the input image to extract features, and a decoder sub-network, which recovers the spatial resolution from the features to complete pixel-level building extraction. In addition, this paper uses the Field-of-View Enhancement (FoVE) method to alleviate the marginal phenomenon, namely that building extraction accuracy near patch edges is usually lower than that near the central area. Experiments on two standard building extraction datasets show that the encoder-decoder CNN effectively achieves pixel-level building extraction and that FoVE effectively improves building extraction accuracy. By varying the patch size and overlap at prediction time, we analyze their influence on the extraction results and reveal the saturation behavior of FoVE.
Building extraction plays a significant role in land use analysis such as urban planning. Classical methods based on hand-crafted features fail to produce satisfactory building extraction results due to the limited representation capacity of such features. In this paper, we achieve pixel-level building extraction based on a deep Convolutional Neural Network (CNN) with an encoder-decoder structure. In contrast to hand-crafted features, which require professional knowledge and have poor representation capacity, convolutional neural networks have a high representation capacity and can learn highly abstract and discriminative features from data. The encoder derives a spatially compressed representation of the raw input image. This compressed representation, also called a feature of the input image, is assumed to be abstract and discriminative. The decoder takes the feature as input and recovers the spatial resolution to the size of the input image. Thereby, the encoder-decoder network achieves pixel-wise building extraction in an end-to-end way, from the raw image to the building extraction result.
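The encoder-decoder shape flow described above can be sketched with plain NumPy. Here the "encoder" is simply repeated 2×2 max pooling that halves the spatial resolution, and the "decoder" is nearest-neighbour upsampling that restores it; a real network replaces both with learned (de)convolutions, so this is only an illustrative sketch of the resolution compression and recovery, not the paper's architecture.

```python
import numpy as np

def encode(x, levels=3):
    """Compress spatial resolution by 2x per level via max pooling (stand-in for the conv encoder)."""
    for _ in range(levels):
        h, w = x.shape
        x = x[:h - h % 2, :w - w % 2]  # trim odd edges so the image tiles into 2x2 blocks
        x = x.reshape(x.shape[0] // 2, 2, x.shape[1] // 2, 2).max(axis=(1, 3))
    return x

def decode(f, levels=3):
    """Recover spatial resolution by 2x per level via nearest-neighbour upsampling (stand-in for the decoder)."""
    for _ in range(levels):
        f = f.repeat(2, axis=0).repeat(2, axis=1)
    return f

image = np.random.rand(256, 256)   # a single-band input patch
feature = encode(image)            # spatially compressed representation
mask = decode(feature) > 0.5       # per-pixel building / background decision
print(feature.shape, mask.shape)   # (32, 32) (256, 256)
```

The point of the sketch is that the decoder output has exactly the spatial size of the input, so every input pixel receives a label, which is what makes the extraction pixel-wise.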
Applying the encoder-decoder network to building extraction causes a Marginal Phenomenon (MP): the prediction accuracy near the edges of a patch is usually lower than that near the central area, which reduces the overall building extraction accuracy. To alleviate this effect, we propose the Field-of-View Enhancement (FoVE) method. FoVE consists of two parts: enlarging the patch size and cropping patches with overlaps when making predictions. It therefore has two hyper-parameters: the patch size and the overlap size. Extensive experiments on two building extraction datasets analyze the impact of these two hyper-parameters through Precision-Recall Curves (PRC), and several conclusions are drawn from the analysis: (1) enlarging the input patch size at prediction time effectively improves building extraction performance, although the improvement saturates as the overlap size increases; (2) cropping patches with an overlap at prediction time improves building extraction performance, although the improvement saturates as the input patch size increases; (3) FoVE effectively improves building extraction accuracy, but the improvement has a limit; (4) the convolutional neural network itself plays the key role in building extraction, and further attention should be focused on network design.
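The overlap part of FoVE can be sketched as overlapped-tile inference: patches are cropped with a stride smaller than the patch size, each patch is predicted, and only the central region of every prediction, away from the unreliable margins, is written into the output map. The sketch below uses hypothetical names and a placeholder per-patch predictor in place of the CNN, so it illustrates the tiling scheme rather than the paper's exact implementation.

```python
import numpy as np

def predict_patch(patch):
    # Placeholder for the encoder-decoder CNN: per-pixel scores, same size as the input patch.
    return np.ones_like(patch, dtype=float)

def tile_starts(length, patch, stride):
    """Top-left offsets of overlapping tiles, with the last tile pinned to the border."""
    starts = list(range(0, length - patch + 1, stride))
    if starts[-1] != length - patch:
        starts.append(length - patch)
    return starts

def fove_inference(image, patch=128, overlap=32):
    """Tile `image` with overlapping patches, keeping only each prediction's central area."""
    h, w = image.shape
    out = np.full((h, w), np.nan)
    stride, margin = patch - overlap, overlap // 2
    for top in tile_starts(h, patch, stride):
        for left in tile_starts(w, patch, stride):
            pred = predict_patch(image[top:top + patch, left:left + patch])
            # Drop the unreliable `margin`-pixel band, except where the patch touches the border.
            t = 0 if top == 0 else margin
            l = 0 if left == 0 else margin
            b = patch if top + patch == h else patch - margin
            r = patch if left + patch == w else patch - margin
            out[top + t:top + b, left + l:left + r] = pred[t:b, l:r]
    return out

scores = fove_inference(np.random.rand(300, 300))
print(np.isnan(scores).any())  # False: every pixel got a central (reliable) prediction
```

Because the stride is smaller than the patch size, the central regions of neighbouring tiles cover the whole image, so discarding the margins loses no pixels; the cost is simply more patches to predict, which is why the benefit can saturate.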
In addition to the numerical analysis of the FoVE experimental results, we attempt to explain why FoVE works and why its benefit has a limit. We attribute both to the Field of View (FoV), which is why the method is called FoVE. The FoV plays an important role in building extraction, and a larger FoV is beneficial. First, the marginal phenomenon is caused by the lack of contextual information for the marginal pixels; FoVE improves the overall accuracy by discarding the unreliable predictions of those pixels. Second, enlarging the input patches enlarges the FoV of each pixel and thus improves accuracy. Third, the improvement from FoVE has a limit because, when the field of view is large enough, the gain from additional contextual information becomes negligible.
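The FoV of a pixel can be quantified as the receptive field of the network's output units: for a plain stack of layers, the receptive field grows by (kernel − 1) input pixels per layer, scaled by the product of the strides so far. A short sketch of this standard arithmetic follows; the layer configuration is a hypothetical example, not the paper's exact network.

```python
def receptive_field(layers):
    """Receptive field of one output unit for a stack of (kernel, stride) layers."""
    rf, jump = 1, 1  # jump = distance in input pixels between neighbouring units of this layer
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf

# e.g. four blocks of a 3x3 convolution followed by 2x2 pooling
layers = [(3, 1), (2, 2)] * 4
print(receptive_field(layers))  # 46
```

Once this receptive field already spans a building and a generous amount of its surroundings, further enlarging the input patch adds context the network barely uses, which is consistent with the saturation observed for FoVE.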
Keywords: remote sensing, building extraction, convolutional neural network, deep learning, aerial imagery