Optical remote sensing image object detection based on multi-resolution feature fusion
2021, Vol. 25, No. 5, pp. 1124-1137
Print publication date: 2021-05-07
DOI: 10.11834/jrs.20210505
Yao Y Q, Cheng G, Xie X X and Han J W. 2021. Optical remote sensing image object detection based on multi-resolution feature fusion. National Remote Sensing Bulletin, 25(5): 1124-1137
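Object detection in high-resolution remote sensing images is an important research area in computer vision, with significant value in civil and military applications. Deep-learning-based object detection for natural images has recently made breakthrough progress. However, because objects in remote sensing images vary widely in scale and different classes can look highly similar, detection algorithms designed for natural images still face challenges when applied directly to remote sensing images. To address these challenges, this paper proposes a multi-resolution feature fusion method for object detection in remote sensing images. First, a feature pyramid extracts multi-scale feature maps, and a multi-resolution feature extraction network embedded after it encourages the network to learn object features at different resolutions and narrows the semantic gap between feature levels. Second, to fuse the multi-resolution features effectively, an adaptive feature fusion module is used to mine more discriminative multi-resolution feature representations. Finally, adjacent levels of the features output by the adaptive feature fusion module are deeply fused. The method is evaluated on the public remote sensing object detection data sets DIOR and DOTA. Compared with Faster R-CNN with a feature pyramid structure, its mean average precision (mAP) improves by 2.5% and 2.2%, respectively.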
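The idea of adaptive fusion across resolution branches can be illustrated with a minimal sketch. The abstract does not say how the adaptive feature fusion module computes its weights, so the softmax-weighted sum below is an assumption chosen for illustration, not the authors' design. The names `softmax` and `adaptive_fuse`, the per-branch scores, and the use of flat feature vectors in place of feature maps are all hypothetical.

```python
import math


def softmax(scores):
    """Turn raw per-branch scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_fuse(features, scores):
    """Fuse same-length feature vectors with softmax-normalised weights.

    features: one feature vector per resolution branch (hypothetical stand-in
              for feature maps already resized to a common shape).
    scores:   one raw score per branch; in a trained network these would be
              learned, here they are plain inputs (assumption).
    """
    if len(features) != len(scores):
        raise ValueError("need exactly one score per feature branch")
    length = len(features[0])
    if any(len(f) != length for f in features):
        raise ValueError("all feature vectors must have the same length")
    weights = softmax(scores)
    return [
        sum(w * f[i] for w, f in zip(weights, features))
        for i in range(length)
    ]


if __name__ == "__main__":
    low_res = [1.0, 2.0]
    high_res = [3.0, 4.0]
    # Equal scores give equal weights, so the result is the element-wise mean.
    print(adaptive_fuse([low_res, high_res], [0.0, 0.0]))
```

With equal scores the fusion reduces to a plain average. As one score grows relative to the others, the output approaches that branch's features, which is the sense in which the weighting is adaptive.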
In recent years, object detection in high-resolution remote sensing images has attracted increasing interest and has become an important research field in computer vision because of its wide applications in civil and military fields, such as environmental monitoring, urban planning, precision agriculture, and land mapping. Deep-learning-based object detection frameworks for natural scenes have made breakthrough progress and perform well on public natural-scene data sets. However, although these algorithms have improved the accuracy and speed of object detection in remote sensing images, they have not achieved the expected results. Because object sizes vary widely and inter-class similarity is high, most conventional object detection algorithms designed for natural scene images still face challenges when applied directly to remote sensing images. To address these challenges, we propose an end-to-end multi-resolution feature fusion framework for object detection in remote sensing images, which effectively improves detection accuracy. Specifically, we use a Feature Pyramid Network (FPN) to extract multi-scale feature maps. Then, a Multi-resolution Feature Extraction (MFE) module is inserted into the feature layers of different scales; it encourages the network to learn feature representations of objects at different resolutions and narrows the semantic gap between scales. Next, to fuse the multi-resolution features effectively, we use an Adaptive Feature Fusion (AFF) module to obtain more discriminative multi-resolution feature representations. Finally, we use a Dual-scale Feature Deep Fusion (DFDF) module to fuse the features of two adjacent scales output by the AFF module. To demonstrate the effectiveness of each module (MFE, AFF, and DFDF), we first conducted extensive ablation studies on the large-scale remote sensing image data set DIOR. The results show that the MFE, AFF, and DFDF modules improve the average detection accuracy by 1.4%, 0.5%, and 1.3%, respectively, compared with the baseline method. Furthermore, we evaluated our method on two publicly available remote sensing image object detection data sets, DIOR and DOTA, and obtained mAP improvements of 2.5% and 2.2%, respectively, over Faster R-CNN with FPN. The ablation studies and comparison experiments indicate that our method extracts more discriminative and powerful feature representations than Faster R-CNN with FPN, which significantly boosts detection accuracy. Moreover, our method works well for densely arranged and multi-scale objects. Although this work achieves several improvements, some aspects still require further work. For example, our method performs poorly on objects with large aspect ratios, such as bridges, possibly because most anchor-based methods have difficulty ensuring a sufficiently high intersection over union (IoU) with ground-truth objects that have large aspect ratios. Our future work will address these problems by exploring the advantages of anchor-free methods.
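The difficulty with large aspect ratios can be made concrete with a small IoU calculation. The sketch below is illustrative only: the anchor area of 64 x 64 pixels, the width-to-height ratios (0.5, 1, 2), the shared box centre, and the helper names `iou`, `centered_box`, and `best_anchor_iou` are assumptions, not settings taken from the paper.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def centered_box(width, height):
    """Axis-aligned box of the given size centred on the origin."""
    return (-width / 2, -height / 2, width / 2, height / 2)


def best_anchor_iou(gt_box, area=64.0 * 64.0, ratios=(0.5, 1.0, 2.0)):
    """Best IoU between gt_box and same-centre anchors of fixed area.

    Each ratio is width / height; the anchor set is an assumed, typical one.
    """
    best = 0.0
    for ratio in ratios:
        width = (area * ratio) ** 0.5
        height = (area / ratio) ** 0.5
        best = max(best, iou(centered_box(width, height), gt_box))
    return best


if __name__ == "__main__":
    square_object = centered_box(64, 64)   # aspect ratio 1:1
    bridge_like = centered_box(256, 16)    # same area, aspect ratio 16:1
    print(best_anchor_iou(square_object))
    print(best_anchor_iou(bridge_like))
```

Under these assumptions the square object is matched exactly by one anchor, while the elongated object of equal area reaches an IoU of only about 0.21 with its best anchor, even when the centres coincide. That is well below a commonly used matching threshold such as 0.5, which is consistent with the explanation offered above for the weaker results on bridges.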
convolutional neural networks; multi-resolution feature fusion; remote sensing images; object detection