Multi-information supervision in optical remote sensing images

WANG Jiabao; CHENG Gong; XIE Xingxing; YAO Yanqing; HAN Junwei

doi:10.11834/jrs.20211564

Artificial Intelligence (AI) for Remote Sensing

Vol. 27, Issue 12, Pages: 2726-2735 (2023)
Published: 07 December 2023

Wang J B, Cheng G, Xie X X, Yao Y Q and Han J W. 2023. Multi-information supervision in optical remote sensing images. National Remote Sensing Bulletin, 27(12): 2726-2735 [DOI: 10.11834/jrs.20211564]

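Abstract (translated from Chinese)

Oriented object detection in remote sensing images is a fundamental task in remote sensing image interpretation and is widely applied in many fields. Because objects in remote sensing images vary greatly in scale, appear in arbitrary orientations, and are densely arranged, the horizontal boxes used in conventional object detection cannot localize them accurately. Oriented object detection has therefore become a research hotspot in the remote sensing field. Benefiting from advances in deep learning, oriented object detection has made breakthrough progress, but most methods only add an angle prediction parameter to the detection head and do not make full use of angle and semantic information during training. This paper proposes an oriented object detection method for remote sensing images supervised by multiple kinds of information. First, in the region-of-interest extraction stage, angle information supervises the network in learning object orientation, so that the first stage of the network generates oriented proposals that fit the objects more closely. Second, to make full use of image semantic information, a semantic branch is added to the second stage of the network and trained under the supervision of image semantic labels. Using Faster R-CNN OBB as the baseline, the proposed method is validated on the DOTA dataset. Compared with the baseline, the mean average precision (mAP) improves by 2.8%, and the final detection accuracy (mAP) reaches 74.6%.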

Abstract

Oriented object detection is a basic task in the interpretation of high-resolution remote sensing images. Compared with general detectors, oriented detectors can locate instances with oriented bounding boxes, which are consistent with the arbitrarily oriented ground truths in remote sensing images. Oriented object detection has progressed greatly with the development of convolutional neural networks. However, this task remains challenging because of the extreme variation in object scales and the arbitrary orientations of objects. Most oriented detectors evolved from horizontal detectors. They first generate horizontal proposals using the Region Proposal Network (RPN). Then, they classify these proposals into different categories and transform them into oriented bounding boxes. Despite their success, these detectors exploit only the annotations at the end of the network and do not fully utilize the angle and semantic information.

This work proposes an Angle-based Region Proposal Network (ARPN), which learns the angle of objects and generates oriented proposals. The structure of ARPN is the same as that of RPN. However, for each proposal, instead of outputting four parameters for regression, ARPN generates five parameters, which are the center, shape, and angle. During training, we first assign anchors to ground truths by Intersection over Union (IoU). Then, we directly supervise the ARPN with the shape and angle information of the ground truths. We also propose a semantic branch that outputs image semantic results in order to exploit semantic information. The semantic branch consists of two convolutional layers and runs in parallel with the detection head. We first assign objects to different scale levels according to their areas. Then, we create semantic labels at each scale and use them to supervise the semantic branch. With semantic supervision, the model learns translation-variant features and improves accuracy. Moreover, the outputs of the semantic branch indicate the objectness at each location, which can be used to filter out false positives from the final predictions.
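As a rough illustration of the five-parameter output described above, the sketch below decodes a horizontal anchor and five regression values into an oriented box and its corner points. The abstract does not give the encoding, so the Faster R-CNN style parameterization used here (size-scaled center offsets, log-space width and height, angle in radians measured from the horizontal anchor) is an assumption, and the function names `decode_oriented_proposal` and `corners` are hypothetical.

```python
import math

def decode_oriented_proposal(anchor, deltas):
    """Turn a horizontal anchor plus five regression outputs into an oriented box.

    anchor: (cx, cy, w, h) of a horizontal anchor, in pixels.
    deltas: (dx, dy, dw, dh, dtheta), the five per-anchor outputs.
    The encoding is an assumed Faster R-CNN style one, not taken from the paper.
    """
    ax, ay, aw, ah = anchor
    dx, dy, dw, dh, dtheta = deltas
    cx = ax + dx * aw          # center shift is scaled by the anchor size
    cy = ay + dy * ah
    w = aw * math.exp(dw)      # log-space scaling keeps width and height positive
    h = ah * math.exp(dh)
    theta = dtheta             # horizontal anchors have angle 0, so the offset is the angle
    return cx, cy, w, h, theta

def corners(box):
    """Return the four corner points of an oriented box (cx, cy, w, h, theta)."""
    cx, cy, w, h, theta = box
    c, s = math.cos(theta), math.sin(theta)
    pts = []
    for sx, sy in ((-1, -1), (1, -1), (1, 1), (-1, 1)):
        ox, oy = sx * w / 2, sy * h / 2   # corner offset in the box's own frame
        pts.append((cx + ox * c - oy * s, cy + ox * s + oy * c))
    return pts

# A 64x32 anchor rotated by 90 degrees around its center.
box = decode_oriented_proposal((100.0, 100.0, 64.0, 32.0),
                               (0.0, 0.0, 0.0, 0.0, math.pi / 2))
print([(round(x, 1), round(y, 1)) for x, y in corners(box)])
```

With all five outputs set to zero the decoded box equals the anchor, which is the behaviour of a four-parameter RPN; the fifth value is what lets the proposal rotate toward the object.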

We conduct comprehensive experiments on the DOTA dataset to validate the effectiveness of the proposed methods. In data preparation, we first crop the original images into 1024×1024 patches with a stride of 824. Compared with the baseline, the ARPN achieves a 2.2% increase in mAP, while the semantic branch contributes an additional 0.8% improvement in mAP. Finally, we combine both methods and achieve a 74.64% mAP, which is competitive with the results of other oriented object detectors. We also visualize some results on the DOTA dataset. The results show that our method is highly effective for small objects and densely packed objects.
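The cropping step mentioned above can be sketched as a sliding window. With a 1024-pixel patch and a stride of 824, neighbouring patches overlap by 200 pixels. The helper names `tile_origins` and `tile_boxes` are hypothetical, and the handling of the image border (shifting the last patch back so it ends at the edge) is an assumption, since the abstract does not say how the remainder is treated.

```python
def tile_origins(length, patch=1024, stride=824):
    """Start offsets of patches along one image axis.

    Consecutive patches overlap by patch - stride pixels.
    Shifting the last patch back to the image border is an assumption.
    """
    if length <= patch:
        return [0]                       # image smaller than one patch: a single crop
    origins = list(range(0, length - patch + 1, stride))
    if origins[-1] + patch < length:     # an uncovered strip remains at the far edge
        origins.append(length - patch)
    return origins

def tile_boxes(width, height, patch=1024, stride=824):
    """All (x0, y0, x1, y1) crop windows for an image of the given size."""
    return [(x, y, min(x + patch, width), min(y + patch, height))
            for y in tile_origins(height, patch, stride)
            for x in tile_origins(width, patch, stride)]

print(tile_origins(4000))            # offsets along a 4000-pixel axis
print(len(tile_boxes(4000, 3000)))   # number of patches for a 4000x3000 image
```

The overlap means that an object cut by one patch border is likely to appear whole in a neighbouring patch, at the cost of processing some pixels twice.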

We propose ARPN and the semantic branch to utilize the multi-information in remote sensing images. The ARPN directly generates oriented proposals, which can lead to better recall of oriented objects. The semantic branch increases the translation-variant property of the features. Experiments demonstrate the effectiveness of our method, which achieves a 74.64% mAP on the DOTA dataset. In future work, we will focus on model efficiency and inference speed.

Keywords

object detection; oriented object detection; region proposal generation; multi-information; remote sensing images
