Application of an improved CenterNet to object detection in remote sensing images
2023, Vol. 27, No. 12, pp. 2706-2715
Received: 2021-10-22
Published in print: 2023-12-07
DOI: 10.11834/jrs.20231638
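To improve the efficiency and accuracy of object detection in remote sensing images, this paper proposes an object detection method based on an improved CenterNet. Built on the CenterNet detection framework, the method reduces the number of steps required for detection and lessens the dependence on anchor boxes. On top of CenterNet, the proposed method first adopts a ResNet with transposed convolution as the backbone network, which reduces the number of backbone parameters; second, for the heat map labels used in training, it proposes a method, designed around the center point, for calculating the side length of the region over which the Gaussian kernel applies; finally, it uses an attention mechanism to make the features extracted from object regions more effective. Experiments on public high-resolution remote sensing images show that the three improvements raise detection accuracy by 4.0% while reducing the required detection time to 31.9% of the original. Compared with other methods, the proposed method has certain advantages in both accuracy and speed, indicating that it is practical for object detection in remote sensing images.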
Nowadays, object detection methods based on deep learning are widely used in the interpretation of remote sensing images. Anchor-based methods usually require the anchor boxes to be designed first, which adds detection steps and time cost. This study proposes an object detection method for remote sensing images based on an improved CenterNet. The method simplifies the object detection process and improves efficiency.
CenterNet uses a fully convolutional network to directly predict the heat map of the center points, the widths and heights of the corresponding objects, and the position offsets of the center points. The heat maps are used to generate the rough positions of the objects, and the offsets fine-tune those positions to make them more accurate. The widths and heights then determine the shape of the object boxes, and the heat map in which a peak appears determines the object category. On the basis of CenterNet, the proposed method first adopts a ResNet with transposed convolution as the backbone network. The transposed convolution enlarges the output feature maps, and ResNet reduces the number of parameters in the backbone network compared with the Hourglass network. Second, the proposed method defines the length of the Gaussian kernel under three limit conditions between the predicted and real boxes in CenterNet. The Gaussian kernel is applied to generate the heat map label, which is used for network training. Finally, a multi-head attention mechanism is introduced into the backbone network to learn the importance of each element in the feature maps. The weights assigned to the elements reflect their effectiveness, which concentrates the effective features in the regions of the object key points as much as possible.
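The label generation and decoding steps described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`draw_gaussian`, `encode_box`, `decode_peak`), the output stride of 4, the fixed `radius`, and the convention `sigma = (2 * radius + 1) / 6` are all assumptions chosen for illustration. In particular, the radius is passed in as a constant here, whereas the proposed method derives the kernel length from three limit conditions that the abstract does not spell out. The offset and size are also kept as single values for one object rather than as per-pixel prediction maps.

```python
import numpy as np


def draw_gaussian(heatmap, center, radius):
    """Splat a Gaussian peak (value 1.0 at the center) onto a 2D heat map in place."""
    cx, cy = center
    sigma = (2 * radius + 1) / 6.0  # assumed convention, not taken from the paper
    ys, xs = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    kernel = np.exp(-(xs ** 2 + ys ** 2) / (2 * sigma ** 2))

    # Clip the kernel window so it stays inside the heat map borders.
    h, w = heatmap.shape
    left, right = min(cx, radius), min(w - cx, radius + 1)
    top, bottom = min(cy, radius), min(h - cy, radius + 1)
    region = heatmap[cy - top:cy + bottom, cx - left:cx + right]
    patch = kernel[radius - top:radius + bottom, radius - left:radius + right]
    np.maximum(region, patch, out=region)  # keep the larger value where peaks overlap
    return heatmap


def encode_box(box, num_classes, cls_id, out_size, stride=4, radius=2):
    """Turn one box (x1, y1, x2, y2) in input pixels into training targets."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2 / stride, (y1 + y2) / 2 / stride
    ix, iy = int(cx), int(cy)  # integer center on the output grid
    heatmap = np.zeros((num_classes, out_size, out_size), dtype=np.float32)
    draw_gaussian(heatmap[cls_id], (ix, iy), radius)
    offset = (cx - ix, cy - iy)  # sub-pixel part lost by rounding down
    size = ((x2 - x1) / stride, (y2 - y1) / stride)
    return heatmap, (ix, iy), offset, size


def decode_peak(heatmap, offset, size, stride=4):
    """Recover class and box from the strongest heat map peak."""
    cls_id, iy, ix = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    cx, cy = ix + offset[0], iy + offset[1]  # offset refines the rough position
    w, h = size
    box = ((cx - w / 2) * stride, (cy - h / 2) * stride,
           (cx + w / 2) * stride, (cy + h / 2) * stride)
    return int(cls_id), box


heatmap, peak, offset, size = encode_box((33, 50, 90, 110), num_classes=3,
                                         cls_id=1, out_size=32)
print(decode_peak(heatmap, offset, size))
```

Encoding and then decoding the same box returns the original coordinates, which shows why the offset branch matters: without it, the center would snap to the output grid and the box would shift by up to one stride.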
The experiments use mean Average Precision (mAP) to evaluate the detection results over multiple categories. All experiments are conducted on the DIOR dataset. The results show that CenterNet using the ResNet with transposed convolution achieves an mAP 1.4% higher than CenterNet using the Hourglass network. The proposed calculation of the length of the Gaussian kernel increases the mAP by 1.1%, and adding the attention mechanism further improves the mAP by 1.5%. At the same time, the proposed method reduces the time cost by 31.9% compared with the conventional method. The experimental results show that the proposed method can improve detection accuracy without sacrificing detection speed. The ablation experiments also show that the ResNet with transposed convolution, the designed calculation of the length of the Gaussian kernel, and the attention mechanism each effectively improve the mAP. The comparison with other methods also indicates that the proposed method is practical.
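For readers unfamiliar with the metric, the following sketch shows one common way to compute mAP: the area under the monotonically interpolated precision-recall curve for each class, averaged over classes. The abstract does not state which interpolation or IoU threshold the experiments use, so this protocol is an assumption, and the helper names `average_precision` and `mean_average_precision` are hypothetical. Matching detections to ground-truth boxes is assumed to have been done already, so each detection arrives with a true-positive flag.

```python
import numpy as np


def average_precision(scores, is_true_positive, num_gt):
    """Area under the interpolated precision-recall curve for one class."""
    order = np.argsort(-np.asarray(scores, dtype=float))  # highest score first
    tp = np.asarray(is_true_positive, dtype=float)[order]
    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(1.0 - tp)
    recall = cum_tp / max(num_gt, 1)
    precision = cum_tp / (cum_tp + cum_fp)

    # Pad the curve, then make precision non-increasing from right to left.
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])

    # Sum rectangle areas wherever recall changes.
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))


def mean_average_precision(per_class):
    """per_class: iterable of (scores, is_true_positive, num_gt), one per category."""
    return float(np.mean([average_precision(*c) for c in per_class]))


# Two toy categories: one with a false positive ranked second, one detected perfectly.
toy = [([0.9, 0.8, 0.7], [1, 0, 1], 2), ([0.9], [1], 1)]
print(round(mean_average_precision(toy), 4))
```

Because mAP averages over categories, a gain of a few percentage points usually reflects improvements spread across several classes rather than a single one.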
