Tian Zhuangzhuang, Zhang Hengwei, Wang Kun, Liu Shengqi, Zou Qianjin, Zhao Zhen, Chen Yubin. XXXX. Application of an improved CenterNet in remote sensing images object detection. National Remote Sensing Bulletin, XX(XX): 1-11
Object detection methods based on deep learning are now widely used in the interpretation of remote sensing images. Anchor-based methods usually require anchor boxes to be designed in advance, which adds detection steps and time cost. This paper proposes a remote sensing image object detection method based on an improved CenterNet, which simplifies the detection process and improves efficiency. CenterNet uses a fully convolutional network to directly predict the heat maps of the object center points, the widths and heights of the corresponding objects, and the position offsets of the center points. The heat maps give the rough positions of the objects, and the offsets fine-tune these positions to make them more accurate. The widths and heights determine the shape of the object boxes, and different heat maps correspond to different object categories. Building on CenterNet, the proposed method first adopts ResNet with transposed convolution as the backbone network. The transposed convolution enlarges the output feature maps, and ResNet reduces the number of parameters in the backbone network compared with the Hourglass network. Second, the proposed method defines the length of the Gaussian kernel under three limit conditions between the predicted and ground-truth boxes in CenterNet. The Gaussian kernel is used to generate the heat map labels for network training. Finally, a multi-head attention mechanism is introduced into the backbone network to learn the importance of each element in the feature maps. The weights of the elements reflect their effectiveness, so that the effective features concentrate in the regions of the object key points as much as possible. The experiments use mean average precision (mAP) to evaluate the detection results over multiple categories. All experiments are conducted on the DIOR dataset.
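As an illustration of the CenterNet formulation described above, the following minimal Python sketch renders a Gaussian heat map label for one object and then decodes the strongest heat map cell, together with its offset and size, back into a box. It is a sketch under stated assumptions, not the paper's implementation: the function names, the output stride of 4, the fixed radius of 2, and the relation between radius and sigma are all assumptions, and the paper's own three-condition rule for the Gaussian kernel length is not reproduced here.

```python
import math


def gaussian_heatmap(height, width, cx, cy, radius):
    """Render one Gaussian peak centred on the integer cell (cx, cy)."""
    # Assumed convention: the kernel spans 2*radius+1 cells, about 6 sigma.
    sigma = (2 * radius + 1) / 6.0
    heat = [[0.0] * width for _ in range(height)]
    for y in range(max(0, cy - radius), min(height, cy + radius + 1)):
        for x in range(max(0, cx - radius), min(width, cx + radius + 1)):
            d2 = (x - cx) ** 2 + (y - cy) ** 2
            heat[y][x] = math.exp(-d2 / (2 * sigma ** 2))
    return heat


def decode_peak(heat, offset, size, stride=4):
    """Convert the strongest cell into (x1, y1, x2, y2) in input pixels."""
    cells = ((y, x) for y in range(len(heat)) for x in range(len(heat[0])))
    best_y, best_x = max(cells, key=lambda p: heat[p[0]][p[1]])
    dx, dy = offset[best_y][best_x]   # sub-cell offset of the true center
    w, h = size[best_y][best_x]       # box size in feature-map cells (assumed)
    cx, cy = best_x + dx, best_y + dy
    box = ((cx - w / 2) * stride, (cy - h / 2) * stride,
           (cx + w / 2) * stride, (cy + h / 2) * stride)
    return box, heat[best_y][best_x]


# Hypothetical object: center (50, 30) px, size 20 x 12 px, stride 4.
H, W = 16, 32
heat = gaussian_heatmap(H, W, cx=12, cy=7, radius=2)  # 50 // 4, 30 // 4
offset = [[(0.0, 0.0)] * W for _ in range(H)]
size = [[(0.0, 0.0)] * W for _ in range(H)]
offset[7][12] = (0.5, 0.5)   # 50 / 4 - 12 and 30 / 4 - 7
size[7][12] = (5.0, 3.0)     # 20 / 4 and 12 / 4

box, score = decode_peak(heat, offset, size)
print(box, score)
```

In a multi-class setting there would be one such heat map per category, and the channel in which a peak is found would give the class of the decoded box.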
The results show that CenterNet using ResNet with transposed convolution achieves an mAP 1.4% higher than CenterNet using Hourglass. The proposed calculation of the Gaussian kernel length increases mAP by 1.1%, and adding the attention mechanism further improves mAP by 1.5%. At the same time, the time cost of the proposed method falls to 31.9% of that of the conventional method. The experimental results show that the proposed method improves detection accuracy without sacrificing detection speed. The ablation experiments on the individual components also show that ResNet with transposed convolution, the designed calculation of the Gaussian kernel length, and the attention mechanism each effectively improve mAP. The comparison with other methods further indicates that the proposed method is practical.
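For reference, the mAP figures quoted above are means of per-class average precision (AP). The sketch below shows one common way to compute it, assuming each detection has already been marked as a true or false positive by matching against ground-truth boxes. The all-point interpolation, the function names, and the example classes and scores are assumptions for illustration; the abstract does not state which AP variant or matching threshold the paper uses.

```python
def average_precision(scored_hits, num_gt):
    """AP for one class.

    scored_hits: list of (confidence, is_true_positive), one per detection.
    num_gt: number of ground-truth objects of this class.
    """
    ranked = sorted(scored_hits, key=lambda d: d[0], reverse=True)
    tp = 0
    precisions, recalls = [], []
    for rank, (_, hit) in enumerate(ranked, start=1):
        tp += 1 if hit else 0
        precisions.append(tp / rank)
        recalls.append(tp / num_gt)
    # Make precision non-increasing from right to left (all-point interpolation).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum the area under the interpolated precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class):
    """per_class maps a class name to (scored_hits, num_gt)."""
    aps = [average_precision(hits, n) for hits, n in per_class.values()]
    return sum(aps) / len(aps)


# Made-up detections for two illustrative classes.
per_class = {
    "airplane": ([(0.9, True), (0.8, False), (0.7, True)], 2),
    "ship": ([(0.6, True)], 1),
}
ap_airplane = average_precision(*per_class["airplane"])
ap_ship = average_precision(*per_class["ship"])
m_ap = mean_average_precision(per_class)
print(ap_airplane, ap_ship, m_ap)
```

Because mAP averages over classes, a gain of about one percent in mAP can come from small improvements in many categories or from a large improvement in a few.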
Remote sensing, object detection, deep learning, CenterNet, attention mechanism