From optical to SAR: A SAR ship detection algorithm based on multi-level cross-modality alignment
Vol. 28, Issue 7, Pages: 1789-1801 (2024)
Received: 11 July 2023
Published: 07 July 2024
DOI: 10.11834/jrs.20243249
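Synthetic aperture radar (SAR) ship detection has been a research hotspot in recent years. Unlike optical imaging, however, SAR imaging yields feature representations that are not intuitive. In addition, because SAR image data are scarce, existing methods that rely on large numbers of labeled SAR images may struggle to achieve good detection performance. To address these problems, this paper proposes MCMA-Net (Multi-level Cross-Modality Alignment Network), a SAR ship detection algorithm based on multi-level cross-modality alignment that enhances the feature representation of SAR images by transferring the rich knowledge of the optical modality to the SAR modality. The algorithm first designs a feature interaction network based on neighborhood-global attention, NGAN (Neighborhood-Global Attention Network). NGAN applies a neighborhood attention mechanism to the shallow backbone features for local interaction and a global self-attention mechanism to the deep features for global context interaction. It thus improves the encoding of local features while retaining global context modeling, so that the network attends to the appropriate information at each level, which in turn facilitates the subsequent multi-level modality alignment. Second, this paper designs a multi-level modality alignment module, MLMA (Multi-level Modality Alignment), which aligns the features of the two modalities in their different latent spaces from the local level to the global level and then to the instance level. This helps the model learn modality-invariant features effectively, narrows the modality gap between optical and SAR images, and transfers knowledge from the optical modality to the SAR modality. Extensive experiments show that the algorithm outperforms current detection algorithms and achieves the best results.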
In recent years, interest in Synthetic Aperture Radar (SAR) ship detection has grown considerably. Its distinctive strengths make SAR pivotal in numerous fields of research. However, the inherent characteristics of SAR images present a range of challenges. For instance, in contrast to optical images, SAR images have counterintuitive feature representations. Additionally, because SAR image data are limited, existing methods that depend on a large number of annotated SAR images may struggle to achieve satisfactory results.
How to effectively train a high-performance SAR ship detection network with a limited quantity of SAR images therefore needs to be investigated. Given that single-modality SAR detection algorithms have inherent limitations, other modalities that can assist the SAR modality are needed. For instance, in SAR image target detection, optical images can serve as a supplementary data source. A knowledge-rich model can be developed by training on a large volume of optical data together with SAR data. Hence, reasonable training approaches for effectively utilizing images from both the SAR and optical modalities should be explored.
To address these challenges, this paper proposes MCMA-Net, a SAR ship detection algorithm based on multilevel cross-modality alignment. MCMA-Net enriches SAR feature representation by incorporating valuable knowledge from the optical modality. First, we propose a neighborhood–global attention-based feature interaction network (NGAN), which employs a neighborhood attention mechanism for local interaction among low-level features and a global self-attention mechanism that captures global context from high-level features. NGAN improves the encoding of local features while retaining the ability to model global context, so the network can focus on the appropriate information at each level, which promotes the subsequent multilevel modality alignment. Second, we propose a multilevel modality alignment module (MLMA), which aligns features in the different hidden spaces of the two modalities at three levels: local, global, and instance. MLMA helps the model acquire modality-invariant features, bridging the modality gap and realizing the transfer of optical knowledge. Valuable information from the optical modality can compensate for certain deficiencies in SAR images. With these two modules, MCMA-Net incorporates the strengths of optical information while leveraging SAR’s inherent advantages, improving performance on SAR detection tasks.
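The contrast between neighborhood attention on low-level features and global self-attention on high-level features can be illustrated with a minimal sketch. This is not the authors' NGAN implementation: it assumes a 1-D token sequence, a single head, and no learned query/key/value projections (the tokens themselves are used as queries, keys, and values). The function names and the `window` parameter are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention of one query over a set of keys/values."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def global_attention(tokens):
    """Every token attends to every token (global context, deep features)."""
    return [attend(t, tokens, tokens) for t in tokens]


def neighborhood_attention(tokens, window=3):
    """Every token attends only to `window` nearby tokens (local interaction).

    Assumption: the window is clamped at the borders so that it always
    contains exactly `window` tokens.
    """
    n = len(tokens)
    window = min(window, n)
    out = []
    for i in range(n):
        start = min(max(i - window // 2, 0), n - window)
        local = tokens[start:start + window]
        out.append(attend(tokens[i], local, local))
    return out


# Toy 1-D "feature map": 5 tokens with 2 channels each.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5], [0.0, 0.0]]
local_out = neighborhood_attention(tokens, window=3)
global_out = global_attention(tokens)
```

In this sketch, setting `window` to the sequence length makes the neighborhood variant coincide with global attention, which shows that the two mechanisms differ only in how far each token is allowed to look.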
Our algorithm outperforms current detection algorithms. Notably, on both public SAR image datasets and our own SAR image dataset, MCMA-Net consistently achieves the best detection results, which indicates the model’s stable performance and robustness. The visualization results indicate that MCMA-Net detects ships well in complex scenarios. The ablation experiments demonstrate that, compared with the baseline model, our algorithm achieves a 2.7% increase in mAP on the SSDD dataset. The experimental results consistently validate the design of MCMA-Net.