Research on Spiking Neural Network fine-tuning method for object detection in remote sensing images
2024, Vol. 28, No. 7, pp. 1702-1712
Received: 2023-07-11
Published in print: 2024-07-07
DOI: 10.11834/jrs.20243272
Object detection in remote sensing images is an important part of visual image recognition research. In ship remote sensing images, however, ship targets are small and sparsely distributed, so object detection with a traditional Artificial Neural Network (ANN) often wastes a large amount of computing resources. Spiking Neural Networks (SNNs), with their event-driven and low-power characteristics, can greatly reduce energy consumption and free up computing resources. However, SNNs are difficult to train directly because of the complex dynamics and non-differentiable spiking operation of their neurons. Converting a trained ANN into an SNN effectively circumvents these training difficulties, but a converted deep SNN usually requires many time steps to maintain its performance. This process consumes substantial computing resources and introduces considerable latency, which runs counter to the original goal of low power consumption.
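The reliance on many time steps can be understood from the standard rate-coding view of ANN-to-SNN conversion; the notation below is a textbook-style sketch rather than a formula taken from this paper. An integrate-and-fire neuron in layer $\ell$ that emits $N_\ell(T)$ spikes over $T$ time steps represents its (normalized) ANN activation $a_\ell \in [0,1]$ only through its firing rate,

\[ r_\ell = \frac{N_\ell(T)}{T} \in \left\{0, \tfrac{1}{T}, \tfrac{2}{T}, \dots, 1\right\}, \qquad \left|r_\ell - a_\ell\right| = O\!\left(\tfrac{1}{T}\right), \]

so each layer carries a quantization error on the order of $1/T$, and in a deep converted network these per-layer errors compound, which is why short spike trains degrade detection accuracy.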
This paper studies why a converted SNN needs a large number of time steps to maintain the model's performance and proposes a new conversion method: a layer-by-layer conversion method based on fine-tuning. During conversion, the network is converted one layer at a time, and the remaining unconverted layers are fine-tuned after each step, so that conversion errors are prevented from accumulating layer by layer.
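The abstract does not give implementation details, so the following Python/PyTorch sketch only illustrates such a layer-by-layer conversion loop under our own assumptions (the IFNeuron stand-in and the caller-supplied finetune_fn are hypothetical):

import torch.nn as nn

class IFNeuron(nn.Module):
    """Minimal integrate-and-fire unit standing in for a converted ReLU."""
    def __init__(self, threshold=1.0):
        super().__init__()
        self.threshold = threshold
        self.v = None                                  # membrane potential (reset per input)
    def forward(self, x):
        self.v = x if self.v is None else self.v + x
        spikes = (self.v >= self.threshold).float()
        self.v = self.v - spikes * self.threshold      # soft reset by subtraction
        return spikes

def convert_layer_by_layer(layers, finetune_fn):
    """Replace ReLU layers with IF neurons one at a time; after each
    replacement, freeze the converted prefix and let finetune_fn adapt
    the still-analog remainder to the spiking activations it now receives."""
    layers = list(layers)
    for i, layer in enumerate(layers):
        if isinstance(layer, nn.ReLU):
            layers[i] = IFNeuron()                     # convert this layer
            for p in nn.Sequential(*layers[:i + 1]).parameters():
                p.requires_grad = False                # freeze the converted prefix
            finetune_fn(nn.Sequential(*layers))        # fine-tune the unconverted tail
    return nn.Sequential(*layers)

In the actual method the fine-tuning presumably runs over multiple time steps per sample and uses conversion-specific threshold balancing; the sketch only captures the freeze-prefix-then-fine-tune-tail structure described in the abstract.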
In addition, with the practicality of hardware deployment in mind, we propose Poisson group coding, which uses a group of Poisson-coding neurons to encode each input image and feeds their average-pooled spike trains to the network. Compared with ordinary Poisson coding, the output of Poisson group coding is less noisy and therefore affects model performance less.
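A minimal sketch of such an encoder, assuming pixel intensities normalized to [0, 1]; the function name and the group_size/time_steps parameters are illustrative rather than taken from the paper:

import torch

def poisson_group_encode(image, group_size=4, time_steps=20):
    """Encode an image with group_size independent Poisson (Bernoulli)
    spike generators per pixel and average over the group at every step."""
    probs = image.clamp(0.0, 1.0)                       # (C, H, W) firing probabilities
    # Independent spike trains for the whole group: (T, G, C, H, W)
    spikes = (torch.rand(time_steps, group_size, *probs.shape) < probs).float()
    # Average pooling over the neuron group gives a lower-variance input per step
    return spikes.mean(dim=1)                           # (T, C, H, W)

Averaging G independent Bernoulli(p) generators keeps the mean firing rate at p but reduces its variance from p(1-p) to p(1-p)/G, which is consistent with the abstract's observation that, for a fixed number of time steps, a larger group leaves model performance closer to that obtained with analog-valued input.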
In the experiments, the fine-tuning conversion method achieves results (96.9% and 70.3%) close to those of the pre-conversion YOLOv3-tiny model (97.9% and 79.6%) on the SAR ship detection datasets SSDD and AIR-SARShip, and about 80% of the pre-conversion performance is already reached with few time steps (20 and 80 steps). The method also achieves good detection performance (49.2%) on the PASCAL VOC dataset. By contrast, the conventional conversion method is inferior to the fine-tuning conversion method at the same number of time steps and usually requires many time steps (more than 150) to reach a comparable level of detection performance. For Poisson group coding, the impact on model performance at a fixed number of time steps decreases as the number of neurons in the group increases, and performance close to that obtained with analog input frequencies can be achieved with few time steps.
The proposed layer-by-layer conversion method based on fine-tuning allows the network to adapt to the conversion error of each layer, preventing errors from accumulating across layers; it improves the performance of the converted SNN and reduces the number of time steps the converted SNN requires. Meanwhile, Poisson group coding provides a practical and effective input coding method for the hardware deployment of SNNs.