Time series classification of remote sensing data based on temporal self-attention mechanism
Vol. 27, Issue 8, Pages: 1914-1924 (2023)
Published: 07 August 2023
DOI: 10.11834/jrs.20210453
Cite as: Zhang W X, Tang P and Zhang Z. 2023. Time series classification of remote sensing data based on temporal self-attention mechanism. National Remote Sensing Bulletin, 27(8): 1914-1924
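Remote sensing image time series provide an important data basis for land cover classification research, and extracting temporal classification features with deep learning has long been a research focus. However, deep learning models based on recurrent and convolutional networks often struggle to achieve high classification accuracy on small-sample classes when the training samples are imbalanced. To address this problem, this paper introduces the self-attention mechanism, a recent method from the field of natural language processing, to the classification of multispectral remote sensing time series data. Two modifications are made to the Transformer encoder: (1) a feature up-dimensioning layer is added before the multi-head attention to enrich the spectral information of the data; (2) flattening followed by dimensionality reduction replaces global maximum pooling (GMP) as the feature dimension reduction strategy. A feature extraction network based on the temporal self-attention mechanism is constructed and compared with recurrent and convolutional networks, and a public multispectral remote sensing time series dataset is used to evaluate how effectively the method improves accuracy on small-sample classes. Experimental results show that the feature extraction network built on the temporal self-attention mechanism can be applied effectively to multispectral remote sensing time series classification and helps improve the classification accuracy of small-sample land cover classes.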
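The two encoder modifications listed in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the layer sizes (`T`, `BANDS`, `D_MODEL`, `D_OUT`), the random weights standing in for learned parameters, and the helper names are all assumptions made for illustration, and the attention layers themselves are omitted.

```python
import random

def linear(x, weight, bias):
    """Apply y = W x + b to one vector x (weight holds out_dim rows of in_dim values)."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]

def init_linear(in_dim, out_dim, rng):
    """Random weights standing in for learned parameters (assumption)."""
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    return weight, [0.0] * out_dim

def global_max_pooling(features):
    """GMP baseline: keep the maximum over time for each feature channel."""
    return [max(step[j] for step in features) for j in range(len(features[0]))]

def flatten_then_reduce(features, weight, bias):
    """Alternative: concatenate all time steps, then project to a smaller vector."""
    flat = [v for step in features for v in step]
    return linear(flat, weight, bias)

rng = random.Random(0)
T, BANDS, D_MODEL, D_OUT = 5, 3, 8, 4  # hypothetical sizes, not from the paper

# Toy input: T acquisition dates with BANDS spectral values each.
series = [[rng.random() for _ in range(BANDS)] for _ in range(T)]

# (1) Raise each time step from BANDS to D_MODEL features before attention.
w_up, b_up = init_linear(BANDS, D_MODEL, rng)
lifted = [linear(step, w_up, b_up) for step in series]

# (The multi-head attention layers would operate on `lifted` here; omitted.)

# (2) Reduce the T x D_MODEL output to a single feature vector.
w_red, b_red = init_linear(T * D_MODEL, D_OUT, rng)
pooled = global_max_pooling(lifted)                   # length D_MODEL
reduced = flatten_then_reduce(lifted, w_red, b_red)   # length D_OUT
```

One difference visible in the sketch is that GMP returns the same vector regardless of the order of the time steps, whereas the flattened vector keeps each feature tied to its position in time.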
With the rapid development of remote sensing technology, the continuous accumulation of remote sensing time series data provides important data support for studying land cover classification. Extracting discriminative classification features from remote sensing time series data by using deep learning methods has become a hot research topic. Deep learning methods require a large amount of training data, but sample imbalance prevents the commonly used recurrent and convolutional networks from achieving high accuracies in categories that have a small number of samples. To address this problem, this paper introduces the self-attention mechanism, which originates from the field of natural language processing, to the classification of multispectral remote sensing time series data with the aim of extracting deep temporal features at a global scale. This mechanism differs from recurrent networks, which extract temporal features by using information from previous time steps along the temporal dimension, and from convolutional networks, which extract temporal features within a local temporal neighborhood.
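To make the "global" property concrete, the following is a minimal sketch of single-head scaled dot-product self-attention over the time steps of one pixel's series. It assumes identity query, key, and value projections (a real Transformer encoder learns these), and the toy values are made up.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(series):
    """Scaled dot-product self-attention with identity projections (assumption).

    series: list of T time steps, each a list of d feature values.
    Returns (outputs, weights); weights[t][s] is how much step t attends to step s.
    """
    d = len(series[0])
    scale = math.sqrt(d)
    weights = []
    for q in series:
        # Every time step is scored against every other time step.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in series]
        weights.append(softmax(scores))
    outputs = [
        [sum(w * v[j] for w, v in zip(w_row, series)) for j in range(d)]
        for w_row in weights
    ]
    return outputs, weights

# Toy series: 4 acquisition dates, 3 spectral bands (made-up values).
toy = [[0.1, 0.2, 0.3], [0.2, 0.1, 0.4], [0.9, 0.8, 0.7], [0.1, 0.2, 0.3]]
out, attn = self_attention(toy)
```

Each row of `attn` assigns a nonzero weight to every date in the series, so each output step mixes information from the whole sequence rather than only from earlier steps or a fixed-width window.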
We construct a new feature extraction network based on the Transformer encoder, which first applied the self-attention mechanism in natural language processing, and then compare this network with a long short-term memory (LSTM)-based feature extraction network and a temporal convolutional neural network-based feature extraction network to evaluate the effectiveness of the self-attention-based method in improving the classification accuracy of small-sample categories. To achieve a fair comparison, we adopt a generic classification framework consisting of data input, a feature extraction network, a classifier, and classification output, and we use different models with various hyperparameters as feature extraction networks. We then evaluate the classification performance of the different methods on the public TiSeLaC multispectral remote sensing time series dataset by using per-class accuracy, overall accuracy (OA), and mean intersection over union (mIoU) as metrics.
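The three metrics can all be derived from a confusion matrix. The sketch below uses common definitions and is not taken from the paper: in particular, it assumes per-class accuracy means the fraction of each true class that is labeled correctly, and the labels are made up.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """cm[t][p] counts samples of true class t predicted as class p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm

def per_class_accuracy(cm):
    """Fraction of each true class labeled correctly (assumed definition)."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(cm)]

def overall_accuracy(cm):
    """Correctly labeled samples divided by all samples."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total

def mean_iou(cm):
    """Mean over classes of TP / (TP + FP + FN), skipping classes that never occur."""
    n = len(cm)
    ious = []
    for i in range(n):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp
        fp = sum(cm[r][i] for r in range(n)) - tp
        union = tp + fp + fn
        if union:
            ious.append(tp / union)
    return sum(ious) / len(ious)

# Imbalanced toy labels: class 0 has 8 samples, class 1 has 2 (made-up data).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
cm = confusion_matrix(y_true, y_pred, 2)

oa = overall_accuracy(cm)        # 0.9
pca = per_class_accuracy(cm)     # [1.0, 0.5]
miou = mean_iou(cm)              # (8/9 + 1/2) / 2, about 0.694
```

In this toy case OA stays at 0.9 even though half of the small class is misclassified, while mIoU drops to about 0.694, which is why a class-averaged metric is informative when samples are imbalanced.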
To compare the methods fairly, we choose the three hyperparameter settings with the highest mIoU for each model and then average their metrics as the final result. Results show that the self-attention-based network outperforms both the recurrent and convolutional networks. This network achieves a 92.98% OA and 80.60% mIoU, which are 1.25% and 1.32% higher than those achieved by the recurrent and convolutional networks, respectively. In terms of per-class accuracy, the self-attention-based network achieves accuracies equivalent to those of the recurrent and convolutional networks in the large-sample categories, with differences of less than 0.74%, whereas it improves classification accuracies in the small-sample categories by margins ranging from 2.47% to 5.41%.
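The model selection step described above (keep the three settings with the highest mIoU, then average their metrics) can be sketched as follows. The run records, setting names, and scores are hypothetical placeholders, not results from the paper.

```python
def average_top_k(runs, k=3, key="miou"):
    """Average every metric over the k runs with the highest value of `key`."""
    best = sorted(runs, key=lambda r: r[key], reverse=True)[:k]
    metric_names = [name for name in best[0] if name != "setting"]
    return {name: sum(r[name] for r in best) / len(best) for name in metric_names}

# Hypothetical results for four hyperparameter settings of one model.
runs = [
    {"setting": "a", "oa": 0.90, "miou": 0.70},
    {"setting": "b", "oa": 0.92, "miou": 0.78},
    {"setting": "c", "oa": 0.91, "miou": 0.74},
    {"setting": "d", "oa": 0.93, "miou": 0.76},
]

# Settings b, d, and c are kept; setting a is discarded.
summary = average_top_k(runs)
```

Selecting on mIoU and then reporting the averaged OA and mIoU keeps a single lucky or unlucky setting from dominating the comparison between models.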
This paper introduces the self-attention mechanism to the classification of multispectral remote sensing time series data to address the problem of low classification accuracy in small-sample categories caused by sample imbalance. We construct a new temporal feature extraction network based on the self-attention mechanism to globally extract temporal features from time series and design a set of objective comparison experiments. Experimental results show that by globally extracting temporal features from time series, instead of using previous time information (as in recurrent networks) or focusing on the local temporal neighborhood (as in convolutional networks), the self-attention-based network achieves comparable accuracy in large-sample categories and effectively improves the accuracy in small-sample categories. Therefore, the self-attention-based network can play an important role in the future classification of remote sensing time series, and further research on this network is warranted.
Keywords: self-attention mechanism; deep learning; remote sensing time series; land cover classification; imbalance of samples