A comprehensive survey and assumption of remote sensing foundation models
Vol. 28, Issue 7, Pages: 1667-1680 (2024)
Received: 23 July 2023
Published: 07 July 2024
DOI: 10.11834/jrs.20233313


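In recent years, intelligent interpretation techniques for remote sensing have developed rapidly, but most are task-specific models that generalize poorly to other tasks and tend to waste resources. Foundation models offer a general, generalizable solution and have recently attracted wide attention in remote sensing. Although much work has achieved notable results on some perception recognition and cognitive prediction tasks using single-temporal or multitemporal remote sensing data, a comprehensive review that gives a systematic overview of remote sensing foundation models is still lacking. This paper therefore first summarizes research progress on existing remote sensing foundation models from the perspectives of data, methods, and applications. It then analyzes the limitations of the current state of research and proposes a vision for a new generation of general predictive remote sensing foundation models. Finally, it discusses and experiments on the directions most in need of research, offering researchers a bridge between the past achievements and the future possibilities of remote sensing foundation models.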
In recent years, intelligent interpretation technologies for remote sensing have advanced rapidly, but most established models are task-specific. Generalizing them to different tasks is therefore difficult, and considerable resources are wasted. The foundation model is a general, generalizable solution that has recently attracted considerable interest in the field of remote sensing. Although many works have achieved remarkable results on some perception recognition and cognitive prediction tasks by using single-temporal or multitemporal remote sensing data, a comprehensive review that provides a systematic overview of remote sensing foundation models is lacking. Thus, this paper begins by summarizing research progress on existing remote sensing foundation models from the perspectives of data, methods, and applications. Then, after analyzing the limitations of the current situation, we propose a novel general predictive foundation model. Finally, we highlight some essential research directions and link past achievements with the future possibilities of remote sensing foundation models.
Existing remote sensing foundation models are categorized into three groups according to the data types used (single-temporal/multitemporal) and the tasks involved (perceptual recognition/cognitive prediction): foundation models of perceptual recognition based on single-temporal data, foundation models of perceptual recognition based on multitemporal data, and foundation models of cognitive prediction based on multitemporal data. According to the self-supervised learning method adopted, we divide the foundation models of perceptual recognition based on single-temporal data into those based on contrastive learning and those based on generative learning. According to the number of tasks, the foundation models of perceptual recognition based on multitemporal data are divided into single-task-oriented and multitask-oriented foundation models. According to model architecture, the foundation models of cognitive prediction based on multitemporal data are divided into transformer-based and graph network-based foundation models. Following this categorization, we describe the current state of each type of remote sensing foundation model and summarize their limitations in data, methods, and applications.
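The categorization above can be written down as a small lookup structure. The following Python sketch is only an illustration of the three groups and their subdivisions as named in the text; the class name `Category`, its field names, and the helper `find` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Category:
    """One group of the survey's taxonomy (field names are illustrative)."""
    task: str         # "perceptual recognition" or "cognitive prediction"
    data: str         # "single-temporal" or "multitemporal"
    split_by: str     # criterion the text uses to subdivide the group
    subgroups: tuple  # the two subdivisions named in the text


# The three groups and their subdivisions, as listed in the abstract.
TAXONOMY = (
    Category("perceptual recognition", "single-temporal",
             "self-supervised learning method",
             ("contrastive learning", "generative learning")),
    Category("perceptual recognition", "multitemporal",
             "number of tasks",
             ("single-task-oriented", "multitask-oriented")),
    Category("cognitive prediction", "multitemporal",
             "model architecture",
             ("transformer-based", "graph network-based")),
)


def find(task, data):
    """Return every category matching a task/data pair (possibly none)."""
    return [c for c in TAXONOMY if c.task == task and c.data == data]


if __name__ == "__main__":
    for c in TAXONOMY:
        print(f"{c.task} / {c.data}: {' | '.join(c.subgroups)} (by {c.split_by})")
```

In this sketch, `find("cognitive prediction", "single-temporal")` returns an empty list, which mirrors the fact that the categorization names three groups rather than all four task/data combinations.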
Based on the summary and analysis of existing remote sensing foundation models, we propose the assumption of a novel general predictive foundation model. By extracting stable and generalized time-series hyper-pixel features, the model opens up the information pipeline from multidomain and multitemporal data input to task output at multiple temporal and spatial scales, which enables accurate cognitive prediction of future states. Tens of millions of multiplatform, multitype, multimodal, and multitemporal data items are included. A new foundation model architecture combines the strengths of the transformer model and the graph network, increasing model capacity and improving generalization when predicting long-term multitarget interactions in large remote sensing scenes. In terms of application, the general predictive foundation model can be applied to diverse cognitive prediction tasks at multiple spatial and temporal scales. Under this assumption, we propose four exploratory directions: multidomain time series data representation, stable feature extraction, object-environment interaction modeling, and multitask interaction reasoning, aiming to provide a reference for researchers exploring remote sensing foundation models.
In general, foundation models with generalization ability are crucial to the development of remote sensing intelligent interpretation. We provide an overview of current advances in this field by collating the state of research on remote sensing foundation models. By analyzing the limitations of current remote sensing foundation models in terms of data, methods, and applications, we propose the assumption of a novel general predictive foundation model and clarify four exploratory directions in which breakthroughs are urgently needed. Follow-up work will pursue specific technological breakthroughs in multidomain time series data representation, stable feature extraction, object-environment interaction modeling, and multitask interaction reasoning. We explore a general remote sensing foundation model that integrates perception recognition and cognitive prediction into a single architecture.