Abstract: Artemia, commonly known as brine shrimp, is a small planktonic crustacean with a worldwide distribution. It plays a crucial role in hypersaline aquatic ecosystems and has significant ecological, economic, and scientific value. Owing to its high tolerance of extreme salinity, Artemia serves as an important biological model for environmental research. It is also widely used in aquaculture as a high-nutrient live feed for fish and crustaceans, further underscoring its economic importance. Artemia and its cysts often aggregate on the water surface, forming dense clusters known as Artemia slicks. These slicks are highly visible and can be efficiently detected using remote sensing technology. With advances in satellite remote sensing, researchers have employed satellite sensors of different spatial and spectral resolutions to extract Artemia slicks, analyze their temporal changes, and estimate their biomass. Remote sensing provides an effective, large-scale, and time-efficient approach for studying Artemia populations, making it an essential tool for ecological monitoring and resource management. This study systematically reviews the principles of optical remote sensing for detecting Artemia and categorizes the methodologies used for Artemia slick extraction. The optical properties of Artemia are primarily governed by the pigments in its body, particularly carotenoids, which give it a reddish or pinkish appearance. These pigments alter the spectral reflectance of Artemia slicks, distinguishing them from the surrounding water. The spectral characteristics of different types of Artemia vary slightly, but overall their reflectance in the visible and near-infrared bands differs from that of other aquatic features, allowing effective identification with multispectral and hyperspectral remote sensing methods. Understanding these spectral features is essential for developing accurate detection algorithms. Remote sensing-based extraction of Artemia slicks can be broadly classified into two main approaches: (1) spectral feature-based algorithms and (2) deep learning-based algorithms. Spectral feature-based methods rely on band ratios, spectral indices, and classification techniques to differentiate Artemia slicks from water bodies and other background features. These methods leverage the unique spectral properties of Artemia to enhance detection accuracy and have been applied in previous studies. With the development of artificial intelligence, deep learning-based methods have gained prominence in remote sensing applications. U-Net deep learning models have been used for Artemia slick detection, demonstrating strong performance. Deep learning algorithms automatically extract spatial and spectral features from high-resolution imagery, making them more robust and adaptable to different water environments. These methods have significantly improved the accuracy of Artemia slick extraction, particularly in complex aquatic environments where spectral similarities between Artemia and other waterborne materials complicate discrimination. In addition to detection, remote sensing has been utilized to analyze the spatiotemporal dynamics of Artemia slicks. The distribution of Artemia slicks varies seasonally and annually due to environmental factors such as temperature, salinity, and food availability.
By analyzing multi-temporal satellite images, researchers have observed significant variations in Artemia populations, providing valuable insights into their ecological patterns and responses to environmental changes. Furthermore, remote sensing has been successfully applied in biomass estimation of Artemia, which is critical for sustainable resource management. Biomass estimation models integrate spectral indices, empirical regression techniques, and machine learning algorithms to quantify Artemia abundance. In conclusion, remote sensing technology has greatly advanced the study of Artemia slicks, offering efficient methods for their detection, temporal monitoring, and biomass estimation. However, challenges remain in improving the accuracy and reliability of these methods. Future research should focus on integrating multi-source remote sensing data, including optical, thermal, and radar imagery, to enhance detection precision. The fusion of satellite, aerial, and Unmanned Aerial Vehicle (UAV)-based remote sensing can further improve spatial resolution and data consistency. Additionally, continued advancements in deep learning and artificial intelligence will refine automated detection techniques, enabling more accurate and scalable monitoring of Artemia populations. The integration of remote sensing with ecological modeling will provide deeper insights into the environmental drivers influencing Artemia distributions. As remote sensing technology continues to evolve, it will play an increasingly crucial role in Artemia research, facilitating sustainable resource management and conservation efforts.
Keywords: brine shrimp (Artemia); brine shrimp slicks; spectral characteristics; remote sensing; identification and extraction; biomass estimation
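As a schematic of the spectral feature-based family of methods described above, the sketch below flags candidate slick pixels with a simple band ratio inside a water mask. The bands, the ratio, and the threshold are hypothetical placeholders, not values from any particular published algorithm.

```python
import numpy as np

def extract_artemia_slicks(red, nir, water_mask, ratio_thresh=1.15):
    """Flag candidate Artemia slick pixels inside a water mask.

    Carotenoid pigmentation raises slick reflectance in the red relative
    to the surrounding water, so a red/NIR band ratio with a threshold
    separates candidate pixels. Bands, ratio, and threshold here are
    illustrative placeholders only.
    """
    eps = 1e-6
    ratio = red / (nir + eps)                 # per-pixel band ratio
    return (ratio > ratio_thresh) & water_mask

# Usage with synthetic reflectance arrays standing in for a satellite scene
rng = np.random.default_rng(0)
red = rng.uniform(0.02, 0.10, (100, 100))
nir = rng.uniform(0.01, 0.05, (100, 100))
water = np.ones((100, 100), dtype=bool)
print(f"candidate slick pixels: {extract_artemia_slicks(red, nir, water).sum()}")
```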
LOU Yifeng, HUANG Ke, YANG Gang, SUN Weiwei, SHAO Chunchen, LIU Weiwei, WANG Lihua, HU Jing
DOI:10.11834/jrs.20254113
Abstract: Rice is undeniably one of the most crucial food crops globally, serving as the primary source of sustenance for nearly 50% of the world's population. Its significance extends beyond mere food provision, as the cultivation process demands substantial water resources, and flooded rice fields contribute significantly to methane emissions, thereby presenting formidable challenges to both food security and ecological equilibrium. In this context, precisely determining the distribution areas of rice cultivation emerges as a pivotal task for effective food security and ecological environment management. Traditional manual survey methods have long been constrained by their limited scope and labor-intensive nature. In contrast, remote sensing technology has emerged as a powerful alternative, enabling the efficient monitoring of large-scale and long-term rice cultivation. By capitalizing on imagery data procured from satellites or aerial platforms, remote sensing offers comprehensive coverage, facilitating in-depth and detailed monitoring of rice cultivation regions. To address the need for rapid and accurate rice distribution mapping, this research introduces an innovative optical paddy rice index known as NOPRI. The construction of NOPRI is a meticulously designed process. Initially, leveraging the Google Earth Engine (GEE) platform, the Sentinel-2 Level 1C image products corresponding to the target area and year were acquired. Subsequently, cloud masking was executed to eliminate the interference of cloudy pixels, followed by the calculation and generation of the time series of the Normalized Difference Vegetation Index (NDVI) and the Modified Normalized Difference Water Index (MNDWI). These indices play a fundamental role in characterizing the growth and environmental conditions of rice. Harmonic analysis was then employed to reconstruct the NDVI and MNDWI time series, effectively enhancing the visualization and understanding of the periodic patterns inherent in rice growth. The coefficients derived from this time-series harmonic analysis were exploited to quantitatively assess the unique time-series characteristics of rice. Through a concise statistical analysis of a relatively small number of samples, the feature coefficients were extracted, refined, and combined. Notably, the characteristics of MNDWI during the critical rice flooding period were integrated into the index formula, endowing NOPRI with enhanced discriminatory power. This study achieved remarkable success in mapping the rice distribution in Region 1 in 2019, Region 2 in 2019, Region 3 in 2022, and Region 4 in 2018 by applying index threshold segmentation. To rigorously evaluate the accuracy of these maps, a comprehensive assessment was conducted using 7,001 validation sample points. The outcomes revealed that the overall accuracy of NOPRI surpassed 0.945 and the F1 score exceeded 0.907, providing strong evidence of the high reliability of the mapping results. In the evaluation of the mapping effect, a detailed visual comparison between the enlarged views of the rice distribution maps and the Sentinel-2 RGB images demonstrated a striking similarity. When contrasted with the Synthetic Aperture Radar-based Paddy Rice Index (SPRI) and existing rice datasets, NOPRI exhibited superior extraction accuracy and more precise mapping effects. However, it is important to note that the performance of NOPRI is susceptible to the quality of optical data.
In research areas where the long-term quality of optical data is suboptimal, the extraction accuracy and mapping effect of the rice index may be compromised. Future research efforts should focus on devising strategies to mitigate the impact of data quality issues and further enhancing the robustness and applicability of NOPRI.
Keywords: rice; Sentinel-2; time series; time-series harmonic analysis; paddy rice index
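The harmonic reconstruction step described above can be illustrated with an ordinary least-squares Fourier fit; the sketch below is a generic version with an illustrative period and harmonic count, not the exact NOPRI formulation.

```python
import numpy as np

def fit_harmonics(t, values, period=365.0, n_harmonics=2):
    """Fit a harmonic (Fourier) regression to a vegetation-index time series.

    Returns the coefficient vector [a0, a1, b1, a2, b2, ...] from an
    ordinary least-squares fit, plus the reconstructed series. The number
    of harmonics and the period are illustrative choices.
    """
    cols = [np.ones_like(t)]
    for k in range(1, n_harmonics + 1):
        w = 2.0 * np.pi * k * t / period
        cols.extend([np.cos(w), np.sin(w)])
    A = np.column_stack(cols)
    coeffs, *_ = np.linalg.lstsq(A, values, rcond=None)
    return coeffs, A @ coeffs

# Example: noisy synthetic NDVI with one annual cycle
t = np.arange(0, 365, 10, dtype=float)
ndvi = 0.4 + 0.3 * np.sin(2 * np.pi * t / 365) \
       + np.random.default_rng(1).normal(0, 0.03, t.size)
coeffs, fitted = fit_harmonics(t, ndvi)
print("harmonic coefficients:", np.round(coeffs, 3))
```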
WU Hanfei, FENG Bin, LI Menghua, YANG Mengshi, ZHANG Zhen, TANG Bohui
DOI:10.11834/jrs.20254393
Abstract: Objective The time-series InSAR (Interferometric Synthetic Aperture Radar) technique is widely recognized for its capability to monitor large-scale deformations, making it instrumental in applications such as geologic disaster monitoring, urban infrastructure safety assessment, and mining slope evaluation. This study aims to address the challenges in deciphering large-scale deformation time-series data obtained through InSAR. By leveraging self-supervised contrastive learning, we enhance the classification and clustering of InSAR time-series deformation data, which is critical for identifying geohazard signals and supporting infrastructure safety evaluation. Method We propose a novel deep clustering framework based on self-supervised contrastive learning to enhance clustering performance on unlabeled deformation time-series data. To overcome the limitations of traditional time-series data augmentation techniques, a rotational consistency-based data augmentation strategy is introduced. This strategy maintains morphological similarity by rotating the original time series at different angles, enabling the model to better capture invariance in time-series transformations. The method's clustering performance is evaluated against traditional K-means clustering, with key metrics such as clustering accuracy and normalized mutual information (NMI) used for comparison. The framework is further validated using deformation data extracted from the Sentinel-1 ascending orbit dataset, covering the Kafang tailings pond in Gejiu City, Yunnan Province, from January 2020 to October 2022. Results The proposed method outperformed traditional K-means clustering, achieving a 25.8% improvement in clustering accuracy and a 16.3% increase in normalized mutual information. Compared with the K-shape method, the proposed method captures time-series features and measures time-series similarity more accurately. The clustering analysis performed on the Sentinel-1 dataset successfully classified deformation time series into meaningful groups, revealing distinct deformation patterns and effectively identifying potential danger signals. Conclusion The integration of self-supervised contrastive learning with InSAR time-series analysis enhances the interpretability and classification of deformation patterns. The proposed method provides a robust and efficient tool for geohazard monitoring, urban infrastructure evaluation, and mining slope safety assessment. This approach has potential for broader applications in large-scale remote sensing data analysis.
Keywords: self-supervised learning; contrastive learning; data augmentation; deep clustering; time-series analysis; deformation clustering; feature representation; time-series InSAR
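One way to read the rotational-consistency augmentation described in the Method section is to treat each deformation series as a planar curve and rotate it by small angles to generate positive views for the contrastive loss. The sketch below follows that reading; the time-axis normalization and the angles are assumptions, not the authors' implementation.

```python
import numpy as np

def rotate_series(series, angle_deg):
    """Rotation-based augmentation of a deformation time series.

    The series is treated as a planar curve (normalized time, displacement);
    rotating it by a small angle preserves its overall morphology while
    producing a distinct view for contrastive learning.
    """
    t = np.linspace(-1.0, 1.0, series.size)       # normalized time axis
    pts = np.stack([t, series], axis=1)
    a = np.deg2rad(angle_deg)
    R = np.array([[np.cos(a), -np.sin(a)],
                  [np.sin(a),  np.cos(a)]])
    rotated = pts @ R.T
    return rotated[:, 1]                          # rotated displacement view

# Build a positive pair for a contrastive loss from one unlabeled series
series = np.cumsum(np.random.default_rng(2).normal(0, 1, 120))
view_a = rotate_series(series, +10.0)
view_b = rotate_series(series, -10.0)
```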
CAO Xuehuan, PENG Man, WAN Wenhui, WANG Biao, WANG Yexin, DI Kaichang, LI Lu
DOI:10.11834/jrs.20255015
Abstract: The Martian surface is characterized by a significant presence of rocks, whose size distribution, spatial density, and morphological characteristics are critical factors in determining the safety of landing site selection for Mars missions. These factors also directly influence the path planning and motion control of rovers during exploration activities. Moreover, the spatial distribution of rocks holds substantial value for studying the geological evolution of landing sites. However, images captured by Mars rovers often exhibit blurred boundaries between rocks and the background, as well as ambiguous textural features of the rocks themselves. These issues complicate the task of distinguishing rocks from the surrounding terrain. Additionally, the scarcity of real Martian rock datasets further exacerbates these challenges, making it increasingly difficult to achieve automatic and accurate rock identification on the Martian surface. To address these challenges and achieve precise rock identification in Mars rover images, this paper proposes a novel model for automatic segmentation of Martian surface rocks based on a convolutional self-attention network. The model implements pixel-level segmentation of images, adopting an encoder-decoder network architecture. The encoder utilizes a Convolutional Neural Network (CNN) to extract image features, while a convolutional self-attention module is incorporated to enhance the model's ability to understand contextual information within the images and improve its rock detection performance. This module enables the model to focus more effectively on important spatial features across the image, thereby capturing dependencies and contextual relationships between different regions. Finally, the model employs skip connections to facilitate feature fusion, forming a U-shaped decoder network that produces pixel-wise classification outputs, enabling precise segmentation of the images. In this study, we present the Tianwen Mars Surface Image Dataset, a manually annotated dataset of Martian surface imagery captured by the Zhurong rover. We integrate multiple datasets, including the artificially simulated rock datasets Synmars and Simmars6k, as well as the MarsData-v2 dataset from Curiosity rover images, to conduct rigorous testing and validation of the model's performance. The model's performance is compared with several other advanced methods, including DeepLabv3+, UNet++, Segformer, and MarsNet, using evaluation metrics such as average pixel accuracy, recall, and Intersection over Union (IoU). The results demonstrate that our proposed model excels in rock extraction, achieving accuracy and recall rates exceeding 90% on the simulated datasets. On real datasets, the model outperforms other methods with the highest accuracy and recall, highlighting its superior performance in rock identification. The experimental comparisons indicate that the model has the potential for accurate and reliable detection of rocks in Mars rover images, which is crucial for both autonomous exploration and scientific research on Mars. The proposed convolutional self-attention module effectively combines the strengths of convolutional operations and Transformer architectures. It enhances the model's ability to extract contextual information while retaining the capacity to capture detailed features. This dual capability not only improves the model's segmentation accuracy in areas with dense rock clusters but also enhances its adaptability to regions with blurred boundaries.
As a result, the model effectively addresses the challenges associated with accurately identifying surface rocks in the Martian environment, paving the way for more reliable and efficient Mars exploration missions.
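A minimal PyTorch sketch of a convolutional self-attention block of the kind described above: 1x1 convolutions form query/key/value maps and attention is computed over all spatial positions, with a learned residual weight. Channel sizes and the residual form are illustrative, not the paper's exact module.

```python
import torch
import torch.nn as nn

class ConvSelfAttention(nn.Module):
    """Sketch of a convolutional self-attention block for segmentation.

    1x1 convolutions produce query/key/value maps; attention over all
    spatial positions lets the encoder relate distant rock regions.
    """
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.q = nn.Conv2d(channels, channels // reduction, 1)
        self.k = nn.Conv2d(channels, channels // reduction, 1)
        self.v = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))   # learned residual weight

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)    # (b, hw, c')
        k = self.k(x).flatten(2)                    # (b, c', hw)
        attn = torch.softmax(q @ k / (k.shape[1] ** 0.5), dim=-1)
        v = self.v(x).flatten(2).transpose(1, 2)    # (b, hw, c)
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return x + self.gamma * out                 # residual connection

feat = torch.randn(1, 64, 32, 32)
print(ConvSelfAttention(64)(feat).shape)            # torch.Size([1, 64, 32, 32])
```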
SHAN Huilin, WANG Xingtao, LIU Wenxing, MENG Xiangyuan, WANG Zhihao, ZHANG Yinsheng
DOI:10.11834/jrs.20254321
Abstract: Objective Remote sensing images, characterized by wide coverage, multi-spectral information, and complex spatial structures, serve as critical geospatial data for applications such as environmental monitoring and urban planning. However, traditional interpretation methods struggle to address the growing challenges posed by expanding image coverage, diverse object categories, and intricate feature interactions. While deep learning has shown promise in extracting multi-level semantic features for image segmentation, existing networks often suffer from suboptimal feature fusion and oversimplified feature representations. This paper aims to develop a novel segmentation algorithm that effectively integrates semantic and spatial information while addressing feature consistency and overfitting issues in small-scale remote sensing datasets, thereby improving segmentation accuracy and robustness. Method This study proposes a three-branch integrated network for high-resolution remote sensing image segmentation. First, two dedicated branches separately extract semantic and spatial features to maximize their distinct characteristics. A third consistency branch is introduced to learn semantic-spatial consistency features, mitigating oversimplification during fusion. Second, a multi-scale feature fusion module dynamically weights and integrates outputs from all three branches, enhancing feature representation adaptability. Additionally, to alleviate overfitting caused by limited training data, a spatial consistency-aware random cropping strategy is designed. This augmentation method generates diverse yet spatially coherent image patches by preserving key object structures during random cropping, ensuring effective model generalization. Result The proposed algorithm achieves state-of-the-art performance on the ISPRS Potsdam and Vaihingen datasets, with mean Intersection over Union (mIoU) scores of 87.84% and 87.49%, respectively. It excels in segmenting complex scenarios, such as buildings with similar spectral profiles and non-opaque surfaces, by leveraging pixel-level spatial relationships and weighted multi-scale features. Edge segmentation accuracy is notably improved, producing smooth boundaries and reducing classification errors. However, inter-branch feature interaction remains limited, as evidenced by occasional inconsistencies in regions with overlapping spectral signatures. While the model demonstrates robustness to dataset scale, computational efficiency requires further optimization for real-time applications. Conclusion This work presents a three-branch network architecture that effectively addresses feature fusion challenges in remote sensing image segmentation. By decoupling semantic and spatial feature extraction while enforcing consistency constraints, the algorithm achieves significant accuracy improvements over conventional methods. The spatial consistency-aware data augmentation further enhances model generalizability. Current limitations lie in insufficient cross-branch communication and computational overhead. Future research will focus on designing lightweight interactive modules to strengthen feature exchange between branches and exploring transformer-based architectures to capture long-range dependencies. These advancements could enable higher precision in large-scale remote sensing applications, such as disaster assessment and land-use monitoring.
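A minimal sketch of a spatial-consistency-aware random crop in the spirit of the augmentation described above: crops are re-drawn until the patch retains a minimum fraction of labeled object pixels, so key structures are not cut away entirely. The acceptance rule and its parameters are assumptions for illustration.

```python
import numpy as np

def consistent_random_crop(image, label, size, max_tries=10, min_object_frac=0.05):
    """Random crop that keeps spatial consistency with the label map.

    Re-draws the crop window until the patch contains at least
    min_object_frac labeled object pixels; falls back to the last draw.
    image is channel-first (..., H, W); label is (H, W).
    """
    h, w = label.shape
    ch, cw = size
    rng = np.random.default_rng()
    for _ in range(max_tries):
        y = rng.integers(0, h - ch + 1)
        x = rng.integers(0, w - cw + 1)
        patch_lbl = label[y:y + ch, x:x + cw]
        if (patch_lbl > 0).mean() >= min_object_frac:
            break
    return image[..., y:y + ch, x:x + cw], patch_lbl

# Toy usage: a 3-band image with one labeled building footprint
img = np.zeros((3, 256, 256))
lbl = np.zeros((256, 256), dtype=int)
lbl[100:160, 80:200] = 1
patch_img, patch_lbl = consistent_random_crop(img, lbl, (128, 128))
print(patch_img.shape, patch_lbl.mean())
```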
Abstract: High-spatial-resolution hyperspectral remotely sensed images provide abundant spatial and spectral information at the same time, which is extremely important for practical applications such as precision agriculture, environmental monitoring, and target detection, and obtaining them is one of the long-term goals in the field of remote sensing. Because high spatial resolution and high spectral resolution are two mutually constrained imaging indexes, it remains challenging to obtain high-spatial-resolution hyperspectral images directly with existing imaging technology, which limits their practical applicability. As one of the important technical means to reconstruct the high-spatial-resolution hyperspectral image, computational imaging can take a low-spatial-resolution hyperspectral image acquired at the same time and over the same scene as a spectral prior and fuse it, based on the imaging model, with the spatial information provided by a high-spatial-resolution multispectral image. It can also take an image-pair library or a spectral library as the prior information and then reconstruct the high-spatial-resolution hyperspectral image by spectral super-resolution through spectral mapping. Facing these different routes of computational imaging, this study first constructs a unified, prior-information-based computational imaging model for high-spatial-resolution hyperspectral images. Then, according to the different sources of prior information, this paper summarizes the development process and representative methods, from the fusion of low-spatial-resolution hyperspectral and high-spatial-resolution multispectral images, to image-pair-learning-based spectral super-resolution, and to the latest spectral-library-learning-based spectral super-resolution for high-spatial-resolution hyperspectral images. In addition, the basic ideas, advantages, and limitations of the existing algorithms are systematically analyzed. Finally, three possible future trends, including cross-domain adaptation, multi-library alignment, and hardware implementation, are analyzed and discussed in the context of future research directions for computational imaging of high-spatial-resolution hyperspectral images. The results show that computational imaging of high-spatial-resolution hyperspectral remotely sensed images is one of the effective ways to break through the physical limitations of remote sensing imaging systems. Incorporating fusion and spectral super-resolution into a unified framework is conducive to systematically organizing different sources of prior information, leading to more targeted high-precision and high-stability reconstruction. This study provides a unified framework and technical means for computational imaging of high-spatial-resolution hyperspectral remotely sensed images, clarifies the future development direction of remote sensing image fusion and spectral super-resolution, and is expected to further improve the ability of fine structure detection and fine spectral discrimination, thus providing technical support for subsequent high-precision and high-reliability spectral target detection, object classification, and other application tasks.
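The fusion route summarized above is usually posed against a generic observation model with explicit spatial and spectral degradation operators. In common notation (ours, not necessarily the paper's):

\[
\mathbf{X} = \mathbf{Z}\,\mathbf{B}\,\mathbf{S} + \mathbf{N}_h, \qquad \mathbf{Y} = \mathbf{R}\,\mathbf{Z} + \mathbf{N}_m,
\]

where \(\mathbf{Z}\) is the latent high-spatial-resolution hyperspectral image, \(\mathbf{B}\) and \(\mathbf{S}\) are spatial blurring and downsampling operators producing the observed low-spatial-resolution hyperspectral image \(\mathbf{X}\), \(\mathbf{R}\) is the multispectral sensor's spectral response yielding the observed multispectral image \(\mathbf{Y}\), and \(\mathbf{N}_h\), \(\mathbf{N}_m\) are noise terms. Fusion and spectral super-resolution both amount to inverting this model under different priors on \(\mathbf{Z}\).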
Abstract: Non-exposed spaces, such as indoor environments, underground utility tunnels, and natural caves, are partially enclosed areas that have gained increasing attention as urbanization progresses. Investigating these spaces and acquiring spatial information within them can support the development of new infrastructure and facilitate digital transformation. However, exploring non-exposed spaces presents several challenges due to their complex structures, signal isolation from external sources, and degraded conditions. In such spaces, external positioning signals, such as GNSS, are often unavailable, complicating localization that relies on these signals. Additionally, many non-exposed spaces suffer from degraded environmental conditions, which further hinder self-localization. The intricate internal structures of these spaces also pose safety risks for human entry. The advancement of unmanned systems technology presents a promising solution to these challenges. To address these issues, we designed an autonomous aerial unmanned system equipped with panoramic LiDAR, providing a wide field of view, and integrated with modules for localization, mapping, planning, and control, enabling autonomous flight in uncharted spaces. The system consists of five main components: sensor input, localization and mapping, planning, control, and the unmanned platform itself. Sensor data from LiDAR and IMU are utilized for state estimation and real-time mapping. The system generates an occupancy grid map for trajectory planning, followed by optimization. Commands are then sent to the flight controller, which integrates manual and planned inputs to maintain stable flight. The system's pose is monitored through MAVROS, ensuring autonomous flight by controlling the motors via ESCs. Additionally, we propose a method for autonomous exploration of non-exposed spaces that involves point cloud mapping based on manually or autonomously assigned target points on a pre-established map. To validate the proposed autonomous aerial unmanned system and exploration method, we conducted experimental verifications in both simulated and real-world scenarios. We first selected a typical indoor scenario, "Indoor1," within the XTDrone simulation environment under GNSS-denied conditions. The simulation utilized the Iris drone, supported by PX4 firmware with PX4 software-in-the-loop communication, and an Intel RealSense D455 simulation module for capturing visible and depth images. The simulation results demonstrated a detection efficiency of 23.94 m³/s using the proposed exploration method. Subsequently, real-world experiments were conducted in a section of an underground parking lot at Wuhan University's Xinghu Experimental Building. The autonomous aerial unmanned system demonstrated stable flight, achieving a detection efficiency of 53.94 m³/s in complex environments, such as corridors, pipelines, and rooms. The experimental results confirm the feasibility of the proposed method. Real-world experiments achieved a detection efficiency exceeding 50 m³/s, validating the system's capability to efficiently explore non-exposed spaces and demonstrating its significant potential for future applications. Further research will focus on viewpoint generation based on specific targets, enabling intelligent exploration of non-exposed spaces and improving the system's autonomy and adaptability in complex environments.
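The m³/s figures above can be read as explored volume per unit time over the occupancy grid. The sketch below computes that metric under this assumed definition; the voxel log, voxel size, and timing are illustrative, and the authors' exact definition is not given here.

```python
import numpy as np

def detection_efficiency(observed_voxels, voxel_size_m, elapsed_s):
    """Explored volume per second from an occupancy-grid observation log.

    observed_voxels is a boolean array marking grid cells seen by the
    LiDAR during the flight; efficiency = observed volume / elapsed time.
    """
    explored_m3 = observed_voxels.sum() * voxel_size_m ** 3
    return explored_m3 / elapsed_s

# 1.2 million voxels of 0.2 m side observed in 180 s -> ~53.3 m^3/s
log = np.ones(1_200_000, dtype=bool)
print(f"{detection_efficiency(log, 0.2, 180):.1f} m^3/s")
```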
XIAO Qing, WU Xiaodan, WEN Jianguang, BIAN Zunjian, LIN Xinwen, YOU Dongqin, YIN Gaofei
DOI:10.11834/jrs.20254466
Abstract: Validation, inversion, and scale problems have been listed as the three major scientific problems in quantitative remote sensing. Validation serves as a crucial foundation for reflecting and revealing the errors in remote sensing algorithms and products, and it is an important guarantee for the continuous improvement of remote sensing product quality. After nearly 40 years of development, validation has received widespread attention in the international remote sensing community. Thus far, the theory and methods of validation have become relatively mature, and an increasing number of practical validation campaigns have been conducted. These efforts have played a broad role in clarifying the error distribution of remote sensing products, thereby iteratively improving product quality and promoting the application benefits of remote sensing products. However, given the development of remote sensing science and technology, the connotation of validation should not be limited to assessing the accuracy of remote sensing products. The in-depth expansion of macro- and micro-scale applications related to geography objectively calls for a proactive and systematic analysis, from the perspective of the remote sensing discipline, of the uncertainties across the entire chain of remote sensing information, from data acquisition to application, so that the technological conditions and advantages of the era can be harnessed to iteratively improve space observation capabilities. With 40 years of academic accumulation, particularly the development of remote sensing product validation technology, the theory, methods, and technology of validation have evolved into a comprehensive system, from traditional statistics-based comparative analysis to simulation validation grounded in physical models. At present, we have validation capability for the whole remote sensing chain, including quantitative remote sensing mechanism model validation, satellite imaging data calibration, remote sensing data product assessment, application effect evaluation, and even remote sensing observation theory and methods. However, remote sensing information has multidimensional characteristics of time, space, spectrum, and events. In previous research, validation techniques and methods were developed for various stages, including primary data product processing, model/algorithm evaluation, production, and application, but these stages were merely linked through simplistic, rigid input-output relationships. The correlations and inheritance of uncertainties across these stages were not explored. This rigid model overlooked the intrinsic connections and mutual influences among the stages, so validation was often limited to individual stages rather than addressing problems systematically as a whole. Merely "evaluating" the accuracy and uncertainty of remote sensing products is far from sufficient, because the ultimate goal of validation is to further enhance the quality of remote sensing products. Systematically reorganizing the connotation, methodology, and output of validation, as well as forming a working mechanism for cooperation within and across disciplines, is necessary to systematically enhance remote sensing spatial observation capabilities through validation.
This study provides a new interpretation of the concept and connotation of validation, analyzes and summarizes the current status of methodological and technological development, and examines the key challenges that urgently need to be overcome in current validation practice. Finally, it presents specific ideas for, and the development prospects of, future validation work.
LU Dengsheng, JIANG Xiandie, LI Yunhe, WANG Ruoqi, LI Guiying
DOI:10.11834/jrs.20255022
Abstract: Forests, as the largest carbon sink in terrestrial ecosystems, play important roles in mitigating climate change and maintaining ecological balance; thus, forest biomass distribution must be mapped accurately and in a timely manner. Remote sensing-based biomass estimation has received great attention in the past three decades; in particular, LiDAR, with its capability of capturing the three-dimensional structure of forest stands, has become an important data source for forest biomass estimation. LiDAR data can be acquired from different platforms, such as near-ground, airborne, and spaceborne platforms, and are thus used for biomass estimation at different scales, from individual trees to forest stands and landscapes. Many studies using LiDAR data for forest biomass estimation have been conducted, but no comprehensive review has been made so far. Therefore, this paper provides an overview of the current state of LiDAR technologies for forest biomass estimation and discusses the challenges and potential solutions for improving biomass modeling performance at different scales. The research progress and existing problems in biomass estimation at individual-tree, plot, and landscape scales based on LiDAR data from different platforms are first described, and the combination of LiDAR with other data sources, such as optical, microwave radar, and auxiliary data, for improving forest biomass estimation is then summarized and discussed. Different modeling methods, such as regression, machine learning, deep learning, and hybrid methods, are reviewed, and potential solutions to improve modeling accuracy through stratification are discussed. The potential factors causing biomass estimation uncertainty and the methods for examining and identifying them are described, and potential strategies to optimize the modeling procedure are discussed. The transferability of models in time and space and the importance and challenge of constructing a universal forest biomass estimation model are then discussed. This paper highlights the unique characteristics of LiDAR data from different platforms and indicates the necessity of incorporating LiDAR with other remotely sensed data for improving forest biomass estimation. It also indicates the importance of developing an optimized modeling procedure by examining modeling uncertainty, as well as the value of developing a universal biomass estimation model through the combination of physically based models and machine learning methods. This paper provides researchers with a better understanding of the current state of LiDAR technologies in forest biomass estimation research and new insights for better employing LiDAR data to improve forest biomass estimation at different scales.
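As an illustration of the machine-learning modeling route discussed above, the sketch below fits a random forest to plot-level LiDAR metrics against field-measured aboveground biomass. The metrics, the synthetic data, and the model settings are placeholders, not any study's actual configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Plot-level LiDAR metrics (e.g., height percentiles, canopy cover) vs.
# field-measured aboveground biomass (AGB). Synthetic values stand in
# for real plots.
rng = np.random.default_rng(3)
X = rng.uniform(0, 1, (200, 4))                  # h95, h_mean, cover, density
agb = 150 * X[:, 0] + 60 * X[:, 2] + rng.normal(0, 10, 200)   # Mg/ha

model = RandomForestRegressor(n_estimators=300, random_state=0)
r2 = cross_val_score(model, X, agb, cv=5, scoring="r2")
print(f"cross-validated R^2: {r2.mean():.2f}")
```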
MENG Yu, ZHANG Zheng, XI Zhihao, CHEN Jingbo, DENG Ligao, DENG Yupeng, KONG Yunlong
DOI:10.11834/jrs.20254309
Abstract: Aerospace information intelligent interpretation refers to the use of artificial intelligence technology to intelligently process multi-source remote sensing data acquired from space-based platforms such as satellites and space stations, or from unmanned aerial vehicles and aerostats, in order to extract key information and realize highly efficient, automated analysis and application. With the rapid advancement of deep learning, data-driven intelligent interpretation models have become the mainstream approach. These models leverage large-scale, high-quality annotated datasets and sophisticated neural network architectures to enhance interpretation performance. However, despite their success, such methods still face several challenges in practical applications. For instance, they rely heavily on extensive annotated datasets, leading to high data acquisition and manual labeling costs. Additionally, these models often exhibit limited generalization capabilities, making them less adaptable to diverse remote sensing environments and varying data distributions. Furthermore, traditional deep learning models typically lack interpretability, as their feature-fitting processes based on statistical correlations are susceptible to confounding factors, reducing reliability and trustworthiness in real-world decision-making. To address these challenges, causality-inspired intelligent interpretation methods integrate causal reasoning with deep learning, incorporating causal relationship modeling into data analysis. Unlike conventional data-driven methods, causality-inspired intelligent interpretation emphasizes not only statistical correlations among variables but also the modeling of causal relationships. This enables more rational inference and decision-making in complex aerospace data environments, thereby improving both interpretability and reliability. Consequently, causality-inspired intelligent interpretation is considered an essential development direction for aerospace information intelligent interpretation and promises to become a new interpretation paradigm. This review focuses on integrating causal theory into aerospace information interpretation models. First, it explores current trends and research directions in aerospace information interpretation from three perspectives: correlation analysis, statistical modeling, and causal cognition. Based on the principles of causality (association, intervention, and counterfactual reasoning), this review constructs a "ladder of causation" framework to illustrate the role of causal reasoning in aerospace information analysis. Next, the study examines causal discovery and causal effect estimation methods tailored to the spatiotemporal characteristics of aerospace data. It also investigates causal representation learning in deep neural networks to assess how causal reasoning can enhance the accuracy and robustness of intelligent interpretation. Subsequently, three primary approaches to constructing causality-inspired intelligent interpretation models are discussed: (1) interpretation based on causal graphical models, (2) interpretation using counterfactual reasoning, and (3) interpretation centered on feature-level causal interventions. By embedding causal relationships into intelligent interpretation models, these methods improve model generalization and explainability, offering scientifically grounded solutions for aerospace data analysis.
Finally, this review introduces typical applications of causality-inspired intelligent interpretation in aerospace observation environments and summarizes ideas for combining causal reasoning with spatiotemporal Earth observation data, and causal models with intelligent interpretation models, aiming to provide valuable insights and references for future research in this field.
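The intervention rung of the "ladder of causation" can be made concrete with a small synthetic example: backdoor adjustment recovers the causal effect of a treatment on an outcome in the presence of a confounder, where a naive association estimate is biased. All variables below are hypothetical.

```python
import numpy as np

# Synthetic data: confounder z drives both treatment x and outcome y.
rng = np.random.default_rng(6)
n = 100_000
z = rng.binomial(1, 0.5, n)                    # confounder
x = rng.binomial(1, 0.2 + 0.6 * z)             # treatment depends on z
y = 2.0 * x + 3.0 * z + rng.normal(0, 1, n)    # true effect of x on y is 2.0

# Association (naive contrast) is biased by z.
naive = y[x == 1].mean() - y[x == 0].mean()

# Intervention via backdoor adjustment: average stratum-wise contrasts
# weighted by P(z).
adjusted = sum(
    (y[(x == 1) & (z == v)].mean() - y[(x == 0) & (z == v)].mean())
    * np.mean(z == v)
    for v in (0, 1)
)
print(f"naive {naive:.2f} vs adjusted {adjusted:.2f} (truth 2.0)")
```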
Abstract: Automated interpretation of Synthetic Aperture Radar (SAR) images is one of the important development directions in the application of SAR technology, and its core lies in how to efficiently extract target information from complex SAR images and realize automatic recognition. SAR target recognition methods fall into two main categories: traditional Machine Learning (ML) methods and Deep Learning (DL) methods. Traditional methods usually rely on hand-designed features, such as Electromagnetic Scattering Features (ESF), that have clear physical meaning and high interpretability. For example, features such as the ESF, Radar Cross Section (RCS), and polarization characteristics of a target can directly reflect the target's geometric structure, material properties, and its interaction mechanism with electromagnetic waves. These features show strong stability in target classification and identification tasks, especially in complex environments or under low Signal-to-Noise Ratio (SNR) conditions. However, the limitations of traditional methods are that the feature extraction process is often complex and computationally inefficient, and it relies on the a priori knowledge of domain experts, which makes it difficult to adapt to large-scale data processing and diverse target recognition needs. In contrast, DL methods are able to automatically extract high-dimensional features from SAR images through end-to-end learning, avoiding the tedious process of manually designing features. DL methods usually outperform traditional methods in target recognition accuracy, especially when dealing with high-resolution SAR images with complex backgrounds. However, DL methods also have obvious shortcomings, such as poor model interpretability, which makes it difficult to explain their decision-making process; at the same time, the generalization performance of Neural Network Features (NNF) is often limited by the quality and diversity of the training data, and performance degradation may occur for unseen targets or scenes. To overcome the limitations of traditional ML methods and DL methods, researchers in recent years have proposed DL methods that fuse ESF and NNF. This fusion aims to combine the advantages of both: on the one hand, the physical interpretability and stability of electromagnetic scattering features are utilized to enhance the model's understanding of the target's intrinsic attributes; on the other hand, the high-dimensional features extracted by neural networks are used to enhance the model's recognition accuracy and adaptability. For example, in recognition tasks for targets such as vehicles, aircraft, and ships, fusion methods significantly improve recognition accuracy by combining the ESF of the targets (e.g., strong scattering point distribution, polarization response) with the NNF, while improving the interpretability of the model to some extent. This paper discusses the research results of target recognition methods based on the fusion of ESF and NNF, details the application of this fusion idea to the recognition of vehicles, aircraft, and ships, and summarizes and looks ahead to future development trends in target recognition and detection research.
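A minimal sketch of feature-level ESF-NNF fusion as discussed above: a handcrafted scattering-feature vector is concatenated with pooled CNN features before classification. The backbone and dimensions are placeholders, not a published architecture.

```python
import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    """Sketch of fusing handcrafted electromagnetic scattering features
    (ESF) with learned CNN features (NNF) for SAR target recognition.

    The ESF vector (e.g., scattering-center attributes) is concatenated
    with globally pooled CNN features before the classification head.
    """
    def __init__(self, esf_dim=16, n_classes=10):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(64 + esf_dim, n_classes)

    def forward(self, sar_chip, esf_vec):
        nnf = self.cnn(sar_chip).flatten(1)        # learned features
        fused = torch.cat([nnf, esf_vec], dim=1)   # feature-level fusion
        return self.head(fused)

logits = FusionClassifier()(torch.randn(4, 1, 64, 64), torch.randn(4, 16))
print(logits.shape)                                # torch.Size([4, 10])
```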
DU Peijun, MU Haowei, GUO Shanchuan, CHEN Yu, ZHANG Xingang, TANG Pengfei
DOI:10.11834/jrs.20254398
Abstract: Ensemble learning is a machine learning paradigm based on the idea of cooperative complementarity: it overcomes the limitations of individual learners and enhances overall decision-making performance through the effective combination of multiple learners. With the rapid development of remote sensing and artificial intelligence technologies, the demand for transforming remote sensing data into geoscientific knowledge has been increasing, driving the evolution of remote sensing ensemble learning toward a jointly data-, model-, and knowledge-driven paradigm. Based on an analysis of research progress both domestically and internationally, this paper summarizes the advances of ensemble learning in remote sensing target recognition, land cover classification, multi-temporal change detection and time-series remote sensing data analysis, surface parameter inversion, the integration of remote sensing and social sensing data, and the integration of mechanisms and learning. Finally, it discusses four frontiers and development directions: the integration of large remote sensing models with interpretability, the composition and measurement of ensemble diversity, new ensemble strategies, and the optimization of ensemble modes to meet geoscientific needs.
Keywords: remote sensing; ensemble learning; image classification; change detection; mechanism and learning ensemble
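The cooperative-complementarity idea in its simplest form is a soft-voting ensemble over heterogeneous base learners, sketched below with scikit-learn on synthetic data standing in for per-pixel spectral features.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Soft voting averages the class probabilities of diverse base learners,
# so their individual errors can cancel. Features are synthetic.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
        ("svm", SVC(probability=True)),
    ],
    voting="soft",
)
print(f"ensemble accuracy: {cross_val_score(ensemble, X, y, cv=5).mean():.3f}")
```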
Abstract: Inland surface waters, a critical freshwater resource, have undergone significant changes due to the combined impacts of climate change and human activities, underscoring the need for effective detection and monitoring of their distribution and spatiotemporal dynamics. Satellite remote sensing technology, with its broad spatial coverage, long-term historical data availability, and cost-effectiveness, has emerged as a key tool for tracking changes in inland water resources. Water bodies exhibit distinct characteristics in remote sensing images, enabling detailed quantitative analysis of their extent and changes over time through the application of appropriate water extraction algorithms to multi-temporal remote sensing imagery from various sources. This paper provides a comprehensive review of current research on the detection and monitoring of inland surface waters, focusing on four key areas: commonly used remote sensing data sources, state-of-the-art water extraction methods, insightful remote sensing applications, and the associated challenges and future directions. Both optical and microwave remote sensing data offer unique advantages and play crucial roles, with the integration of data from different sensors showing considerable promise. Traditional threshold-based methods identify water bodies by setting specific spectral thresholds, while machine learning classification algorithms leverage a combination of spectral, textural, spatial, and geometric features for water body extraction. Other approaches also perform well in specific scenarios. In recent decades, significant progress has been made in the use of satellite remote sensing to monitor the extent of inland surface waters, leading to the development of various large-scale, long-term rasterized, vectorized, and digitized surface water datasets, as well as new insights into the spatial and temporal dynamics of global surface water bodies and their driving forces. Finally, this paper suggests potential solutions to challenges such as the trade-off between spatial and temporal resolution and monitoring under contaminated conditions or vegetation obstruction, while exploring the future prospects and challenges of water detection in a new era of remote sensing big data. It seeks to provide a comprehensive reference and practical guidance for researchers, practitioners, and decision-makers interested in harnessing remote sensing technologies for the study and advanced surveillance of inland water bodies.
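The threshold-based route reviewed above can be illustrated with the classic MNDWI-plus-Otsu recipe, where MNDWI = (green - SWIR) / (green + SWIR) and water pixels score high. The data below are synthetic, and the automatic Otsu threshold stands in for a hand-tuned one.

```python
import numpy as np
from skimage.filters import threshold_otsu

def mndwi_water_mask(green, swir):
    """Index-plus-threshold water extraction.

    Computes the Modified Normalized Difference Water Index and binarizes
    it with an Otsu threshold; water pixels have high MNDWI.
    """
    mndwi = (green - swir) / (green + swir + 1e-6)
    return mndwi > threshold_otsu(mndwi)

rng = np.random.default_rng(4)
green = rng.uniform(0.05, 0.30, (50, 50))
swir = rng.uniform(0.01, 0.40, (50, 50))
print(mndwi_water_mask(green, swir).sum(), "water pixels flagged")
```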
LI Jinyuan, KE Yinghai, MIN Yukui, SHA Jinghan, ZHANG Mengyao, HAN Xiaoran
DOI:10.11834/jrs.20254350
Abstract: Accurate monitoring of salt marsh vegetation phenology is essential for studying the carbon cycle in "blue carbon" ecosystems. High-spatiotemporal-resolution satellite remote sensing enables detailed monitoring of vegetation phenology; however, "salt-and-pepper" noise is an unavoidable challenge. This study focuses on the Yellow River Estuary Wetland and combines high-resolution remote sensing data with object-oriented methods to investigate coastal salt marsh phenology. First, multi-scale segmentation is applied to Jilin-1 images to extract salt marsh vegetation objects, which are then used as the basic units for phenological parameter extraction. Based on time-series NDVI from PlanetScope images, Savitzky-Golay (S-G) filtering, double-logistic model fitting, and dynamic thresholding are then used to extract phenological parameters, including the start of the growing season (SOS), the end of the growing season (EOS), and the length of the growing season (LOS). The results are assessed from three aspects: (1) the fitting accuracy of the time-series NDVI, (2) the spatial heterogeneity of the extracted phenological parameters, and (3) consistency with observations from a phenological camera. The findings indicate: (1) Compared with pixel-based approaches, the object-based time-series NDVI fitting achieves lower root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE). Specifically, the areas with RMSE < 0.05, MAPE < 15%, and MAE < 0.035 increased by 11.46%, 12.93%, and 10.72%, respectively, demonstrating improved fitting accuracy at the object scale. (2) The extracted phenological parameters are similar for the object- and pixel-based approaches, with both capturing the spatial heterogeneity of salt marsh vegetation phenology. However, object-based parameters are spatially smoother, mitigating the small-scale variability observed in pixel-level phenological parameters. Semi-variogram analysis of spatial heterogeneity reveals that the nugget and partial sill values for object-based parameters are significantly lower than those for pixel-based parameters. (3) Object-based phenological parameters show a high degree of consistency with those obtained from the phenological camera (SOS matching exactly, and EOS and LOS differing by only one day), whereas pixel-level parameters exhibit larger variations. This study demonstrates that object-oriented image analysis effectively reduces salt-and-pepper noise in high-resolution remote sensing images and shows strong potential for high-resolution phenology extraction in salt marsh wetlands.
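A compact sketch of the double-logistic fit plus dynamic-threshold extraction described above: SOS and EOS are taken where the fitted NDVI crosses a fixed fraction of its seasonal amplitude. The 0.5 fraction and the curve parameters are illustrative choices, not the study's calibrated values.

```python
import numpy as np

def double_logistic(t, vmin, vamp, sos_mid, s1, eos_mid, s2):
    """Double-logistic growth curve commonly fit to NDVI time series."""
    return vmin + vamp * (1 / (1 + np.exp(-s1 * (t - sos_mid)))
                          - 1 / (1 + np.exp(-s2 * (t - eos_mid))))

def dynamic_threshold_phenology(t, ndvi, frac=0.5):
    """SOS/EOS where NDVI crosses a fraction of its seasonal amplitude."""
    thresh = ndvi.min() + frac * (ndvi.max() - ndvi.min())
    idx = np.flatnonzero(ndvi >= thresh)
    sos, eos = t[idx[0]], t[idx[-1]]
    return sos, eos, eos - sos                     # SOS, EOS, LOS

# Synthetic season: green-up near day 120, senescence near day 280
t = np.arange(1, 366, dtype=float)
ndvi = double_logistic(t, 0.15, 0.6, 120, 0.08, 280, 0.08)
print(dynamic_threshold_phenology(t, ndvi))        # ~ (120, 280, 160)
```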
XIE Hang, LI Xiuzhong, XU Ying, HU Weiping, ZHU Dong
DOI:10.11834/jrs.20254320
Abstract: Objective Significant wave height (SWH) is a crucial parameter in the study of ocean wave fields. Since the launch of GEOSAT in 1985, satellite altimeters have been providing global observations of SWH. However, the limitations of individual satellite orbit intervals and operation cycles mean that only the fusion of multi-satellite observation data can meet the needs of mesoscale wave field research. Currently, methods for fusing SWH data at specific times include the spatiotemporal inverse distance weighting method and the time-weighted averaging method used in CMEMS L4 products. However, the spatiotemporal inverse distance weighting method does not objectively allocate weights between temporal and spatial distances, and the time-weighted averaging method employed in CMEMS L4 products cannot meet the fusion requirements of dense grids at high spatial resolutions. Methods This study utilized along-track 1 Hz significant wave height data from nine satellites. Initially, the multi-source satellite observations were matched with buoy data from the National Data Buoy Center (NDBC). Calibration was then performed using least squares regression analysis. A filtering method based on empirical mode decomposition (EMD) was applied to the calibrated SWH data. By exploring the equivalence between spatial and temporal distances at the same error level, the influence of the spatial and temporal dimensions on the fusion results was unified, leading to a novel spatiotemporal fusion method for significant wave height. Results Through a systematic study of the fusion process, we found that the spatial window size affects the proportion of blank grid cells, and that the error of the fusion results first decreases and then increases as the temporal window is enlarged. Compared with the CMEMS L4 products and the traditional spatiotemporal weighting fusion method, our method shows good consistency in global spatiotemporal distribution characteristics and reduces errors by 4.1 cm and 1.55 cm, respectively. Additionally, the proposed method exhibits high-resolution fusion capability. Conclusion This study proposes a new method for significant wave height fusion and conducts a detailed investigation into parameter selection within this method. The improvements not only demonstrate the potential of the new method in enhancing the accuracy of SWH fusion products but also increase the practicality and precision of SWH products in various ocean applications. Furthermore, this study provides valuable reference points for future applications in ocean wave observation and prediction. The research indicates that a 24-hour temporal window when fusing four to nine satellites, or a 30-hour temporal window with three satellites, together with the optimal spatial window provided in this study, yields the highest accuracy in the final fusion results. However, there is room for improvement. Future research should incorporate more detailed and accurate sea ice datasets, considering factors such as sea ice concentration, ice edge, and ice type, to better account for their impact on grid masking in high-latitude regions.
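The core idea above, making temporal distance commensurate with spatial distance before weighting, can be sketched as a single inverse-distance interpolation in an effective metric. The conversion factor below is an assumed placeholder; the paper derives it from the error-level equivalence.

```python
import numpy as np

def fuse_swh(obs_swh, d_space_km, d_time_h, km_per_hour_equiv=25.0, power=2):
    """Inverse-distance fusion with time mapped onto an equivalent space scale.

    Each observation's temporal offset is converted into an equivalent
    spatial distance via km_per_hour_equiv (an assumed factor here), and
    one inverse-distance weighting is applied over the combined metric.
    """
    d_eff = d_space_km + km_per_hour_equiv * d_time_h
    w = 1.0 / np.maximum(d_eff, 1e-3) ** power
    return np.sum(w * obs_swh) / np.sum(w)

# Three altimeter observations near one grid node
swh = np.array([2.1, 2.4, 1.9])                 # m
d_s = np.array([10.0, 40.0, 80.0])              # km to grid node
d_t = np.array([1.0, 6.0, 12.0])                # hours from analysis time
print(f"fused SWH: {fuse_swh(swh, d_s, d_t):.2f} m")
```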
LI Jingzhu, LIU Xiaoyu, HAO Lingxiang, WU Guoxuan, SUN Zheng, GONG Min, LI Ya
DOI:10.11834/jrs.20254523
Abstract: Objective Target relationship analysis seeks a deeper understanding of targets beyond the detection task. In recent years, target relationship analysis based on computer vision has gradually attracted the attention of researchers in related fields, while relationship analysis algorithms in the remote sensing field started late. One reason is the lack of relationship datasets for the remote sensing field. Therefore, this article aims to fill the gap in reviews of relationship datasets, to conduct a detailed review and analysis of relationship datasets and target relationship analysis algorithms in the computer vision and remote sensing fields, and to provide new ideas for the research and development of target relationship analysis in remote sensing. Method In terms of relationship datasets, this article reviews 15 published image and video visual relationship datasets and 4 remote sensing image relationship datasets in the computer vision and remote sensing fields, systematically introduces their characteristics, data scale, relationship categories, and number of triples, and summarizes the limitations of existing remote sensing relationship datasets, such as inconsistent annotation formats and pronounced long-tail category distributions. In terms of target relationship analysis algorithms, this article divides existing algorithms into relationship analysis algorithms based on conditional random fields and those based on deep learning, summarizes the advantages and disadvantages of each type, and compares the performance of the most advanced algorithms using the evaluation metrics commonly adopted in target relationship analysis tasks. Finally, the factors that constrain the performance of current remote sensing target relationship analysis algorithms, such as complex semantic relationships and dynamic time-varying scenes, are summarized. Result In view of the limitations of current relationship datasets and target relationship analysis algorithms, this article proposes possible future research directions for each. Regarding relationship datasets, it proposes the concept of building a remote sensing relationship dataset for time-series data and a set of annotation specifications for remote sensing target relationships, and it preliminarily develops a suite of relationship annotation tools for remote sensing images, offering optimization suggestions for the construction of future relationship datasets. Regarding target relationship analysis algorithms, it proposes innovative methods such as dynamic hierarchical relationship modeling and dynamic scene graphs, in order to provide ideas for researchers in related fields. Conclusion Target relationship analysis is an important step in moving from target perception to cognition. However, research on target relationship analysis in the remote sensing field has just started, and there is a lack of discussion of remote sensing target relationship datasets. To address this problem, this article systematically sorts out the existing relationship datasets in the computer vision and remote sensing fields and summarizes the limitations of remote sensing relationship datasets. It also summarizes the research status of target relationship analysis algorithms and examines the advantages and disadvantages of existing algorithms by algorithm type.
Finally, this article proposes a series of possible research ideas for the development of target relationship analysis in the field of remote sensing. In terms of sample accumulation, this article proposes the idea of constructing a remote sensing time series relationship dataset, and in order to unify the annotation format, a set of sample annotation and visualization tools is developed. In terms of algorithm innovation, this article proposes the idea of a dynamic hierarchical relationship reasoning method, which provides new ideas for the future development of target relationship analysis algorithms.
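A unified annotation record of the kind argued for above can be as simple as a typed <subject, predicate, object> triple with an image (and optional time) reference. The field names below are illustrative, not a published specification.

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class RelationTriple:
    """One <subject, predicate, object> annotation for a remote sensing scene.

    A minimal unified record of the kind the article advocates; the
    timestamp field is reserved for time-series relationship data.
    """
    subject_id: int          # index of the subject bounding box
    object_id: int           # index of the object bounding box
    predicate: str           # relationship category, e.g. "docked_at"
    image_id: str
    timestamp: str = ""      # optional acquisition time

triple = RelationTriple(subject_id=3, object_id=7, predicate="docked_at",
                        image_id="scene_0001",
                        timestamp="2023-06-01T02:15:00Z")
print(json.dumps(asdict(triple)))    # serialized in a consistent format
```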
Abstract: Rotated object detection is a fundamental task in the field of remote sensing, essential for accurately identifying and localizing objects with arbitrary orientations in aerial or satellite imagery. Despite its importance, existing rotated object detection algorithms face significant challenges that hinder their performance and practical applicability. These challenges include large variations in target scales, dense target distributions leading to occlusions, confusion with similar objects due to limited distinguishing features, complex background interference, and a general lack of image detail owing to factors such as low resolution or adverse imaging conditions. These issues often come with high computational burdens and limited detection accuracy. To address the challenges of rotated object detection in remote sensing images, this paper improves upon the state-of-the-art YOLOv9 detector to develop a new high-performance rotated object detector that balances detection accuracy and inference speed. First, an auxiliary data enhancement module for low-light remote sensing images based on the Retinex algorithm is introduced to mitigate problems related to low light, noise, blurring, and low contrast. Second, a decoupled angle prediction head is designed to enable the algorithm to perceive the orientation of remote sensing targets. Next, a Kalman filter-based Intersection over Union (KFIoU) loss is incorporated into the model to address the periodicity issue caused by rotated object representation, using Distribution Focal Loss (DFL) to predict angles and solve the angle representation problem for nearly square objects within Gaussian modeling methods. Additionally, a dynamic label assignment strategy for rotated object detection is proposed, which takes into account both IoU and score values during assignment and incorporates a distance penalty term, thereby constructing a sample space that better reflects the characteristics of the targets. Finally, during non-maximum suppression, a probabilistic Intersection over Union (ProbIoU) based on the Hellinger distance is used to remove redundant candidate boxes, thereby reducing the computational burden. We evaluated the proposed RSO-YOLO detector on the publicly available DIOR-R dataset. Experimental results show that our method achieved a mean Average Precision (mAP) of 81.1% on this dataset, surpassing several typical rotated object detection methods and ranking first in detection accuracy. Notably, the introduction of the auxiliary data enhancement module contributed a 1.5% increase in mAP, demonstrating its effectiveness in enhancing detection performance under low-light and poor-quality imaging conditions. Moreover, we verified the model's generalization ability on the DOTA dataset, where the results were also outstanding. The model maintained real-time detection capability, highlighting its practical applicability for time-sensitive remote sensing tasks. In summary, the RSO-YOLO detector demonstrates superior performance compared with existing methods by integrating an auxiliary data enhancement module, a decoupled angle prediction head, improved loss functions, a dynamic label assignment strategy, and an efficient non-maximum suppression method.
Experimental results confirm that RSO-YOLO not only achieves higher accuracy but also operates efficiently, making it particularly advantageous for practical applications in remote sensing where both speed and precision are critical. Future work may involve further optimizing the proposed methods and exploring their application to other types of imagery and detection tasks.
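To illustrate the Hellinger-distance-based ProbIoU used in the non-maximum suppression step above, a minimal Python sketch is given below: it models each rotated box as a 2-D Gaussian and scores a pair of boxes as one minus the Hellinger distance between their Gaussians. The covariance convention (Sigma = R diag(w^2/4, h^2/4) R^T) and the example boxes are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def rbox_to_gaussian(cx, cy, w, h, theta):
    """Convert a rotated box (cx, cy, w, h, theta in radians) to a 2-D Gaussian.
    Assumes the convention Sigma = R diag(w^2/4, h^2/4) R^T (conventions vary)."""
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    S = np.diag([w**2 / 4.0, h**2 / 4.0])
    return np.array([cx, cy]), R @ S @ R.T

def prob_iou(box1, box2):
    """Hellinger-distance-based similarity of two rotated boxes:
    1 - Hellinger distance between their Gaussian representations."""
    mu1, s1 = rbox_to_gaussian(*box1)
    mu2, s2 = rbox_to_gaussian(*box2)
    s = 0.5 * (s1 + s2)
    d = (mu2 - mu1).reshape(2, 1)
    # Bhattacharyya distance between the two Gaussians
    bd = (0.125 * (d.T @ np.linalg.inv(s) @ d).item()
          + 0.5 * np.log(np.linalg.det(s)
                         / np.sqrt(np.linalg.det(s1) * np.linalg.det(s2))))
    hellinger = np.sqrt(max(0.0, 1.0 - np.exp(-bd)))  # in [0, 1]
    return 1.0 - hellinger

# Example: two strongly overlapping boxes differing by ~10 degrees of rotation
print(prob_iou((50, 50, 40, 20, 0.0), (52, 50, 40, 20, np.deg2rad(10))))
```

In suppression, a candidate box whose ProbIoU with a higher-scoring box exceeds a chosen threshold would then be removed.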
Zhang Linlin, Wu Jiahao, Meng Qingyan, Wang Xuemiao, Du Hongyu, Pan Jing
DOI:10.11834/jrs.20254184
摘要:Urban areas, as the primary region for ecological civilization construction, have seen an increasing focus on improving their ecological environment. Green space, as a crucial component of urban ecosystems, holds significant importance in enhancing urban environmental quality, promoting biodiversity, and improving the overall well-being of urban residents. Evaluating the environmental benefits of green spaces can provide valuable insights for fine-grained urban environmental management. Against the background of rapid urbanization in China, this study conducts a comprehensive assessment of the thermal environmental benefits provided by typical urban green spaces in Hainan Province, using advanced remote sensing techniques based on Gaofen-6 and Landsat satellite data. First, vegetation in the study area is extracted and classified into three main types (trees, shrubs, and grasslands) using multiscale segmentation and random forest algorithms. Various landscape pattern indices are computed, and land surface temperatures are retrieved for multiple regions to analyze the impact of urban green landscapes on the urban thermal environment. By employing Pearson correlation analysis and geographical detector methods, the study investigates the characteristics of the drivers linking green landscape patterns with surface temperatures. Finally, using the analytic hierarchy process with selected factor indicators, a Green Space Environmental Benefit Index (GEBI) is established to quantitatively assess the thermal environmental benefits generated by green spaces in Haikou and Sanya cities. The results indicate that: 1) Haikou city is predominantly characterized by shrub vegetation, while Sanya city is dominated by grassland and shrub vegetation; 2) The cooling effect of green spaces is correlated with factors such as green area size and diversity, as well as landscape patch shape and density; green spaces exhibit poorer cooling effects when fragmented and irregularly shaped; 3) The GEBI distributions in the study areas show alternating high-value and low-value regions. The GEBI hotspots and cold spots in Haikou city are concentrated in the north, whereas in Sanya city, hotspots are mainly located in the west and southeast, with a smaller cold-spot range. Overall, the quality of green spaces in both cities is relatively good, but uneven distribution of green spaces persists in some older residential areas and large commercial districts. This study identifies areas of vulnerability and strength in urban green spaces, offering a paradigm for promoting green city development. It represents an advancement in the study of urban green space environmental benefits, providing insights for urban ecological planning, construction, and fine-grained environmental management, thereby facilitating sustainable urban development.
关键词:urban remote sensing;urban green space;surface temperature;environmental benefits;geographical detector;machine learning
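For readers unfamiliar with the geographical detector method used above, the following minimal Python sketch computes its factor detector q-statistic, q = 1 - (sum_h N_h * var_h) / (N * var), which quantifies how much of the spatial variance of land surface temperature a stratified factor (e.g., a green-space class) explains. The toy temperature values and class labels are invented for illustration and are not taken from the study.

```python
import numpy as np

def geodetector_q(values, strata):
    """Factor detector q-statistic of the geographical detector method.
    `values` is the response variable (e.g., land surface temperature) and
    `strata` labels each sample with a class of the explanatory factor.
    q lies in [0, 1]; larger q means the factor explains more spatial variance."""
    values = np.asarray(values, dtype=float)
    strata = np.asarray(strata)
    n, total_var = values.size, values.var()
    within = sum(values[strata == h].size * values[strata == h].var()
                 for h in np.unique(strata))
    return 1.0 - within / (n * total_var)

# Toy example: LST samples stratified by three hypothetical green-space classes
lst = np.array([30.1, 29.8, 30.3, 27.5, 27.9, 27.2, 33.0, 32.6, 33.4])
cls = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3])
print(geodetector_q(lst, cls))  # near 1: the classes explain most LST variance
```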
摘要:Objective Clouds are among the foremost concerns in weather, climate, weather modification, and environmental studies. Obtaining cloud microphysical parameters and properties from satellite observations of clouds is crucial. These parameters help to further understand the vertical structure of clouds and the formation of cloud and precipitation, and to quantitatively assess aerosol-cloud-precipitation interactions and the radiative climate effects of clouds. They also have significant application potential for identifying and warning of strong convection, as well as for evaluating the effects of weather modification. It is therefore necessary to validate the accuracy of retrieved microphysical parameters such as the cloud droplet effective radius (re), cloud phase, and the vertical structure of clouds. Method Based on our retrieval system and improved methodologies for cloud microphysical properties from satellite data, 3.7 μm channel data in multi-channel combination from the MODIS and AVHRR satellite sensors are used to retrieve the cloud droplet effective radius and its vertical structure. The retrievals are compared with those from 22 airborne measurements of continental cumulus clouds and with re from MODIS cloud products, and the accuracy and reliability of the satellite retrieval are assessed by means of these comparisons. Result The comparisons show that the error of the droplet effective radius between the retrieval and the airborne measurements is less than 2.4 μm, which is very close to the 2 μm reported in international verification results for marine stratus. The distribution of re with temperature/height (the vertical structure) is quite consistent with that detected by aircraft measurements. The re retrieved from the 3.7 μm channel correlates highly with the airborne measurements, with a correlation coefficient of 0.79 and a linear fitting slope of 0.81. In contrast, the re from the MODIS cloud product correlates poorly with the airborne measurements; the correlation coefficient and linear fitting slope are 0.43 and 0.32, respectively. All these results indicate the high accuracy of the retrieved cloud droplet effective radius and the high reliability of the retrieval methodologies. When the satellite zenith angle is large, it may lead to overestimation of re. In application, the influence of the satellite zenith angle should be considered to avoid excessive deviation of the satellite observation position and to ensure retrieval accuracy.
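As a minimal illustration of the validation statistics reported above (the correlation coefficient and linear fitting slope between retrieved and airborne re), the following Python sketch computes them for a set of paired samples; the numbers below are invented placeholders and do not reproduce the study's data.

```python
import numpy as np

# Hypothetical paired samples: satellite-retrieved re vs. airborne in-situ re (um)
re_sat = np.array([6.2, 7.8, 9.1, 10.5, 12.0, 13.4])
re_air = np.array([6.8, 8.1, 9.9, 11.2, 12.8, 14.6])

r = np.corrcoef(re_sat, re_air)[0, 1]             # Pearson correlation coefficient
slope, intercept = np.polyfit(re_air, re_sat, 1)  # linear fit: re_sat ~ re_air
bias = np.mean(re_sat - re_air)                   # mean bias (um)
rmse = np.sqrt(np.mean((re_sat - re_air) ** 2))   # root mean square error (um)
print(f"r={r:.2f}, slope={slope:.2f}, bias={bias:.2f} um, RMSE={rmse:.2f} um")
```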
摘要:Normalized Difference Vegetation Index (NDVI) images with fine spatial and temporal resolutions are important data for real-time, precise vegetation monitoring. Remote sensing images, as an important data source for producing NDVI data, however, always present a trade-off between spatial and temporal resolutions due to limitations in satellite sensor capabilities. Generally, sensors with fine spatial resolution have a long revisit time (e.g., Landsat images), while sensors with a short revisit period have coarse spatial resolution (e.g., Moderate-resolution Imaging Spectroradiometer (MODIS) images). Spatio-temporal fusion techniques can be applied to generate NDVI images with both fine spatial and temporal resolutions by fusing NDVI images acquired from these two categories of sensors. The existing spatio-temporal fusion methods, however, suffer from a long-standing challenge, namely the NDVI change between the images at the known and prediction times, which greatly restricts the accuracy of spatio-temporal fusion prediction. In this paper, a spatio-temporal fusion then spatial reconstruction (STFSR) method is proposed to cope with the NDVI change issue in predicting 30 m Landsat NDVI images. Generally, when predicting the missing Landsat NDVI image by spatio-temporal fusion, a pair of spatially complete fine and coarse spatial resolution NDVI images is also required (probably temporally far from the prediction time). In addition to these original auxiliary images, the proposed STFSR method also uses in prediction the fine spatial resolution image temporally closer to the prediction time, but with different degrees of data loss caused by cloud cover (hereafter simplified as the auxiliary cloudy NDVI image). The implementation of STFSR is divided into two steps: (1) reconstructing the non-cloud area of the image to be predicted (the area corresponding to the non-cloud area of the auxiliary cloudy image) using the spatial and temporal adaptive reflectance fusion model (STARFM); (2) reconstructing the cloud area of the image to be predicted (the area corresponding to the cloud area of the auxiliary cloudy image) with a spatial-temporal random forest (STRF) algorithm, a spatial reconstruction method integrating information from both the fine and coarse spatial resolution NDVI images. In experiments over three regions, the effectiveness of the proposed STFSR method was evaluated by comparison with two commonly used spatio-temporal fusion methods, the STARFM and the spatial weighting-based virtual image pair-based spatio-temporal fusion (VIPSTF-SW) algorithms. The results demonstrate that the proposed STFSR produces greater accuracy than the other two methods for all three regions. Furthermore, even when the cloud coverage in the auxiliary cloudy image increases to a large percentage (e.g., 80%), the STFSR method still provides a more satisfactory prediction than the two benchmark methods. Specifically, the average Root Mean Square Error (RMSE) of STFSR is 0.0217 and 0.0188 smaller than that of STARFM and VIPSTF-SW, respectively; the corresponding average Correlation Coefficient (CC) is 0.0820 and 0.0742 larger, and the corresponding average Relative Global-dimensional Synthesis Error (ERGAS) is 4.3170 and 3.8535 smaller. The proposed STFSR method takes full advantage of the important information in the cloudy but temporally closer NDVI image, which existing spatio-temporal fusion methods fail to utilize.
Generally, the proposed STFSR method provides a flexible solution to deal with the NDVI change in spatio-temporal fusion. Moreover, this model has great potential for the generation of other vegetation index data with fine spatial and temporal resolutions, such as the Enhanced Vegetation Index (EVI) and the Leaf Area Index (LAI).
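To make step (2) of STFSR more concrete, the following Python sketch illustrates the general idea of a spatial-temporal random-forest reconstruction under stated assumptions: a RandomForestRegressor is trained on cloud-free pixels, with coarse-resolution NDVI at the known and prediction dates plus pixel coordinates as predictors, and then predicts the fine-resolution NDVI inside the cloud mask. The predictor set and the synthetic arrays are illustrative assumptions, not the authors' exact STRF configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical placeholder data (all arrays synthetic, co-registered at fine scale)
h, w = 100, 100
rng = np.random.default_rng(0)
modis_t0 = rng.uniform(0.1, 0.8, (h, w))            # coarse NDVI, known date (resampled)
modis_tp = modis_t0 + rng.normal(0, 0.05, (h, w))   # coarse NDVI, prediction date
landsat_tp = modis_tp + rng.normal(0, 0.02, (h, w)) # fine NDVI, partially cloud-covered
cloud = np.zeros((h, w), dtype=bool)
cloud[40:70, 30:80] = True                          # synthetic cloud mask

# Predictors: coarse NDVI at both dates plus pixel coordinates (spatial context)
yy, xx = np.mgrid[0:h, 0:w]
X = np.stack([modis_t0, modis_tp, yy, xx], axis=-1).reshape(-1, 4)
y = landsat_tp.reshape(-1)
clear = ~cloud.reshape(-1)

rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(X[clear], y[clear])                          # learn from cloud-free pixels

filled = landsat_tp.copy()
filled[cloud] = rf.predict(X[~clear])               # reconstruct NDVI under the cloud
```

In the actual method, the non-cloud area would first be predicted by STARFM, so that the two steps together yield a spatially complete fine-resolution NDVI image at the prediction time.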