Latest Issue

    Volume 28, Issue 12, 2024
    Cover Story

      Reviews

    • In the field of object detection in optical remote sensing images, the authors review progress in deep learning-based algorithms and outline solutions to key detection problems.
      XU Danqing, WU Yiquan
      Vol. 28, Issue 12, Pages: 3045-3073(2024) DOI: 10.11834/jrs.20243166
      Progress of research on deep learning algorithms for object detection in optical remote sensing images
      Abstract: Among all applications of optical remote sensing images, object detection has long received particular attention from researchers because of its wide application prospects in military and civilian fields. This study reviews progress in deep learning-based object detection algorithms for optical remote sensing images. The characteristics of remote sensing objects differ from those of conventional objects. First, remote sensing equipment images from a long distance and covers a large area, so images may contain objects with large variations in scale and shape. Second, the background tends to occupy a large area of a remote sensing image; as a result, some objects are submerged in the complex background and cannot be distinguished effectively by detectors. Last, remote sensing objects are not only small and arbitrarily oriented but also sometimes densely distributed, all of which poses challenges to detection. This study introduces the development of optical remote sensing object detection algorithms from template matching, prior knowledge, and machine learning to deep learning. Then, the deep learning-based detection pipeline, including data preprocessing, feature extraction, detection, and postprocessing, is introduced in detail. Classical deep learning-based object detection algorithms, including the one-stage algorithms represented by YOLO and SSD and the two-stage algorithms represented by Faster R-CNN, are summarized. Afterward, in accordance with the characteristics of optical remote sensing image objects, various improved algorithms that address the problems of scale diversity, direction diversity, shape diversity, small size, feature similarity, background complexity, distribution density, and weak features are systematically summarized.
Detection methods based on incompletely supervised learning and other advanced algorithms, such as Transformer-based, transfer learning-based, knowledge graph-based, and prior knowledge-based algorithms, are also summarized. In addition, open-source optical remote sensing image datasets and object detection evaluation indexes are introduced. The mean Average Precision (mAP) of advanced algorithms on the NWPU VHR-10 dataset can exceed 90%. On the DOTA dataset, the mAP of each advanced algorithm decreases considerably, and algorithms proposed in recent years continue to improve performance on this dataset. Multiscale fusion with a feature pyramid network has become the mainstream approach of advanced algorithms and can detect multiscale objects effectively. Many improved algorithms have been proposed to solve the abovementioned problems in optical remote sensing image object detection, and good detection results have been achieved. However, research on object detection in large-scale remote sensing images and on distinguishing similar objects between classes remains lacking. Finally, this study proposes future development directions, such as improved deep learning networks, lightweight networks, weakly supervised learning, small-object detection, and improved rotated-object detection mechanisms.
      Keywords: optical remote sensing image; object detection; deep learning; object characteristics; object detection process; object detection framework; incomplete supervised learning; dataset; evaluation index
      Published: 2025-01-17
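The evaluation indexes discussed in this review, IoU-based matching and mean Average Precision (mAP), can be sketched as follows. This is a minimal, generic illustration of the metrics, not code from any surveyed paper:

```python
import numpy as np

def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def average_precision(tp_flags, n_gt):
    """Unsmoothed AP for one class: detections are pre-sorted by confidence;
    tp_flags[i] is 1 if detection i matches an unclaimed ground truth
    (e.g. IoU >= 0.5). mAP is the mean of this over all classes."""
    tp_flags = np.asarray(tp_flags, dtype=float)
    tp = np.cumsum(tp_flags)
    fp = np.cumsum(1.0 - tp_flags)
    recall = tp / n_gt
    precision = tp / (tp + fp)
    # area under the precision-recall curve by rectangle summation
    prev_r, ap = 0.0, 0.0
    for p, r in zip(precision, recall):
        ap += p * (r - prev_r)
        prev_r = r
    return ap
```

Published benchmarks usually add precision-envelope interpolation on top of this raw sum; the rectangle version above keeps the idea visible.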
    • Rigid point cloud registration is widely used in autonomous driving and other fields; this paper reviews progress in deep learning-based registration methods and provides a reference for future research in the area.
      ZHOU Ruqin, WANG Peng, DAI Chenguang, WANG Hanyun, JIANG Wanshou, ZHANG Yongsheng
      Vol. 28, Issue 12, Pages: 3074-3093(2024) DOI: 10.11834/jrs.20243396
      Status and progress of deep learning-based pairwise point cloud rigid registration
      Abstract: Point cloud registration is the spatial alignment of two or more point clouds through geometric transformations. As a fundamental task in 3D point cloud data processing, it is an important preprocessing step for tasks such as 3D modeling, object recognition, and scene understanding. Given the unstructured, sparse, and uneven characteristics of point cloud data, point cloud registration remains one of the hotspots and challenges in computer vision, mapping, and remote sensing, although many studies have been conducted on it. With the emergence and rapid development of neural networks, deep learning has demonstrated huge potential in applications such as point cloud classification, recognition, detection, and reconstruction. In recent years, many researchers have attempted to apply deep learning techniques to point cloud registration. Deep learning-based methods can automatically learn highly discriminative and robust point cloud features that contain geometric structural and semantic information, which are crucial for achieving high registration accuracy. In this study, research on deep learning-based pairwise rigid point cloud registration is systematically reviewed and analyzed. First, deep learning-based feature extraction networks are introduced. Second, progress on the three classes of registration methods, namely, correspondence estimation, pose regression, and scene flow estimation, is reviewed, and their characteristics, advantages, and disadvantages are summarized. Third, existing publicly available datasets for rigid point cloud registration are systematically summarized and categorized.
Last, the status of current research is assessed; the advantages and limitations of existing methods in terms of feature learning, registration accuracy, registration efficiency, and other aspects are explained; and future research directions are proposed. Specifically, three exploration directions are presented. (1) Existing deep learning-based registration methods require a large amount of annotated data, and annotation is costly in time and manpower. Few-shot, self-supervised, weakly supervised, and unsupervised learning can alleviate this need for labeled data to some extent. (2) The matching primitives in existing methods still focus on 3D points. Compared with geometric elements such as 3D line segments and planes, 3D points have high ambiguity and mismatch rates. In the future, deep learning techniques should be used to automatically extract the geometric structural elements of scenes from 3D point cloud data, and the registration of large scenes should be solved on the basis of these elements. (3) Existing methods mainly utilize the spatial geometric features of scenes, with minimal consideration of semantic information. Rapid advances in deep learning-based point cloud semantic understanding have been achieved in recent years. Therefore, spatial geometric structural information should be combined with semantic information to solve the registration of complex 3D point cloud scenes.
      Keywords: remote sensing; three-dimensional point cloud; rigid registration; deep learning; point cloud feature; correspondence estimation; pose regression; scene flow estimation; registration datasets
      Published: 2025-01-17
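Once correspondences are estimated, the rigid alignment itself is a closed-form least-squares problem. A minimal sketch of the classical Kabsch/SVD solution that correspondence-estimation pipelines typically apply after matching (a generic building block, not one of the reviewed networks):

```python
import numpy as np

def rigid_align(src, dst):
    """Closed-form least-squares rigid transform (Kabsch): finds R, t with
    dst ≈ src @ R.T + t, given matched point arrays of shape (N, 3)."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)             # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    S = np.eye(3)
    S[2, 2] = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ S @ U.T
    t = mu_d - R @ mu_s
    return R, t
```

In practice this step is wrapped in an outlier-robust loop (e.g. RANSAC or learned inlier weighting), since learned correspondences still contain mismatches.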

      Forestry and Agriculture

    • Against the backdrop of global warming, new progress has been made in predicting the carbon sink trends of Chinese plantations and informing their management. Using UAV LiDAR, the authors established a growth-monitoring framework for Chinese fir plantations based on individual-tree aboveground biomass estimation, providing a way to monitor forest growth quickly and accurately.
      XIONG Jingfeng, ZENG Hongda, XIE Jinsheng, LI Xiaojie, CHEN Jingming
      Vol. 28, Issue 12, Pages: 3094-3106(2024) DOI: 10.11834/jrs.20232498
      Preliminary study on dry and wet season changes in biomass on Chinese fir forest land on the basis of UAV LiDAR
      Abstract: With global warming, patterns of extreme precipitation have changed considerably. Determining how China's plantations, which are widely distributed and in a rapid growth stage, will respond to climate change, predicting their carbon sink function, and making the corresponding management decisions all urgently require methods for monitoring growth quickly and accurately. In this study, a 17-year-old middle-aged Chinese fir plantation was surveyed three times at six-month intervals, from February 2019 to February 2020, using UAV LiDAR. Three methods for estimating individual tree Aboveground Biomass (AGB) from UAV LiDAR parameters were compared against ground survey data: height-crown diameter regression, diameter at breast height-tree height regression, and diameter at breast height-tree height-crown diameter regression (D-HCD). Then, the seasonal growth changes in tree height, diameter at breast height, and AGB were estimated at a six-month interval. Results showed that D-HCD was the optimal method for estimating individual AGB of Chinese fir (R2 = 0.77, root mean square error [RMSE] = 15.99 kg). In the D-HCD method, the tree height and crown diameter extracted by UAV LiDAR are used to estimate the diameter at breast height (DBH), and AGB is calculated by substituting DBH and tree height into an allometric equation. The annual and even seasonal growth changes in the fast-growing Chinese fir plantation could be accurately monitored by UAV LiDAR. The average total accuracy of individual tree identification in 16 plots reached 0.927, the RMSE of tree height estimation was only 0.13 m, the R2 between estimated and measured annual individual AGB changes (ΔAGB) was 0.64, RMSE was 1.87 kg, and the relative error (rRMSE) was 29.74%. When individual trees were upscaled to plots, the relative error of ΔAGB estimation was reduced, with rRMSE decreasing to 17.10%.
During the study period, the average daily temperature in spring and summer was 7 °C higher than that in autumn and winter, and rainfall was more than three times that in autumn and winter. The seasonal distribution of rainfall in this year was severely uneven, and the growth of Chinese fir showed obvious differences between dry and wet seasons. The average increments in individual tree height in the wet and dry seasons were 0.50 and 0.13 m, respectively, and the biomass increments were 5.12 and 1.37 kg, respectively. Individual ΔAGB increased with DBH class during annual and seasonal growth; the larger the individual, the greater its growth advantage. In particular, dominant trees grew much better in the dry season than other diameter classes did, indicating strong drought tolerance, whereas the growth of individuals whose DBH was far below the average almost stopped.
      Keywords: remote sensing; Chinese fir; aboveground biomass; UAV LiDAR; multitemporal; dry and wet seasons
      Published: 2025-01-17
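The D-HCD chain described in the abstract (LiDAR height and crown diameter → DBH → allometric AGB) can be sketched as below. The functional forms and all coefficients here are hypothetical placeholders for illustration; the study fits its own DBH regression and a species-specific allometric equation for Chinese fir:

```python
import numpy as np

# All coefficients below are illustrative assumptions, NOT the paper's values.
def estimate_dbh_cm(height_m, crown_d_m, a=0.9, b=1.0, c=0.8):
    """Power-law DBH regression (functional form assumed): DBH = a * H^b * CD^c."""
    return a * height_m**b * crown_d_m**c

def agb_kg(dbh_cm, height_m, alpha=0.05, beta=0.9):
    """Generic allometric equation AGB = alpha * (D^2 * H)^beta (coefficients hypothetical)."""
    return alpha * (dbh_cm**2 * height_m)**beta

def agb_from_lidar(height_m, crown_d_m):
    """D-HCD pipeline: LiDAR-derived height and crown diameter -> DBH -> AGB."""
    return agb_kg(estimate_dbh_cm(height_m, crown_d_m), height_m)
```

The point of the chain is that DBH, which LiDAR cannot observe directly, is predicted from the two crown attributes LiDAR measures well, and only then fed to the allometry.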
    • In the field of forest fuel parameter estimation, the authors used UAV data to build CBD and CBH estimation models, providing fine-grained inputs for forest fire behavior prediction.
      SUN Hao, GUO Xiaoyi, ZHANG Hongyan, ZHAO Jianjun
      Vol. 28, Issue 12, Pages: 3107-3122(2024) DOI: 10.11834/jrs.20233094
      Estimating canopy bulk density and canopy base height using UAV LiDAR and multispectral images
      Abstract: Wildfire behavior modeling programs require spatial layers of Canopy Bulk Density (CBD) and Canopy Base Height (CBH) to predict fire spread. However, only a few studies in China have investigated these two canopy fuel metrics. Traditional field-based estimates, which assume averages across spatial extents, may yield inaccurate spatial estimates. Recently, Unmanned Aerial Vehicles (UAVs) have emerged as valuable tools that provide LiDAR point clouds and multispectral images for estimating CBD and CBH at fine resolution. The main objective of this study is to develop an area-based approach for estimating CBD and CBH and to evaluate the accuracy of various UAV datasets at 10 m resolution at the local scale in China. The case study area is in Jiaohe City, Jilin Province, which is predominantly covered by coniferous forests in low mountains and hills. Field data (species, crown base height, total tree height, and diameter at breast height) are obtained from 106 circular plots and serve as the modeling and validation datasets. The Fire and Fuels Extension of the Forest Vegetation Simulator is used to calculate CBD and CBH for each plot. Best subset regression and random forest models are employed to establish relationships between the 106 field data points and predictive variables derived from UAV LiDAR and multispectral imagery. Given the nonlinearity of the data, the Box-Cox procedure is applied and shows that a 0.5 power transformation is appropriate for best subset regression. The R2 value of CBD is always lower than that of CBH when the same models and input dataset are used. The fusion of LiDAR with multispectral imagery produces the most accurate estimate of CBD when random forest is employed (R2 = 0.5142, root mean squared error [RMSE] = 0.0773 kg/m3, relative RMSE [rRMSE] = 40.73%). LiDAR alone achieves the most accurate estimate of CBH (R2 = 0.6477, RMSE = 1.6245 m, rRMSE = 31.17%).
For both model types, using LiDAR point clouds alone yields higher accuracy in estimating CBD and CBH than using multispectral imagery alone. For multispectral imagery alone, the best subset regression models have higher R2 values than the random forest models. The CBD and CBH values estimated using multispectral imagery are higher than those estimated using LiDAR at the margins of the study area because of cropland. Across the various models, fusing LiDAR with multispectral imagery does not necessarily improve estimation accuracy compared with using either source alone. Therefore, we recommend mapping CBD in the study area with the random forest model that fuses LiDAR and multispectral imagery and mapping CBH with LiDAR alone, because these choices give the lowest RMSE. The best subset regression models involve 3 to 6 variables, and the random forest models have 10 to 52 predictive variables. Among the original LiDAR predictor variables, height features are the most important, and structure features also have considerable importance. The selected multispectral imagery features of both models exhibit diversity across the various canopy fuel metrics. This study provides clear evidence that UAV LiDAR and multispectral imagery can be used to derive fine-resolution CBD and CBH, which are crucial for fire behavior modeling at the landscape scale and for forest management activities and decision-making.
      Keywords: remote sensing; LiDAR; multispectral images; canopy bulk density; canopy base height; best subset regression; random forest
      Published: 2025-01-17
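The Box-Cox step reported for best subset regression can be illustrated as follows. Lambda = 0.5 is the power the study found appropriate; the back-transform returns model predictions to the original CBD/CBH scale (a generic sketch of the transform, not the paper's fitting code):

```python
import numpy as np

def boxcox(y, lam=0.5):
    """Box-Cox power transform, applied to the response before regression;
    the paper found lambda = 0.5 appropriate for CBD and CBH."""
    y = np.asarray(y, dtype=float)
    if lam == 0:
        return np.log(y)
    return (y**lam - 1.0) / lam

def inv_boxcox(z, lam=0.5):
    """Back-transform predictions to the original measurement scale."""
    z = np.asarray(z, dtype=float)
    if lam == 0:
        return np.exp(z)
    return (lam * z + 1.0)**(1.0 / lam)
```

The regression is then fit on `boxcox(y)` against the LiDAR/multispectral predictors, and RMSE is reported after `inv_boxcox` of the predictions.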
    • Recent work shows that a CBA-Wheat model optimized with a genetic algorithm achieves high retrieval accuracy for winter wheat biomass estimation and has strong application potential.
      WANG Shijun, LIU Miao, ZHAO Yu, LIU Zhaoyu, LIU Xiuyu, FENG Haikuan, SUI Xueyan, LI Zhenhai
      Vol. 28, Issue 12, Pages: 3123-3135(2024) DOI: 10.11834/jrs.20243080
      Estimating winter wheat biomass by coupling the CBA-Wheat model and multispectral remote sensing
      Abstract: Biomass is an important indicator of crop growth status. Timely and accurate estimation of the aboveground biomass (AGB) of winter wheat is crucial for yield prediction and field management decision-making. The crop biomass model CBA-Wheat, built from a remote sensing vegetation index (VI) and the Zadoks growth stage (ZS), is suitable for estimating winter wheat biomass over the whole growth period. The first layer of this model is a linear relation between AGB and VI: linear regression models between AGB and VI are constructed for different growth periods, and the model coefficients of each period evolve regularly with ZS. However, the model parameters in previous work were based on ground hyperspectral data, and applying the model to satellite data requires extensive ground measurements to recalibrate the parameters, thus limiting popularization at the regional scale. In this study, a Genetic Algorithm (GA) is used to optimize the parameters of the CBA-Wheat model globally. GA combines survival-of-the-fittest selection from biological evolution with the random information exchange of chromosomes within a population and provides efficient global optimization for nonlinear, multimodal, multiobjective function optimization problems. The two input variables of CBA-Wheat are field-recorded ZS data and VI from high-resolution remote sensing images. Four VI-based CBA-Wheat models, based on the enhanced vegetation index 2 (CBA-WheatEVI2), difference vegetation index (CBA-WheatDVI), ratio vegetation index (CBA-WheatRVI), and modified simple ratio vegetation index (CBA-WheatMSR), are constructed, and the best model is used for AGB mapping. Meanwhile, partial least squares regression (PLSR) is adopted for comparison with the CBA-Wheat models. Results show that all four CBA-Wheat models have good accuracy, and the simulated winter wheat biomass is consistent with the measured biomass.
Among the models, CBA-WheatEVI2 performs best, with a coefficient of determination (R2) of 0.92 and a root mean square error (RMSE) of 1.37 t/ha. The accuracy of biomass estimation based on the CBA-Wheat model is better than that of the machine learning baseline based on PLSR (R2 = 0.85, RMSE = 1.87 t/ha). The CBA-Wheat model performs well at the various growth stages of winter wheat, including high-biomass situations, without considerable underestimation. The GA-optimized CBA-Wheat model has high inversion accuracy, is suitable for the inversion of winter wheat biomass at multiple growth stages, and has good application potential for predicting large-area biomass from satellite remote sensing data.
      Keywords: winter wheat; biomass; genetic algorithm; CBA-Wheat model; multi-source data; EVI2; Sentinel-2; remote sensing
      Published: 2025-01-17
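The global parameter search the study performs with a Genetic Algorithm can be sketched generically as below. Population size, operators, and the toy linear fitting task are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

def ga_minimize(loss, bounds, pop=40, gens=80, elite=4, mut_sigma=0.1, seed=0):
    """Minimal real-coded genetic algorithm: elitism, arithmetic crossover of
    parents drawn from the fitter half, and small Gaussian mutation."""
    rng = np.random.default_rng(seed)
    lo = np.array([b[0] for b in bounds], float)
    hi = np.array([b[1] for b in bounds], float)
    dim = len(bounds)
    P = rng.uniform(lo, hi, size=(pop, dim))
    for _ in range(gens):
        fitness = np.array([loss(p) for p in P])
        P = P[np.argsort(fitness)]                    # best individuals first
        children = []
        while len(children) < pop - elite:
            i, j = rng.integers(0, pop // 2, size=2)  # parents from the top half
            w = rng.uniform(0.0, 1.0, size=dim)
            child = w * P[i] + (1.0 - w) * P[j]       # arithmetic crossover
            child += rng.normal(0.0, mut_sigma, size=dim)   # mutation
            children.append(np.clip(child, lo, hi))
        P = np.vstack([P[:elite], children])
    fitness = np.array([loss(p) for p in P])
    return P[np.argmin(fitness)]
```

In the paper's setting, `loss` would be the misfit between measured AGB and the CBA-Wheat prediction over all plots, and the decision vector would hold the model's stage-dependent coefficients.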
    • In the field of remote sensing recognition, the authors propose the JAM-R-CNN deep network model, which effectively improves terraced-field recognition accuracy and has practical application value.
      XIE Junyang, LIN Anqi, WU Hao, WU Ziwei, WU Wenbin, YU Qiangyi
      Vol. 28, Issue 12, Pages: 3136-3146(2024) DOI: 10.11834/jrs.20233126
      JAM-R-CNN deep learning network model for remote sensing recognition of terraced fields
      Abstract: Efficiently and accurately determining the spatial distribution of terraced fields provides important data support for soil and water conservation and improves agricultural management in mountainous areas. When deep learning methods are used for terrace recognition, narrow and elongated terraces are prone to be missed because of convolution operations, and in complex mountainous backgrounds with unclear terrain boundaries, large areas of adhered (merged) recognition results are easily produced, leading to low final accuracy. The urgent technical problems to solve for accurate terrace recognition are how to effectively maintain the high semantic information of high-resolution remote sensing images through the convolution operations, given the characteristics of terraces, and how to reduce the omission of narrow, elongated terraces and the adhesion of recognition results. To address these problems, this study proposes JAM-R-CNN, a deep learning network for terrace recognition from very high-resolution remote sensing images. The network is based on the Mask Region-based Convolutional Neural Network (Mask R-CNN) model: it integrates a jumping (skip) network to maintain the high semantic information of high-resolution remote sensing images, employs the convolutional block attention module to enhance the feature expression of terraces, and modifies the anchor sizes to match the narrow, elongated shape of terraces and improve recognition accuracy. A part of the salt well terraces in Nanchuan District, Chongqing, China, is selected as the study area, and four models are tested on domestic GF-2 satellite imagery.
Results show that the terrace parcel map derived from the JAM-R-CNN model has a precision of 90.81%, recall of 84.28%, F1 score of 88.98%, and Intersection over Union (IoU) of 83.15%. Compared with Mask R-CNN, JAM-R-CNN improves precision, recall, F1 score, and IoU by 1.96%, 5.26%, 3.29%, and 5.19%, respectively, indicating that JAM-R-CNN identifies terraces better than Mask R-CNN does. Most of the terraces identified by Unet and DeepLab v3+ are connected together, and small terraced fields are not distinguished. JAM-R-CNN also misses fewer areas on the periphery of terraces than Mask R-CNN does, and the number of missed narrow, elongated terraces is considerably reduced. This result reflects the combined effect of the three improvements and further shows that JAM-R-CNN delivers superior performance in remote sensing recognition of terraces. The proposed model effectively reduces the adhesion of terrace recognition results, considerably improves the extraction of narrow, elongated terraces, and thus substantially improves the overall accuracy of terrace remote sensing recognition. The model has good application value.
      Keywords: remote sensing; terrace recognition; high-resolution remote sensing image; deep learning; jump network; JAM-R-CNN
      Published: 2025-01-17
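The four scores reported above (precision, recall, F1, IoU) are standard pixelwise metrics for binary masks; a minimal sketch:

```python
import numpy as np

def seg_metrics(pred, gt):
    """Pixelwise precision, recall, F1, and IoU for binary masks, the four
    scores used to compare the terrace extraction results."""
    pred, gt = np.asarray(pred, bool), np.asarray(gt, bool)
    tp = np.sum(pred & gt)       # terrace pixels correctly predicted
    fp = np.sum(pred & ~gt)      # background predicted as terrace
    fn = np.sum(~pred & gt)      # terrace pixels missed
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    iou = tp / (tp + fp + fn)
    return precision, recall, f1, iou
```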
    • A recent study uses a multiscale dense dilated convolutional neural network to identify erosion gullies in the black soil region of Northeast China, providing precise data for integrated land management.
      FENG Quanlong, JIANG Zihang, NIU Bowen, GAO Bingbo, YANG Jianyu, YANG Ke
      Vol. 28, Issue 12, Pages: 3147-3157(2024) DOI: 10.11834/jrs.20243139
      Multiscale feature extraction model for remote sensing identification of erosion gullies in Northeast China’s black soil region: A case study of Hailun City
      Abstract: Black soil is a valuable and essential soil resource, particularly in Northeast China, the country's primary grain-producing area. However, local agriculture is considerably affected by soil erosion, of which erosion gullies are a prominent manifestation. Erosion gullies often interconnect within a hydrological network, creating tree-like gully systems that inflict severe damage on cultivated land. Therefore, accurate identification and detection of erosion gullies are pivotal for safeguarding arable land. This study explores the feasibility of utilizing remote sensing imagery, with its vast coverage and frequent acquisitions, for erosion gully detection and identification. We introduce a novel deep learning model based on a multiscale dense dilated convolutional neural network tailored for erosion gully recognition. The model densely connects multiscale dilated convolutional residual modules and is optimized to aggregate the multilevel spatial features of erosion gullies. The research is conducted in Hailun City, Heilongjiang Province. Our approach crops remote sensing images into predefined patches, which are annotated to construct training datasets with two categories: erosion gullies and non-gullies. The model is then trained on the training dataset and evaluated on the test dataset, with weights selected on the basis of the highest test accuracy. Using the selected weights, we perform sliding window identification across the entire Hailun City area, generating spatial distribution data for erosion gullies.
Furthermore, we localize erosion gully areas on the basis of scene-level labels and class activation maps to offer guidance for boundary extraction. The findings demonstrate the efficacy of the proposed model, which achieves an overall accuracy of 95.80% and a kappa coefficient of 0.9152, outperforming traditional deep learning models such as GoogLeNet, ResNet, DenseNet, and Swin Transformer. Notably, the overall accuracy in the sliding window recognition phase decreases slightly compared with that in the test phase because of the increased complexity of remote sensing imagery in practical applications. To address this challenge, we recommend fusing satellite and street-view imagery in future research to enhance recognition in complex scenarios. This study underscores the effectiveness of erosion gully identification with a multiscale dense dilated convolutional neural network. It provides precise spatial distribution data on erosion gullies, thereby contributing to integrated land management in the black soil region of Northeast China.
      Keywords: erosion gullies; black soil region of Northeast China; deep learning; scene recognition; feature extraction; dilated convolutional neural network; remote sensing monitoring; cropland protection
      Published: 2025-01-17
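The city-wide sliding window identification step can be sketched generically as follows; patch size, stride, and the toy classifier are illustrative assumptions, not the study's settings:

```python
import numpy as np

def sliding_window_classify(image, patch, stride, classify):
    """Apply a patch-level scene classifier over a regular grid, as in the
    city-wide gully mapping step; returns one label per window."""
    h, w = image.shape[:2]
    rows = (h - patch) // stride + 1
    cols = (w - patch) // stride + 1
    labels = np.zeros((rows, cols), dtype=int)
    for i in range(rows):
        for j in range(cols):
            y, x = i * stride, j * stride
            labels[i, j] = classify(image[y:y + patch, x:x + patch])
    return labels
```

In the paper, `classify` is the trained CNN and the resulting label grid becomes the spatial distribution map of erosion gullies; class activation maps are then computed on gully-labeled patches to localize the gully within each patch.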

      Models and Methods

    • A recent study proposes a remote sensing image dehazing algorithm based on adaptive SLIC, which effectively improves image clarity and recovers detail.
      YU Hang, LI Chenyang, LIU Zhiheng, ZHOU Suiping, GUO Yuru
      Vol. 28, Issue 12, Pages: 3158-3172(2024) DOI: 10.11834/jrs.20242532
      Remote sensing image dehazing algorithm based on adaptive SLIC
      Abstract: Objective: Haze degrades the clarity of remote sensing images, which makes target detection, feature segmentation, and information interpretation difficult. Deep learning-based dehazing is time consuming because of the large number of model parameters and its dependence on large amounts of remote sensing image data. Image enhancement-based dehazing does not fully consider the degradation mechanism of remote sensing images in hazy conditions; as a result, it does not generalize across scenes and easily loses image information, leading to distortion. Physical model-based dehazing requires manual parameter setting during transmittance refinement; at the same time, because image contrast is not fully enhanced, the overall color of dehazed images is dark, and fog remains in local areas. Method: To solve these problems and improve dehazing quality, this study proposes a remote sensing image dehazing method that combines image enhancement and physical modeling: an adaptive Simple Linear Iterative Clustering (SLIC)-based dehazing algorithm. First, to handle locally highlighted areas in hazy remote sensing images and the resulting bias in estimating atmospheric light, an improved Retinex algorithm is used to contrast-enhance the input images, preserving detail, reducing artifacts, extending the dynamic range of image contrast, and accurately estimating the atmospheric light of remote sensing images.
Second, an adaptive SLIC algorithm is proposed to overcome the difficulty of setting the number of superpixels; superpixel segmentation of the input image avoids the influence of locally high-contrast regions on a fixed window and yields an accurate transmittance estimate. Last, a haze-free remote sensing image is recovered on the basis of the dark channel prior and the atmospheric scattering model. The proposed method achieves adaptive dehazing of remote sensing images without manual parameter setting. Results: The proposed algorithm is compared with the algorithms of He et al., Zhu et al., Han et al., and Nie et al. on the publicly available Inria Aerial Image Dataset and RICE Image Dataset. Subjectively, the remote sensing images processed by the proposed algorithm have more realistic color, more complete dehazing, clearer features, and better retention of detail than those processed by the other algorithms. Objectively, the mean image information entropy, peak signal-to-noise ratio, and structural similarity of the proposed algorithm are 7.56, 22.05, and 0.87, respectively, all higher than the values of the four other algorithms. Conclusion: The proposed dehazing model integrates the advantages of image enhancement and image recovery, making dehazed remote sensing images natural and realistic while effectively recovering detail.
      Keywords: remote sensing image dehazing; adaptive SLIC; dark channel prior; Retinex; superpixel segmentation
      Published: 2025-01-17
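The recovery step based on the dark channel prior and the atmospheric scattering model can be sketched as below. This is a generic illustration with a plain minimum filter; the paper's Retinex enhancement and adaptive SLIC transmittance refinement are not reproduced:

```python
import numpy as np

def dark_channel(img, win=3):
    """Per-pixel RGB minimum followed by a local minimum filter."""
    d = img.min(axis=2)
    pad = win // 2
    dp = np.pad(d, pad, mode='edge')
    out = np.empty_like(d)
    for i in range(d.shape[0]):
        for j in range(d.shape[1]):
            out[i, j] = dp[i:i + win, j:j + win].min()
    return out

def transmission(img, A, omega=0.95, win=3):
    """Dark-channel-prior transmission estimate: t = 1 - omega * dark(I / A)."""
    return 1.0 - omega * dark_channel(img / A, win)

def recover(img, A, t, t0=0.1):
    """Invert the atmospheric scattering model I = J*t + A*(1 - t):
    J = (I - A) / max(t, t0) + A, with t0 guarding against division blowup."""
    t = np.maximum(t, t0)[..., None]
    return (img - A) / t + A
```

Here `img` is a float RGB array in [0, 1], `A` the estimated atmospheric light per channel, and `t` the per-pixel transmittance map that the adaptive SLIC stage refines.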
    • In the field of building extraction from remote sensing images, the authors propose an end-to-end extraction method based on the Transformer architecture, which effectively improves building detection rates and extraction accuracy and supports applications such as urban planning.
      LIU Yi, ZHANG Yinjie, AO Yang, JIANG Dalong, ZHANG Zhaorui
      Vol. 28, Issue 12, Pages: 3173-3183(2024) DOI: 10.11834/jrs.20233017
      Building extraction from remote sensing images via the multiscale information fusion method under the Transformer architecture
Abstract: As deep learning develops, researchers are paying increasing attention to its application in building extraction from remote sensing images. Many experiments have been conducted on multiscale feature fusion, which boosts performance during the feature inference stage, and on multiscale output fusion, seeking a trade-off between accuracy and efficiency as well as enhanced details and overall effects. However, current multiscale feature fusion methods consider only the nearest feature, which is insufficient for cross-scale feature fusion. Multiscale output fusion is likewise limited to a unary correlation that considers only the scale element. To address these problems, we propose a feature fusion method and a result fusion module to improve the accuracy of building extraction from remote sensing images. This study proposes the Triple-Feature Pyramid Network (Tri-FPN) and the Class-Scale Attention Module (CSA-Module), built on Segformer, to extract buildings from remote sensing images. The whole network is divided into three components: feature extraction, feature fusion, and the classification head. In the feature extraction component, the Segformer structure is adopted to extract multiscale features. Segformer uses self-attention to extract feature maps of different scales. To adaptively enlarge the receptive fields, Segformer uses a strided convolution kernel to shrink the key and value vectors in the self-attention computation, which considerably reduces the computational cost. The goal of the feature fusion component is to fuse multiscale features from different parts of the feature extraction network. Tri-FPN consists of three feature pyramid networks. The fusion follows the sequence top-down, bottom-up, and top-down, thus enlarging the scale-receptive field. The basic fusion blocks are a 3×3 convolution with element-wise feature addition and a 1×1 convolution with channel concatenation. This design helps maintain spatial diversity and inner-class feature consistency. In the classification head component, each pixel is assigned a predicted label. First, the feature map goes through a 1×1 convolution to obtain a coarse result. Second, the feature map is shrunk in the channel dimension via 1×1 convolution. Third, the shrunk feature map is concatenated with the coarse result and up-sampled two times. Fourth, the mixed feature is segmented by a 5×5 convolution. Meanwhile, a height×width×class attention map, which considers class information, scale diversity, and spatial details, is computed by a 3×3 convolution block on the mixed feature. Last, the coarse and mixed-feature results are fused under the attention map. A series of experiments is conducted on the WHU Building and INRIA datasets. On the WHU Building dataset, the precision reaches 95.42%, the recall is 96.25%, and the Intersection Over Union (IOU) is 91.53%. On the INRIA dataset, the precision, recall, and IOU reach 89.33%, 91.10%, and 81.7%, respectively. The gains in recall and IOU exceed 1% relative to the backbone. These results prove that the proposed method has strong feature fusion and segmentation abilities. Tri-FPN effectively improves building extraction accuracy and overall efficiency, especially on the boundaries and holes in building areas, thus verifying the validity of multiscale feature fusion. By considering class, scale, and spatial attention, the CSA-Module considerably improves accuracy with a negligible number of additional parameters. By adopting Tri-FPN and the CSA-Module, the structure demonstrates an improved ability to predict small buildings and details in remote sensing images.
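The two basic fusion blocks named in the abstract, element-wise feature addition and channel concatenation followed by a 1×1 convolution, can be sketched in NumPy as follows. The learned 3×3 convolutions are omitted, and the weight shapes are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

def fuse_add(a, b):
    """Element-wise addition fusion: both feature maps share shape (C, H, W)."""
    return a + b

def fuse_concat(a, b, w):
    """Channel concatenation followed by a 1x1 convolution.
    a, b: feature maps of shape (C, H, W); w: (C_out, 2C) 1x1 kernel weights."""
    x = np.concatenate([a, b], axis=0)          # (2C, H, W)
    return np.tensordot(w, x, axes=([1], [0]))  # (C_out, H, W)
```

A 1×1 convolution is exactly a per-pixel linear map over channels, which is why a single `tensordot` over the channel axis suffices here.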
Keywords: remote sensing images; building extraction; deep learning; Transformer; image feature pyramid; class-scale attention
Published online: 2025-01-17
• A recent study uses the FVC-Net deep learning method to downscale 30 m Landsat FVC data to 10 m, providing fine-resolution data support for global ecosystem research.
      ZHANG Zhihao, WANG Qunming, DING Xinyu
      Vol. 28, Issue 12, Pages: 3184-3196(2024) DOI: 10.11834/jrs.20243112
      FVC-Net: A fusion network for producing fine spatial resolution fractional vegetation cover
Abstract: Fractional Vegetation Cover (FVC) is defined as the percentage of the vertically projected area of vegetation relative to the total projected surface area. It is an important indicator of the spatial distribution of vegetation on the land surface and plays an essential role in quantifying the carbon sequestration capacity of terrestrial ecosystems. Remote sensing satellites (such as Landsat and Sentinel-2) can acquire fine spatial resolution FVC data at the 10 m level, which are crucial sources for studies of the global ecosystem. However, a large amount of fine spatial resolution FVC data are unavailable in the temporal domain because of the relatively coarse temporal resolution of satellites, coupled with cloud contamination. This study combines 30 m Landsat 8 and 10 m Sentinel-2 data to increase the temporal frequency of 10 m FVC data and obtain vegetation information in a timely manner. A deep learning-based method called FVC-Net is proposed to address the difference in spatial resolution. FVC-Net fuses 30 m Landsat FVC with the 10 m Sentinel-2 Normalized Difference Vegetation Index (NDVI) directly to produce 10 m Landsat FVC. Specifically, a two-branch network based on a multiscale attention mechanism is designed. In this network, channel enhancement blocks are used in the FVC and NDVI branches for feature extraction and fusion. Spatial attention blocks are then employed to increase the spatial details of the fused FVC features. This scheme helps FVC-Net characterize the nonlinear relationship between 10 m NDVI and 30 m FVC. In the experiments, FVC-Net was validated on three regions selected from the urban area of Shanghai, China. FVC-Net was compared with four typical non-deep learning-based fusion methods (HPF, Indusion, SFIM, and ATPRK) and four deep learning-based ones (PanNet, PNN, HPGAN, and NDVI-Net). Both visual and quantitative evaluations reveal that (1) among the non-deep learning-based methods, ATPRK is more accurate than the other three; (2) the results of the deep learning-based methods are closer to the reference FVC data; and (3) the proposed FVC-Net outperforms the eight benchmark methods in fusion accuracy, with the smallest errors. Finally, experiments on the fusion of real 30 m Landsat FVC and 10 m Sentinel-2 NDVI show that the 10 m FVC produced by FVC-Net presents more detailed spatial texture than the original 30 m Landsat FVC input. FVC-Net is thus an effective solution for downscaling 30 m Landsat FVC to 10 m by fusion with 10 m Sentinel-2 NDVI, effectively overcoming the differences between Sentinel-2 and Landsat data acquired at different time points. The 10 m FVC data generated by FVC-Net not only enhance the ability of Landsat data to express spatial details but also serve as a supplementary resource that increases the temporal resolution of existing 10 m Sentinel-2 observations. In the future, FVC-Net can potentially be applied to downscale existing 30 m Landsat FVC products at a larger (e.g., global) scale, and the predictions may support studies in various related fields.
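The abstract does not spell out how FVC is derived from NDVI; a common choice outside this paper is the dimidiate pixel model, sketched below. The soil and vegetation NDVI endmembers are illustrative assumptions:

```python
import numpy as np

def fvc_from_ndvi(ndvi, ndvi_soil=0.05, ndvi_veg=0.86):
    """Dimidiate pixel model: each pixel is a linear mixture of bare soil
    (FVC = 0) and full vegetation (FVC = 1); clip to the valid range."""
    fvc = (ndvi - ndvi_soil) / (ndvi_veg - ndvi_soil)
    return np.clip(fvc, 0.0, 1.0)
```

In practice the endmembers are estimated per scene (e.g., from NDVI histogram percentiles), which is why they are parameters here rather than constants.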
Keywords: remote sensing; Fractional Vegetation Cover (FVC); Normalized Difference Vegetation Index (NDVI); deep learning; downscaling; data fusion
• In the field of remote sensing image processing, experts propose a change detection method for multimodal remote sensing images based on adaptive nonlocal pattern consistency, effectively improving the accuracy of environmental monitoring and disaster identification.
      HAN Te, TANG Yuqi, CHEN Yuzeng, ZHANG Fangyan, YANG Xin, ZOU Bin, FENG Huihui
      Vol. 28, Issue 12, Pages: 3197-3212(2024) DOI: 10.11834/jrs.20233072
      Change detection method based on adaptive nonlocal pattern consistency for multimodal remote sensing images
Abstract: Multimodal remote sensing image change detection is an active research area in remote sensing image processing and plays a crucial role in disaster monitoring, urban planning, natural resource monitoring, and other domains. To address the insufficient use of target spatial structure features and image change information in existing methods, this study exploits the observation that the spatial structure features of unchanged regions in multimodal images are consistent, whereas those of changed regions differ. Change information can therefore be extracted by measuring the difference in the spatial structure of multimodal images. This study proposes a change detection method based on Adaptive Nonlocal Pattern Consistency (ANLPC) for multimodal remote sensing images. Overlapping patches serve as the basic processing units of the images. The target patch is taken as the reference patch of the constructed pattern, and the other patches are viewed as candidate homogeneous patches. The nonlocal pattern of the image is constructed adaptively via automated homogeneous-patch selection, using the rank coordinate space of the target patch as the search space to exploit the spatial information of the image and narrow the search area. Cross mapping of the two temporal image patterns (forward and backward mapping) is achieved by adaptive nonlocal pattern mapping to precisely assess the variation between multimodal images. Taking forward mapping as an example, ANLPC maps the nonlocal pattern of the first temporal image into the second temporal image domain, and the difference information of the pattern in the second temporal image domain represents the change information of the multimodal image pair. Similarly, backward change information is acquired from backward mapping. The final difference map is produced by combining the forward and backward difference information via the curvelet transform, and the binary change detection results are obtained using threshold segmentation. Four multimodal remote sensing image datasets (two optical-SAR datasets and two optical-LiDAR datasets, where SAR denotes Synthetic Aperture Radar) and two single-modal remote sensing image datasets (one optical and one SAR) are used to verify the effectiveness of the proposed method. Averaged over the six datasets, the kappa coefficient of the proposed method is 17.28% higher than those of existing methods. To better exploit target spatial structure features and image change information, this study uses an adaptive nonlocal pattern to characterize the structural information of the image. The changed regions are measured in the same image domain by cross mapping the nonlocal pattern, thereby circumventing the imaging differences of multimodal images. Meanwhile, difference image fusion and threshold segmentation are employed to obtain a robust change map. The proposed method achieves higher accuracy than the compared methods on single-modal and multimodal datasets, demonstrating its effectiveness and robustness.
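A minimal sketch of the forward-mapping idea, under simplifying assumptions not taken from the paper (a single target patch, Euclidean patch distance, exhaustive search instead of the rank-coordinate search space, no curvelet fusion): the indices of the patches most similar to the target in the first image are reused in the second image, and the residual similarity there serves as the change score.

```python
import numpy as np

def patch_vectors(img, p):
    """Flatten every p x p patch of a 2-D image into a row vector."""
    h, w = img.shape
    vecs = [img[i:i + p, j:j + p].ravel()
            for i in range(h - p + 1) for j in range(w - p + 1)]
    return np.array(vecs)

def forward_mapping_change(img1, img2, target=0, k=3, p=3):
    """Build the nonlocal pattern of img1 (indices of the k patches nearest
    to the target patch) and map it into img2; the mean patch distance in
    img2 is the change score for the target location."""
    v1 = patch_vectors(img1, p)
    v2 = patch_vectors(img2, p)
    d1 = np.linalg.norm(v1 - v1[target], axis=1)
    idx = np.argsort(d1)[1:k + 1]  # k nearest patches, excluding the target itself
    return float(np.mean(np.linalg.norm(v2[idx] - v2[target], axis=1)))
```

A structurally unchanged region keeps the score low even when the two images come from different modalities (e.g., a global radiometric rescaling preserves which patches match), while a local change breaks the pattern and raises the score.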
Keywords: remote sensing; multimodal images; change detection; structural features; adaptive nonlocal patterns; pattern mapping; image fusion
• In the field of oriented object detection in remote sensing, researchers propose a detection model that uses only image-level annotations and improves detection performance through consistency constraints, offering a new solution to annotation difficulty and performance improvement in remote sensing object detection.
      FANG Tingting, LIU Bin, CHEN Chunhui, LI Xiangyun
      Vol. 28, Issue 12, Pages: 3213-3230(2024) DOI: 10.11834/jrs.20243138
      View-consistency network for weakly supervised oriented object detection in remote sensing images
Abstract: The objective of this study is to propose a novel oriented object detection model that can effectively detect objects in remote sensing images while alleviating the labor-intensive and time-consuming annotation process. Specifically, the aim is to develop an efficient and effective approach that requires only image-level annotations, which can improve the availability and diversity of oriented object detection datasets for remote sensing. The proposed model is designed to overcome the limitations of traditional annotation methods that rely on bounding box annotations, which can be subjective, inconsistent, and time-consuming to create. By leveraging image-level annotations, the proposed model greatly reduces the annotation effort, accelerates the annotation process, and enhances the scalability and applicability of remote sensing object detection in various scenarios and domains. The proposed method introduces a novel weakly supervised oriented object detection paradigm for remote sensing scenes. The model is trained in a progressive manner, starting from coarse image-level annotations and gradually refining the detection results. This approach allows the model to learn from limited annotations and adapt to the complexities of remote sensing data, which often exhibit large scales, diverse appearances, and significant rotation variations. The model incorporates consistency constraints across different rotation views on the image-level annotations, the oriented bounding box positions, and the cluster center distributions to enhance the accuracy and robustness of detection. The oriented bounding box position consistency ensures that the predicted rotation angles of the bounding boxes are consistent across different views of the same object, while the distribution consistency of clustering centers, measured with the Hungarian loss, ensures that the predicted object centroids are consistent across different views. These consistency constraints are designed to improve the model's accuracy in handling rotation variations, a common challenge in remote sensing object detection. The proposed model is evaluated on two mainstream remote sensing object detection datasets, DIOR and DOTA-v1.0. The experimental results demonstrate that the proposed model significantly outperforms state-of-the-art weakly supervised remote sensing object detectors under less strict weakly supervised settings, highlighting the effectiveness and potential of image-level annotation-based oriented object detection in addressing the challenges of remote sensing object detection. Furthermore, the model's ability to generalize well to different datasets showcases its robustness and versatility. In conclusion, the proposed image-level annotation-based oriented object detection model is an innovative approach that addresses the labor-intensive and time-consuming annotation process in remote sensing object detection. By leveraging image-level annotations and incorporating consistency constraints, the proposed model achieves improved detection performance while reducing the annotation effort and improving annotation efficiency. The experimental results on the DIOR and DOTA-v1.0 datasets demonstrate the superior performance of the proposed model compared with state-of-the-art models, highlighting its potential for practical applications in remote sensing object detection and related fields. Future research can further explore image-level annotation-based oriented object detection by incorporating more advanced techniques, exploring different data sources, and investigating real-world applications in remote sensing, geospatial analysis, and other related fields.
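The cluster-center distribution consistency can be illustrated as a minimum-cost one-to-one matching between the centers predicted in two views. This is a sketch, not the paper's implementation: a brute-force search over permutations stands in for the Hungarian algorithm (fine for a handful of centers), and any rotation alignment between views is assumed to have been applied already.

```python
import numpy as np
from itertools import permutations

def hungarian_center_loss(centers_a, centers_b):
    """Minimum total Euclidean distance over one-to-one matchings of two
    equally sized sets of 2-D cluster centers (shape (n, 2) each)."""
    cost = np.linalg.norm(centers_a[:, None, :] - centers_b[None, :, :], axis=-1)
    n = len(centers_a)
    # Brute force over all assignments; the Hungarian algorithm solves the
    # same problem in O(n^3) and would replace this loop at scale.
    return min(sum(cost[i, p[i]] for i in range(n))
               for p in permutations(range(n)))
```

Because the matching is order-invariant, the loss is zero whenever the two views predict the same set of centroids, regardless of how the detections are indexed.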
Keywords: remote sensing; oriented object detection; object detection; weakly supervised learning; deep learning; rotation consistency; image-level label; Hungarian loss; DOTA dataset
• A recent study addresses the difficulty of small-target detection in remote sensing images with a dual attention mechanism and a bidirectional feature pyramid algorithm, markedly improving detection accuracy and efficiency.
      LI Kewen, ZHU Guanglei, WANG Hui, ZHU Rui, DI Xiyao, ZHANG Tianjian, XUE Zhaohui
      Vol. 28, Issue 12, Pages: 3231-3248(2024) DOI: 10.11834/jrs.20243043
      Joint dual attention mechanism and bidirectional feature pyramid for remote sensing small targets detection
Abstract: Given imaging characteristics and limited spatial resolution, extracting features of small targets is difficult, which complicates small-target detection. Existing deep learning detection architectures are mostly designed for natural images and are insufficient for small targets in remote sensing images. To overcome these issues, this study proposes a small-target detection algorithm for remote sensing images that combines a dual attention mechanism and a bidirectional feature pyramid. This work makes two contributions. (1) To address the small pixel footprint of targets in remote sensing images and the typically huge parameter size of detection networks, we introduce the LKG bottleneck and the Generalized Intersection over Union (GIoU) loss function into YOLOv3 and propose the LKGNet-YOLO network. (2) To resolve noise disturbance and the drawbacks of feature fusion, we introduce the DA-LKGNet bottleneck and a bidirectional feature pyramid network into LKGNet-YOLO and propose the DA-LKGNet-YOLO network. The remote sensing image dataset (UA) released by the University of Chinese Academy of Sciences in 2014 and the AI-TOD dataset released by Wuhan University in 2021 are used in experiments to validate the effectiveness of the proposed method for small-target detection in remote sensing images. Experimental results demonstrate that the proposed method achieves a mean average precision (mAP) of 96.21% at a threshold of 0.5 on the UA dataset. On the AI-TOD dataset, the mAP at thresholds ranging from 0.5 to 0.95 (mAP0.5-0.95) is 9.51%. Compared with YOLOv3, RFBNet, SSD, FSSD, RetinaNet, and RefineDet, the proposed method achieves mAP gains of 3.45%-7.52% and 1.36%-4.84% on the two datasets, respectively, indicating a considerable performance improvement. Meanwhile, its detection of small-scale targets is better than that of the Faster-RCNN and YOLOv7 algorithms. Compared with the original YOLOv3, our method reduces the number of floating-point operations by 48% and the number of parameters by 42%. In summary, a small-target detection method for remote sensing images is proposed in this study. Its contribution lies in the use of the YOLOv3 model with LKGNet as the backbone network, along with the design of a dual attention mechanism and a bidirectional feature pyramid network, resulting in a lightweight DA-LKGNet-YOLO model. This model can effectively detect small objects in complex remote sensing images. The experimental results confirm the effectiveness of the proposed method.
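For axis-aligned boxes, the generalized intersection over union introduced in contribution (1) reduces to the computation below (the paper applies it as a loss, 1 − GIoU; how it handles the full detection pipeline is not reproduced here):

```python
def giou(box_a, box_b):
    """Generalized IoU for axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection and union as in plain IoU.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    # Smallest enclosing box: penalizes non-overlapping predictions by how
    # far apart they are, giving a useful gradient where IoU is flat at 0.
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c_area = cw * ch
    return inter / union - (c_area - union) / c_area
```

GIoU ranges over (−1, 1]: it equals IoU for perfectly nested boxes and goes negative for disjoint ones, which is precisely what makes it preferable to IoU as a regression loss for small, easily missed targets.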
Keywords: remote sensing image; small target detection; deep learning; YOLOv3; attention mechanism; feature pyramid
• In remote sensing image semantic segmentation, experts propose an edge-perception-enhanced method for high-resolution remote sensing images that effectively improves segmentation accuracy and continuity.
      YU Chunyan, LI Donglin, SONG Meiping, YU Haoyang, Chein-I Chang
      Vol. 28, Issue 12, Pages: 3249-3260(2024) DOI: 10.11834/jrs.20233098
      Edge-perception enhanced segmentation method for high-resolution remote sensing image
Abstract: Semantic segmentation methods for high-resolution remote sensing images based on Deep Convolutional Neural Networks (DCNNs) have achieved remarkable progress, but problems remain in the extraction and expression of the edge features of segmented objects. As a result, edge segmentation of occluded and small target objects is unsatisfactory, which affects the overall accuracy of semantic segmentation. To solve these problems, this study proposes an edge-perception-enhanced semantic segmentation method for high-resolution remote sensing images. First, we utilize a Transformer-DCNN collaborative feature extraction mechanism to extract the global self-attention features and spatial context information of remote sensing images. In this way, the proposed model makes full use of the respective advantages of the Transformer and the DCNN in extracting global context information and local spatial context information. A simple but effective feature fusion module is designed to fuse the features extracted by the DCNN and the Transformer, yielding a highly accurate semantic feature representation of ground objects. Second, we construct an edge-perception enhancement module composed of an edge-enhanced decoder and an uncertain-point-enhanced decoder. This module strengthens the edge information processing ability of the semantic segmentation model from two perspectives, namely, uncertain points and entity edges. Last, the semantic segmentation decoder effectively employs the feature codes containing edge information to improve the accuracy and completeness of segmented object edge prediction, which improves the overall semantic segmentation of remote sensing images. Comparative experiments are conducted on two public datasets, Potsdam and Vaihingen. In comparison with the classical Unet++ network, the proposed method improves the mean intersection over union by 3.44% and 4.01% on the two datasets, respectively. The average F1 score and overall accuracy also improve to varying degrees. Furthermore, compared with the Transformer-based TransUNet model, the proposed method achieves better results. Enhancing the extraction of edge information from remote sensing objects leads to remarkable improvements in the edge and overall semantic segmentation accuracies of high-resolution remote sensing images. The proposed edge-perception enhancement module improves the model's ability to process edge information from the perspectives of uncertain points and entity edges, effectively enhancing the edge segmentation accuracy for complex terrain objects. The results on commonly used evaluation indicators demonstrate the effectiveness and robustness of the developed model.
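The abstract does not detail how the uncertain-point-enhanced decoder selects its points; a common heuristic for binary segmentation, assumed here purely for illustration, picks the pixels whose foreground probability lies closest to 0.5:

```python
import numpy as np

def uncertain_points(prob, k):
    """Return the (row, col) coordinates of the k most uncertain pixels of a
    foreground-probability map, where uncertainty is |p - 0.5|."""
    uncertainty = np.abs(prob.ravel() - 0.5)
    idx = np.argsort(uncertainty)[:k]  # smallest margin = most uncertain
    return np.stack(np.unravel_index(idx, prob.shape), axis=1)
```

Such points tend to cluster along object boundaries, which is why refining only them is a cheap way to sharpen edges without re-predicting the whole map.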
Keywords: remote sensing image; semantic segmentation; edge perception; feature extraction; encoder; decoder
• In meteorology, researchers optimize the neural network observation operator of Scheck (2021) to build NNFO, a fast observation operator for assimilating Fengyun-4A visible reflectance, achieving a marked gain in computational efficiency and showing potential for assimilation applications.
      HE Mingfeng, ZHOU Yongbo
      Vol. 28, Issue 12, Pages: 3261-3270(2024) DOI: 10.11834/jrs.20243348
      A neural network-based forward operator for assimilating the FY-4A visible reflectance
Abstract: Satellite all-sky visible reflectance contains critical information on clouds and precipitation. Data Assimilation (DA) of this information has great potential to improve the forecasting skill of numerical weather prediction models. Conventional forward operators for DA of visible reflectance employ numerical methods to simulate radiative transfer processes and suffer from a high computational burden; therefore, they cannot meet the needs of operational DA. This study constructs a fast, accurate forward operator for DA of visible reflectance data provided by the Advanced Geostationary Radiation Imager (AGRI) onboard the Fengyun-4A satellite. The forward operator is comparable with conventional forward operators in accuracy and outperforms them in computational efficiency. A feed-forward neural network is utilized to construct the forward operator. The input parameters of the Neural Network-based Forward Operator (NNFO) include the cloud water path converted into logarithmic space, the ratio of the ice cloud water path to the total cloud water path, the effective radius of cloud liquid droplets, the underlying surface albedo, the solar zenith angle, the satellite zenith angle, and the relative azimuth angle between the sun and the satellite. The top-of-atmosphere reflectance is the output of NNFO. A series of sensitivity studies is performed to determine the optimal (or suboptimal) neural network settings, which comprise 5 hidden layers, 57 nodes in each hidden layer, the Swish activation function for the hidden layers, and a batch size of 512. In addition, the neural network is trained with an adaptive learning rate that depends on the training epoch and on the loss for the validation dataset, defined by the Root Mean Square Error (RMSE). NNFO is compared with RTTOV-DOM, a typical forward operator based on the discrete ordinate method for simulating radiative transfer processes. Results indicate that NNFO is 15 times faster than RTTOV-DOM in serial mode and 6 times faster in parallel mode. The mean difference, RMSE, and mean absolute error of the difference in reflectance simulated by RTTOV-DOM and NNFO (RTTOV-DOM minus NNFO) are 0.001, 0.048, and 0.029, respectively, implying that the simulation accuracies of the two forward operators are comparable. In addition, NNFO is validated using one week of FY-4A/AGRI reflectance observations. The results reveal that the probability density function of the simulation errors conforms to a Gaussian function, with a mean bias of -0.016 and a standard deviation of 0.052. NNFO is thus comparable with traditional forward operators in accuracy, with a distinct advantage in computational efficiency. However, the current version of NNFO supports only ensemble Kalman filter methods (including their variants); for four-dimensional variational methods, the adjoint of NNFO must also be developed. Moreover, NNFO could be further improved by including aerosol effects. The improvement of NNFO in these aspects and the extension of NNFO to DA applications are ongoing.
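The forward pass of an NNFO-like network (Swish hidden layers, linear reflectance output) can be sketched as below. The toy layer sizes in the usage are illustrative, not the reported 5-layer, 57-node configuration:

```python
import numpy as np

def swish(x):
    """Swish activation: x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))

def nnfo_forward(x, weights, biases):
    """Feed-forward pass: every hidden layer applies Swish; the output layer
    is linear, producing top-of-atmosphere reflectance.
    x: (batch, n_in); weights/biases: per-layer parameter lists."""
    for w, b in zip(weights[:-1], biases[:-1]):
        x = swish(x @ w + b)
    return x @ weights[-1] + biases[-1]
```

The speed advantage over a discrete-ordinate solver comes directly from this structure: the whole operator is a handful of matrix multiplications, which also batch and parallelize trivially across observations.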
Keywords: Fengyun-4A satellite; Advanced Geostationary Radiation Imager (AGRI); visible reflectance; RTTOV; data assimilation; forward operator; neural network; adaptive learning rate
• For ground crack identification in mining areas, experts design Goaf-DTNet, a dual-task convolutional neural network that improves crack detection accuracy through complementary information between tasks, providing effective data for mine monitoring.
      CHEN Ximing, YAO Xin, REN Kaiyu, YAO Chuangchuang, ZHOU Zhenkai, YANG Yilin
      Vol. 28, Issue 12, Pages: 3271-3286(2024) DOI: 10.11834/jrs.20243016
      Dual-task model for ground crack detection in the goaf of coal mines
Abstract: Automatic detection of ground cracks in the goaf of coal mines plays an important role in production safety and ecological environment management. Given the complex background and the variable geometry and scale of cracks in coal mines, automatically detecting ground cracks in the goaf remains challenging. Crack extraction can be treated as a segmentation task from a global view or as a skeleton extraction task (boundary detection) from a local view. Many methods treat crack extraction as a single, independent task. However, these methods cannot produce enough data for the subsequent measurement and quantitative evaluation of the damage caused by ground cracks. Moreover, they disregard the information interaction between the two tasks, which could improve accuracy and efficiency. To solve these problems, this study designs a dual-task Convolutional Neural Network (CNN) for goaf crack recognition (Goaf-DTNet) that automatically detects ground cracks in unmanned aerial vehicle imagery with high spatial resolution. In Goaf-DTNet, an atrous spatial pyramid pooling module is introduced to extract multiscale semantic information. Considering the characteristics of ground cracks in the goaf, a Multiscale Feature Fusion Module (MFFM) is designed for the crack segmentation branch to further integrate local and global contextual information. A Segmentation-Guided crack skeleton Feature extraction Module (SGFM) is used in the crack skeleton extraction branch to provide spatial information through a spatial attention mechanism. The dual-task model shares layers between the two tasks, avoiding redundant parameter computation, reducing the memory footprint, and accelerating each task. Meanwhile, the complementary information exchanged between the two tasks improves the detection accuracy of each. For the crack segmentation task, the F1-score and Intersection over Union (IoU) are 0.71 and 0.55, respectively; the former is about 1% higher than the F1-score of the second-best method, and the latter is 16.35% higher than the IoU of PSPNet. In the skeleton extraction task, the proposed model also performs better than the compared methods, with an Optimal Dataset Scale (ODS) of 0.56 and an Average Precision (AP) of 0.54. In addition, test results on an open-source dataset show that the proposed method outperforms the others, with an F1-score of 0.89 and an IoU of 0.80 for segmentation; for skeleton extraction, the Optimal Image Scale (OIS), ODS, and AP of the proposed method are 0.58, 0.58, and 0.55, respectively. Ablation experiments prove the effectiveness of the proposed MFFM and SGFM. Results indicate that after MFFM is added to the crack segmentation branch, the F1-score and IoU increase by 0.0827 and 0.0918, respectively, showing that fusing local and global information facilitates crack detection. In the crack skeleton extraction task, OIS, ODS, and AP increase by 0.1246, 0.1630, and 0.2140, respectively, after SGFM is embedded into the branch. Experimental results show that Goaf-DTNet is effective for ground crack detection in the goaf of coal mines. The proposed MFFM helps the model obtain complete and continuous crack identification results by integrating multiscale contextual semantic information. SGFM uses information from the segmentation branch to provide abundant spatial information for linear crack features, effectively improving detection accuracy. Furthermore, the accuracy of each task is enhanced by exploiting the synergy between the two tasks.
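The F1-score and IoU reported for the segmentation branch are the standard binary-mask metrics, computable as:

```python
import numpy as np

def f1_and_iou(pred, gt):
    """F1-score and IoU of binary masks: both are ratios built from the
    true-positive, false-positive, and false-negative pixel counts."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    f1 = 2 * tp / (2 * tp + fp + fn)
    iou = tp / (tp + fp + fn)
    return float(f1), float(iou)
```

Note that F1 is always at least as large as IoU for the same masks (F1 = 2·IoU / (1 + IoU)), which matches the paired scores quoted in the abstract.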
Keywords: remote sensing; coal mines; goaf; unmanned aerial vehicle (UAV); crack detection; convolutional neural networks (CNN); multi-task learning; deep learning