Dynamic ensemble algorithm of SMOTE and rotation forest for imbalanced hyperspectral remote sensing classification
Vol. 26, No. 11, 2022, pp. 2369-2381
Received: 2020-06-27
Published in print: 2022-11-07
DOI: 10.11834/jrs.20210216
Rotation Forest (RoF) is a powerful ensemble classifier that has been applied successfully to hyperspectral image classification. However, real-world data are often class-imbalanced, which leads the traditional RoF algorithm to concentrate on recognizing majority-class samples while neglecting the classification accuracy of minority-class samples. The SMOTE (Synthetic Minority Oversampling Technique) algorithm increases the number of minority-class samples by synthesizing new ones, thereby balancing the class distribution of the data set. However, SMOTE is currently used mainly as a data preprocessing step, and it risks introducing artificial noise in multi-class problems. To address multi-class imbalance in hyperspectral data learning, this paper proposes a novel dynamic ensemble algorithm that combines SMOTE and RoF. The algorithm uses a dynamic sampling factor technique to fuse class distribution optimization with the training of the base classifiers; it thus generates class-balanced data sets adaptively while reducing the influence of noise on the base classifiers. The algorithm is evaluated on three public hyperspectral data sets (Indian Pines, Salinas, and Pavia University) against four comparison algorithms: random forest, traditional RoF, and RoF trained on data preprocessed with either random oversampling or SMOTE. The evaluation criteria are overall accuracy, average accuracy, F-measure, G-mean, minimum recall, ensemble classifier diversity, model training time, and the McNemar test. The experimental results show that the proposed method has clear classification advantages: it improves the recognition accuracy of minority-class samples while also increasing overall classification accuracy.
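SMOTE, as described above, creates new minority-class samples by interpolation rather than by copying existing ones. The following is a minimal pure-Python sketch of that classic interpolation step, not the authors' code; the function name `smote`, the parameter `k`, and the choice to draw base samples at random (instead of cycling through every minority sample) are illustrative assumptions.

```python
import math
import random


def smote(minority, n_new, k=5, seed=None):
    """Synthesize n_new samples from a list of minority-class feature vectors.

    Each synthetic point lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority-class neighbours (Euclidean distance).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # rank the other minority samples by distance to x and keep the k closest
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: math.dist(x, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation weight drawn uniformly from [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


# Usage: add 6 synthetic points to a toy 3-sample minority class.
minority = [[0.1, 0.2], [0.3, 0.1], [0.2, 0.4]]
new_points = smote(minority, n_new=6, k=2, seed=42)
```

Because every synthetic point sits between two existing minority samples, SMOTE can place points inside regions belonging to other classes when classes overlap, which is one way the artificial noise mentioned in the abstract can arise.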
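In the standard Rotation Forest formulation, each base tree is trained on a rotated copy of the features: the features are split at random into disjoint groups, PCA is fitted on a bootstrap subsample (75% of the data) for each group, and all principal axes are assembled into one block-structured rotation matrix. The sketch below assumes NumPy is available; `rotation_matrix` and its parameters are hypothetical names, and the random class-subset selection of the original method is omitted for brevity.

```python
import numpy as np


def rotation_matrix(X, n_subsets, rng, sample_frac=0.75):
    """Build one Rotation Forest style rotation matrix for a single base classifier.

    Simplified sketch: the class-subset selection of the original method is omitted.
    """
    n_samples, n_features = X.shape
    # randomly split the feature indices into n_subsets disjoint groups
    subsets = np.array_split(rng.permutation(n_features), n_subsets)
    R = np.zeros((n_features, n_features))
    m = max(2, int(sample_frac * n_samples))
    for cols in subsets:
        if cols.size == 0:
            continue
        # fit PCA on a bootstrap subsample of the rows, restricted to this feature group
        rows = rng.choice(n_samples, size=m, replace=True)
        cov = np.atleast_2d(np.cov(X[np.ix_(rows, cols)], rowvar=False))
        _, eigvecs = np.linalg.eigh(cov)  # columns are principal axes; keep them all
        # place the axes in the rows/columns of the original feature positions
        R[np.ix_(cols, cols)] = eigvecs
    return R


rng = np.random.default_rng(0)
X = rng.normal(size=(120, 10))  # toy data: 120 pixels with 10 spectral bands
R = rotation_matrix(X, n_subsets=3, rng=rng)
X_rotated = X @ R  # a base decision tree would be trained on this matrix
```

Each base classifier gets its own random feature split and subsample, so the rotations differ across the ensemble; this is the main source of RoF's diversity while keeping all features available to every tree.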
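The abstract does not define the dynamic sampling factor. Purely to illustrate what moving class rebalancing inside the ensemble loop can look like, the sketch below assumes a factor drawn uniformly at random for each base classifier that sets every class's target size as a fraction of the largest class. This is a hypothetical reading, not the authors' formula; `dynamic_oversampling_plan`, `low`, and `high` are invented names, and the class counts are toy values.

```python
import random
from collections import Counter


def dynamic_oversampling_plan(labels, rng, low=0.5, high=1.0):
    """Hypothetical per-base-classifier rebalancing plan.

    Draws a fresh sampling factor in [low, high] and returns, for every class,
    how many synthetic samples would lift it to factor * (largest class size).
    """
    counts = Counter(labels)
    n_max = max(counts.values())
    factor = rng.uniform(low, high)  # the assumed "dynamic sampling factor"
    target = int(round(factor * n_max))
    # classes already at or above the target get no synthetic samples
    plan = {cls: max(0, target - n) for cls, n in counts.items()}
    return factor, plan


labels = ["class_a"] * 100 + ["class_b"] * 10 + ["class_c"] * 40
rng = random.Random(7)
for t in range(3):  # one plan per base classifier in the ensemble
    factor, plan = dynamic_oversampling_plan(labels, rng)
    print(f"classifier {t}: factor={factor:.2f}, synthetic samples per class={plan}")
```

In a full loop, the per-class counts would be passed to a SMOTE routine before training each rotated tree, so every base classifier sees a differently balanced training set instead of one fixed, globally oversampled data set.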
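Several of the evaluation criteria listed above can be computed from a single confusion matrix. The sketch below uses common multi-class definitions: AA is the mean per-class recall, F-measure is the macro-averaged F1, and G-mean is the geometric mean of per-class recalls. The abstract does not state which averaging the paper uses, so treat these as assumptions; `imbalance_metrics` is a hypothetical name.

```python
import math


def imbalance_metrics(cm):
    """Compute OA, AA, macro F-measure, G-mean and minimum recall.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    n_classes = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n_classes))
    recalls, f_scores = [], []
    for i in range(n_classes):
        tp = cm[i][i]
        actual = sum(cm[i])                                  # row sum: true class i
        predicted = sum(cm[r][i] for r in range(n_classes))  # column sum: predicted i
        recall = tp / actual if actual else 0.0
        precision = tp / predicted if predicted else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        recalls.append(recall)
        f_scores.append(f1)
    return {
        "OA": correct / total,
        "AA": sum(recalls) / n_classes,
        "F_measure": sum(f_scores) / n_classes,  # macro-averaged F1 (assumption)
        "G_mean": math.prod(recalls) ** (1.0 / n_classes),
        "min_recall": min(recalls),
    }


# Toy two-class example: 100 majority samples, 10 minority samples.
m = imbalance_metrics([[95, 5],
                       [8, 2]])
```

In this toy example OA is about 0.88 while min_recall is only 0.2, which illustrates why recall-based criteria are reported alongside OA for imbalanced data.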
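The McNemar test compares two classifiers on the same test samples using only the cases where exactly one of them is correct. The sketch below uses the z form often seen in remote sensing accuracy assessment, z = (f_ab - f_ba) / sqrt(f_ab + f_ba), with a two-sided normal p-value; whether the paper uses this form or the chi-square form with continuity correction is not stated in the abstract, and `mcnemar_test` is a hypothetical name.

```python
import math


def mcnemar_test(y_true, pred_a, pred_b):
    """McNemar z statistic and two-sided p-value for classifiers A and B."""
    # f_ab: samples A classifies correctly and B misclassifies; f_ba: the reverse
    f_ab = sum(1 for t, a, b in zip(y_true, pred_a, pred_b) if a == t and b != t)
    f_ba = sum(1 for t, a, b in zip(y_true, pred_a, pred_b) if a != t and b == t)
    if f_ab + f_ba == 0:
        return 0.0, 1.0  # the classifiers never differ in correctness
    z = (f_ab - f_ba) / math.sqrt(f_ab + f_ba)
    # two-sided p-value from the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Toy example: A is right on all 10 samples, B misses 4 of them.
y = [0] * 10
z, p = mcnemar_test(y, pred_a=[0] * 10, pred_b=[1] * 4 + [0] * 6)
```

With this form, |z| > 1.96 indicates a significant difference at the 5% level; a positive z favors classifier A.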
