A PM2.5 prediction model based on deep learning and random forest
Vol. 27, Issue 2, Pages: 430-440 (2023)
Received: 20 November 2020
Published: 07 February 2023
DOI: 10.11834/jrs.20210504
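PM2.5 concentration prediction faces two problems: traditional machine learning algorithms cannot mine the deep features hidden in the data, and deep learning algorithms perform poorly when data are scarce. Considering the respective characteristics of deep learning and random forest, this paper proposes a combined PM2.5 concentration prediction model based on deep learning and random forest. The model builds its training dataset from Aerosol Optical Depth (AOD) remote sensing data, meteorological reanalysis data, and ground-based PM2.5 observations. A deep learning method extracts the deep hidden features of the training data, the extracted features are used to train a random forest model, and random forest regression yields the predicted PM2.5 concentration. To verify the method, PM2.5 concentration in Henan Province for 2018-2019 was estimated as a case study: the original features, combined with the features extracted by CNN, LSTM, and CNN_LSTM to form new feature sets, were each used for training and prediction with three traditional machine learning methods, namely random forest regression, support vector regression, and K-nearest neighbor regression. The experiments show that, with limited data, the PMCOM model achieves good prediction accuracy in both the overall and the season-by-season prediction scenarios. The combination that uses LSTM as the feature selector and RF as the regressor is the best model in this experiment: even with only 35% of the data used as training samples, R² still reaches 0.89 in the overall prediction experiment and is above 0.75 in every seasonal prediction experiment.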
At present, environmental pollution in China is severe, and regional compound air pollution dominated by PM2.5 is the most prominent problem. Aerosol Optical Depth (AOD) is a key physical quantity that characterizes atmospheric turbidity and represents the strength of light extinction by aerosols. Many studies have shown a strong correlation between AOD and PM2.5. Using satellite-retrieved AOD data together with other influencing factors to analyze how PM2.5 varies is therefore of great significance for air pollution prevention and the protection of human health.
The diffusion of PM2.5 is an extremely complicated process. PM2.5 prediction models based on statistical regression can describe only relatively simple nonlinear relationships, whereas PM2.5 estimation is a more complex multivariable nonlinear problem. Compared with statistical regression models, models based on traditional machine learning algorithms can handle more complex nonlinear problems, but their capacity to process historical data is still limited, so it is difficult for them to mine the variation patterns of pollutant concentrations from large volumes of data. Compared with traditional machine learning methods, deep learning models can extract deep features hidden in historical data. However, AOD remote sensing data are limited by the temporal resolution of the imagery and by cloud contamination of pixels, which greatly reduces the amount of valid data. Because deep learning depends on a large amount of training data, a shortage of training data seriously degrades model accuracy.
To address the problems that traditional machine learning algorithms cannot deeply mine the hidden association features in data and that deep learning algorithms perform poorly when data are limited, a combined PM2.5 prediction model based on deep learning and random forest is proposed. The model builds a training dataset from AOD remote sensing data, meteorological reanalysis data, and ground-based PM2.5 observations. First, a deep learning model extracts the deep hidden features of the training data. Then, the extracted hidden features are used to train a random forest model, and the predicted PM2.5 concentration is obtained by random forest regression.
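The two-stage idea described here (learn hidden features with a neural network, then fit a random forest on them) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: it assumes scikit-learn and NumPy are available, uses a small `MLPRegressor` as a stand-in for the CNN/LSTM extractors, and runs on synthetic data whose columns merely play the role of AOD and meteorological predictors. The helper `hidden_features` and all variable names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the paper's dataset): X mimics AOD and
# meteorological predictors, y mimics ground-level PM2.5.
n_samples, n_features = 600, 6
X = rng.normal(size=(n_samples, n_features))
y = (40 + 25 * X[:, 0] + 10 * np.tanh(X[:, 1] * X[:, 2]) - 8 * X[:, 3]
     + rng.normal(scale=3, size=n_samples))

# Train on 35% of the samples, mirroring the small-data setting.
n_train = int(round(0.35 * n_samples))
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=n_train, random_state=0)

scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# Step 1: fit a small network (assumed stand-in for the LSTM/CNN extractor).
# The target is standardized only to help the network converge.
y_mean, y_std = y_train.mean(), y_train.std()
net = MLPRegressor(hidden_layer_sizes=(16,), activation="relu",
                   max_iter=2000, random_state=0)
net.fit(X_train_s, (y_train - y_mean) / y_std)


def hidden_features(model, X_scaled):
    """Return the ReLU activations of the network's first hidden layer."""
    return np.maximum(0.0, X_scaled @ model.coefs_[0] + model.intercepts_[0])


# Step 2: build the new feature set = original features + hidden features.
Z_train = np.hstack([X_train_s, hidden_features(net, X_train_s)])
Z_test = np.hstack([X_test_s, hidden_features(net, X_test_s)])

# Step 3: random forest regression on the combined features.
rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(Z_train, y_train)
r2 = r2_score(y_test, rf.predict(Z_test))
print(f"combined-model R^2 on held-out synthetic data: {r2:.3f}")
```

In this sketch the regressor in step 3 could be swapped for `sklearn.svm.SVR` or `sklearn.neighbors.KNeighborsRegressor` to mimic a comparison across regressors on the same feature set.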
To verify the effectiveness of this method, a series of experiments were carried out. The results demonstrate that the proposed PMCOM model achieves good prediction accuracy in both the overall and the seasonal prediction scenarios. The combination of random forest and a Long Short-Term Memory (LSTM) neural network performs best in this experiment. Even when only 35% of the data are used for training, R² still reaches 0.89 in the overall prediction experiment and is above 0.75 in every seasonal prediction experiment.
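The abstract does not define R²; assuming it is the usual coefficient of determination, R² = 1 - SS_res / SS_tot, the overall and per-season scores could be computed along the lines below. The month-to-season mapping (meteorological seasons) and the sample values are assumptions made purely for illustration, and the function names are hypothetical.

```python
from collections import defaultdict


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


# Assumed meteorological seasons (Northern Hemisphere), keyed by month number.
SEASON_OF_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def r_squared_by_season(records):
    """Group (month, observed, predicted) tuples by season and score each group."""
    groups = defaultdict(lambda: ([], []))
    for month, obs, pred in records:
        obs_list, pred_list = groups[SEASON_OF_MONTH[month]]
        obs_list.append(obs)
        pred_list.append(pred)
    return {season: r_squared(o, p) for season, (o, p) in groups.items()}


# Made-up illustrative values (month, observed PM2.5, predicted PM2.5).
records = [
    (1, 120.0, 110.0), (1, 80.0, 86.0), (2, 95.0, 99.0),
    (7, 30.0, 33.0), (7, 22.0, 20.0), (8, 41.0, 38.0),
]
overall = r_squared([r[1] for r in records], [r[2] for r in records])
per_season = r_squared_by_season(records)
print("overall:", round(overall, 3))
print("per season:", {s: round(v, 3) for s, v in per_season.items()})
```

Scoring each season separately matters because a model can reach a high overall R² while still fitting individual seasons less well, since much of the overall variance comes from differences between seasons.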
Combining deep learning with random forest reduces the deep learning model's dependence on data volume, thanks to the random forest, and makes full use of the high-level hidden features in the existing historical data. In this way, the combination compensates for the random forest model's weakness in mining the internal association features of the data and improves the prediction accuracy of PM2.5 concentration.