Retrieving historical fine particulate matter (PM2.5) data is key to evaluating the long-term impacts of PM2.5 on the environment, human health and climate change. Satellite-based aerosol optical depth has been used to estimate PM2.5, but these estimates have largely been undermined by extensive missing values, low sampling frequency and weak predictive capability. Here, using a novel feature engineering approach to incorporate spatial effects from meteorological data, we developed a robust LightGBM model that predicts PM2.5 with unprecedented predictive capacity on hourly (R² = 0.75), daily (R² = 0.84), monthly (R² = 0.88) and annual (R² = 0.87) timescales. By taking advantage of spatial features, our model can also construct hourly gridded networks of PM2.5. This capability would be further enhanced if meteorological observations from regional stations were incorporated. Our results show that this model has great potential for reconstructing historical PM2.5 datasets and real-time gridded networks at high spatiotemporal resolution. The resulting datasets can be assimilated into models to produce long-term reanalysis that incorporates interactions between aerosols and physical processes.
A high-performance machine-learning model incorporating spatial effects was developed to estimate historical PM2.5 concentrations from meteorological data. The resulting dataset, available at hourly resolution, will be of great value for understanding the long-term climate and environmental effects of PM2.5 and for producing chemical-weather coupled reanalysis.
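
As an illustration of how R² can be reported on several timescales, the following minimal sketch averages hourly observed and predicted values into longer blocks and recomputes R² at each level. This is one common convention and may differ from the evaluation procedure used in the study. The helper names (`r2_score`, `block_mean`), the 30-day "month" and the synthetic series are assumptions introduced for illustration only; none of the numbers are results from the study.

```python
import numpy as np

def r2_score(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    ss_res = np.sum((obs - pred) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def block_mean(values, block):
    """Average consecutive, non-overlapping blocks of `block` samples."""
    values = np.asarray(values, dtype=float)
    n = (len(values) // block) * block  # drop an incomplete trailing block
    return values[:n].reshape(-1, block).mean(axis=1)

# Synthetic hourly series for one 360-day "year" (illustrative, not study data):
# a seasonal cycle, a diurnal cycle and random noise.
rng = np.random.default_rng(0)
hours = np.arange(24 * 360)
obs = (
    35.0
    + 15.0 * np.sin(2 * np.pi * hours / (24 * 360))
    + 8.0 * np.sin(2 * np.pi * hours / 24)
    + rng.normal(0.0, 5.0, hours.size)
)
# Hypothetical model output: observations plus independent error.
pred = obs + rng.normal(0.0, 6.0, hours.size)

r2_hourly = r2_score(obs, pred)
r2_daily = r2_score(block_mean(obs, 24), block_mean(pred, 24))
r2_monthly = r2_score(block_mean(obs, 24 * 30), block_mean(pred, 24 * 30))

print(f"hourly R2:  {r2_hourly:.3f}")
print(f"daily R2:   {r2_daily:.3f}")
print(f"monthly R2: {r2_monthly:.3f}")
```

In this synthetic case the prediction error is independent from hour to hour, so averaging reduces it and R² rises with the averaging window. Real data need not behave this way: aggregation also shrinks the variance of the observations, which is one reason R² is not guaranteed to increase monotonically with timescale.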