Complementary inverse modeling and machine learning techniques for air pollution assessment
Abstract
Deterministic air quality models (AQM) estimate the relationship between pollution sources and their effects on ambient air quality by simulating how the concentrations of pollutant species evolve over time. However, a key input to deterministic AQM is a detailed, spatially and temporally resolved emission inventory, and such inventories are known to carry large uncertainties. On the other hand, recent advances in data science allow the use of supervised machine learning methods, such as Multivariate Linear Regression Models (MLRM), along with historical socioeconomic data to explore the evolution of pollution sources through time. This work explores two approaches: the use of inverse modeling along with matrix-based regularization techniques to improve emission inventories for deterministic air quality modeling applications, and the application of supervised machine learning tools to quantify the contribution of energy and economic factors to air pollution. An analysis of emission inventories developed for the Monterrey Metropolitan Area exhibited large differences among recently published criteria-pollutant inventories. A literature review on regularization methods showed that these mathematical techniques are increasingly being used in the atmospheric sciences in inverse modeling contexts. As a case study, several regularization methods, in combination with regularization parameter selection methods (Generalized Cross Validation, L-Curve, and Normalized Cumulative Periodograms), were used with a deterministic photochemical air quality model to compute scaling factors for the correction of a criteria-pollutant emission inventory for the Guadalajara Metropolitan Area, Mexico. While model performance improved, the results also showed that regularization methods alone cannot resolve all uncertainties and that incorporating known available data can be useful to better understand pollution sources.
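The idea of computing regularized scaling factors can be illustrated with a minimal sketch. It uses Tikhonov regularization with Generalized Cross Validation as one common pairing, which is not necessarily the exact combination applied in this work. The sensitivity matrix `A`, the observation vector `b`, the function name `tikhonov_gcv`, and all numbers are hypothetical and generated synthetically for illustration only.

```python
import numpy as np

def tikhonov_gcv(A, b, lambdas):
    """Solve min ||A x - b||^2 + lam^2 ||x||^2 and pick lam by GCV.

    A: (m, n) hypothetical sensitivity of observations to emission scaling.
    b: (m,) hypothetical observed concentrations.
    lambdas: candidate regularization parameters.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    beta = U.T @ b
    m = A.shape[0]
    # Squared norm of the part of b that lies outside the column space of A.
    resid_perp = max(np.linalg.norm(b) ** 2 - np.linalg.norm(beta) ** 2, 0.0)
    best = None
    for lam in lambdas:
        f = s**2 / (s**2 + lam**2)          # Tikhonov filter factors
        resid = np.sum(((1.0 - f) * beta) ** 2) + resid_perp
        gcv = resid / (m - np.sum(f)) ** 2  # GCV function value
        if best is None or gcv < best[0]:
            x = Vt.T @ (f * beta / s)       # regularized solution
            best = (gcv, lam, x)
    return best[1], best[2]

# Synthetic example: five source categories, forty observations.
rng = np.random.default_rng(0)
A = rng.random((40, 5))
x_true = np.array([1.2, 0.8, 1.5, 0.6, 1.0])   # assumed "true" scaling factors
b = A @ x_true + 0.01 * rng.standard_normal(40)
lam, x = tikhonov_gcv(A, b, np.logspace(-4, 1, 50))
print("selected lambda:", lam)
print("estimated scaling factors:", np.round(x, 3))
```

In a real application the matrix would come from air quality model sensitivity runs and would typically be far more ill-conditioned than this synthetic one, which is where the choice of regularization parameter matters most.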
MLRM were developed by correlating long-term economic and energy indicators with monthly-averaged air pollution data for the Monterrey, Guadalajara and Mexico City Metropolitan Areas. Although socioeconomic variables do not explain all variance in the observed pollutant concentrations, they allow the identification and analysis of activities with an impact on air quality. Moreover, the resulting MLRM displayed similar statistical performance and compared favorably to other studies found in the literature. It is concluded that both approaches are complementary and can help design relevant public policies.
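A multivariate linear regression of this kind can be sketched as follows. This is only an illustration under stated assumptions: the predictor columns stand in for economic and energy indicators, the response stands in for a monthly-averaged pollutant concentration, and the function name `fit_mlr` and all values are hypothetical and synthetic, not data from this work.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least-squares fit of y on the columns of X plus an intercept.

    Returns the coefficient vector (intercept first) and R^2, the fraction
    of variance in y explained by the predictors.
    """
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    y_hat = X1 @ coef
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return coef, 1.0 - ss_res / ss_tot

# Synthetic monthly series: two hypothetical indicators over ten years.
rng = np.random.default_rng(0)
X = rng.random((120, 2))                      # e.g. fuel sales, economic index
y = 2.0 + 3.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * rng.standard_normal(120)
coef, r2 = fit_mlr(X, y)
print("coefficients:", np.round(coef, 2))
print("R^2:", round(r2, 2))
```

An R^2 below one in such a fit reflects the point made above: the indicators explain only part of the variance, yet the fitted coefficients still show which activities are associated with changes in concentration.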