Machine learning analysis of antiretroviral procurement strategies in the Mexican government

Abstract
This dissertation investigates trends in antiretroviral medication (ARV) prices and their impact on public health in Mexico during 2019. The study uses a dataset of 15,220 procurement records collected between 2016 and 2019 to analyze price fluctuations and predict their implications for healthcare systems. Using three machine learning models developed in Python, namely Logistic Regression, Random Forest, and K-Nearest Neighbors (KNN), this research identifies patterns of increasing and decreasing prices and the factors influencing these trends. Data preprocessing involved extensive cleaning, imputation of missing values, feature scaling, and one-hot encoding of categorical variables. The dataset was partitioned into training and testing sets using an 80/20 split, so that each model was evaluated on records it had not seen during training. Hyperparameter optimization techniques, including grid search and cross-validation, were applied to improve model performance. As an ensemble method, Random Forest captured complex, non-linear relationships between variables, a critical advantage over simpler models. KNN provided complementary insights into local price clusters, while Logistic Regression offered interpretable coefficients for key predictors. In addition to predictive modeling, the study incorporates a financial evaluation of ARV price fluctuations, estimating their budgetary impact on public health systems. Consolidated purchasing schemes were found to yield significant cost reductions, enhancing access to ARVs for individuals living with HIV/AIDS. A unified ARV pricing database was developed by integrating fragmented data from government procurement systems, which supports transparency and facilitates reproducibility in future research. This research underscores the transformative potential of data-driven approaches in optimizing pharmaceutical procurement. It highlights the necessity of leveraging machine learning techniques not only for predictive analytics but also for informed decision-making in public health policy.
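
The modeling workflow summarized in the abstract (imputation, scaling, one-hot encoding, an 80/20 split, and grid search with cross-validation over Logistic Regression, Random Forest, and KNN) could be sketched with scikit-learn roughly as follows. This is a minimal illustration, not the dissertation's code: the data are synthetic, the column names (`quantity`, `prev_unit_price`, `purchase_type`), the binary "price increased" target, the parameter grids, and the use of accuracy as the metric are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for procurement records; every column is hypothetical.
rng = np.random.default_rng(0)
n = 400
X = pd.DataFrame({
    "quantity": rng.integers(10, 5000, n).astype(float),
    "prev_unit_price": rng.uniform(50, 3000, n),
    "purchase_type": rng.choice(["consolidated", "direct", "tender"], n),
})
X.loc[rng.choice(n, 20, replace=False), "quantity"] = np.nan  # simulate gaps

# Assumed target: 1 if the unit price increased, 0 otherwise (10% label noise).
increase = ((X["purchase_type"] != "consolidated")
            & (X["prev_unit_price"] > 1000)).to_numpy()
flip = rng.random(n) < 0.1
y = np.where(flip, ~increase, increase).astype(int)

# Preprocessing: impute and scale numeric columns, one-hot encode categoricals.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
preprocess = ColumnTransformer([
    ("num", numeric, ["quantity", "prev_unit_price"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["purchase_type"]),
])

# 80/20 train/test split, stratified on the target.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Candidate models with small, illustrative hyperparameter grids.
candidates = {
    "logistic_regression": (
        LogisticRegression(max_iter=1000),
        {"model__C": [0.1, 1.0, 10.0]},
    ),
    "random_forest": (
        RandomForestClassifier(random_state=0),
        {"model__n_estimators": [50, 100], "model__max_depth": [None, 5]},
    ),
    "knn": (
        KNeighborsClassifier(),
        {"model__n_neighbors": [3, 5, 11]},
    ),
}

results = {}
for name, (model, grid) in candidates.items():
    pipe = Pipeline([("prep", preprocess), ("model", model)])
    # Grid search with 5-fold cross-validation on the training set only.
    search = GridSearchCV(pipe, grid, cv=5, scoring="accuracy")
    search.fit(X_train, y_train)
    results[name] = {
        "best_params": search.best_params_,
        "test_accuracy": search.score(X_test, y_test),
    }

for name, res in results.items():
    print(name, res["best_params"], round(res["test_accuracy"], 3))
```

Keeping the preprocessing inside the pipeline means the imputer and scaler are refit on each cross-validation fold, so no information from held-out records leaks into model selection.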