Prognosis using Deep Learning in CoViD-19 patients

Guadiana Álvarez, José Luis

Abstract

Prognostics is the study of predicting an event before it happens, enabling efficient critical decision making. Over the past few years it has attracted considerable research attention in many fields, e.g. manufacturing, economics, and medicine. In medicine in particular, prognostics helps front-line physicians predict how a disease may affect a patient and react accordingly to save as many lives as possible. One clear example is the recently discovered Coronavirus Disease 2019 (CoViD-19). Because of its novelty, not nearly enough is known about the virus’ behaviour and Key Performance Indicators (KPIs) to assess mortality risk. At the same time, relying on many complex and expensive medical biomarkers may be impossible for low-budget hospitals. This motivates the development of a prediction model that not only maximizes performance but does so using as few biomarkers as possible. For mortality risk prediction, falsely assuming that a patient has a low mortality risk is far more critical than the opposite error. Therefore, avoiding false negative predictions should be prioritized over avoiding false positive ones. This research project proposes a CoViD-19 mortality risk calculator based on a Deep Learning (DL) model trained on a data set provided by HM Hospitales in Madrid, Spain. A pre-processing strategy for unbalanced classes and feature selection is proposed, and the benefit of using over-sampling and imputation techniques is evaluated. In addition, an imputation method based on the K-Nearest Neighbor (KNN) algorithm for biomarker data is proposed and its efficiency is evaluated. Results are compared against a Random Forest (RF) model, showing the trade-off between the size of the feature input space and the number of samples available. Results on the MPCD score show that the proposed DL model outperforms the RF model on every data set, even when an over-sampling technique is used. Finally, the proposed KNN method proves beneficial for data imputation, improving the model’s Recall score from 0.87 to 0.90.
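
As a rough illustration of two ideas mentioned in the abstract, the sketch below shows a generic KNN-style imputation of a missing biomarker value and the Recall metric, which penalizes false negatives. It is not the thesis's implementation: the function names `knn_impute` and `recall`, the choice of distance, and the toy values are all assumptions made for this example.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that feature over the k nearest rows.

    Assumption for this sketch: distance is the root mean squared difference
    over the features that are observed in both rows being compared.
    """
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf  # no common features: treat as infinitely far
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows in which feature j is observed.
            donors = [(distance(row, other), other[j])
                      for m, other in enumerate(rows)
                      if m != i and other[j] is not None]
            donors.sort(key=lambda d: d[0])
            nearest = [v for _, v in donors[:k]]
            if nearest:
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed


def recall(y_true, y_pred):
    """Recall = TP / (TP + FN); it drops whenever a positive case is missed."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy data (made up): two hypothetical biomarkers per patient, one missing.
patients = [[1.0, 10.0], [1.2, 12.0], [5.0, 50.0], [1.1, None]]
filled = knn_impute(patients, k=2)

# Toy labels (made up): 1 = died, 0 = survived.
score = recall([1, 1, 1, 0], [1, 0, 1, 1])
```

In this toy case the missing value is filled from the two patients whose observed biomarker is closest, and the single missed positive lowers Recall while the false positive does not affect it, which is why Recall suits a setting where false negatives are the costlier error.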

URI

https://hdl.handle.net/11285/644479

Collections

Ciencias Exactas y Ciencias de la Salud 5426

Except where otherwise noted, this item's license is described as openAccess