Robust unsupervised statistical learning for the identification and prediction of the risk profiles
Abstract
The discovery of disease subtypes substantially impacts the selection of patient-specific
treatment, with implications for long-term survival and disease-related outcomes. Given the
heterogeneity of disease phenotypes and the need for a clear understanding of the features
associated with disease onset, discovering clinically relevant disease subtypes is
not straightforward. Consequently, clinical researchers need disease-subtyping techniques
that are robust and reproducible in clinical settings. This dissertation aims to provide a simple
clinical tool that predicts a patient's specific disease subtype. To this end, we develop, present,
and validate a robust unsupervised statistical learning method that analyzes multidimensional
datasets and returns reproducible, robust unsupervised clustering models of the identified patient
subtypes. Unsupervised clustering techniques can realistically model disease heterogeneity.
Each cluster represents a distinct, homogeneous disease subtype, discovered by analyzing
the Predicted Class Co-Association Matrix (PCCAM), which is built by randomly resampling the
study data. Specifically, the PCCAM is computed from the test-set results of replicated random
cross-validation of unsupervised clustering, and each entry gives the joint probability that a pair
of subjects belongs to the same cluster; analyzing the PCCAM can therefore reveal all the
reproducible clusters present in the studied data.
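The PCCAM construction can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the dissertation's implementation: k-means stands in for the unsupervised clustering model, each replicate draws a random 50/50 train/test split, and a PCCAM entry is estimated as the fraction of replicates (among those in which both subjects fall in the test set) where the two subjects receive the same predicted label. The functions `fit_kmeans`, `predict`, `pccam`, and `clusters_from_pccam`, the threshold-based read-out of clusters, and the toy data are all hypothetical.

```python
import math
import random
from typing import List, Sequence

Point = Sequence[float]


def _sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(centroids: List[List[float]], p: Point) -> int:
    """Assign a subject to its nearest centroid (the fitted model's prediction)."""
    return min(range(len(centroids)), key=lambda j: _sq_dist(centroids[j], p))


def fit_kmeans(data: List[Point], k: int, rng: random.Random, iters: int = 50) -> List[List[float]]:
    """Plain Lloyd's k-means; an assumed stand-in for the clustering model."""
    centroids = [list(p) for p in rng.sample(data, k)]
    for _ in range(iters):
        groups: List[List[Point]] = [[] for _ in range(k)]
        for p in data:
            groups[predict(centroids, p)].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the previous centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(g) for col in zip(*g)]
    return centroids


def pccam(data: List[Point], k: int, n_reps: int = 200,
          test_frac: float = 0.5, seed: int = 0) -> List[List[float]]:
    """Estimate P(subjects i and j get the same predicted cluster) over random splits."""
    rng = random.Random(seed)
    n = len(data)
    same = [[0] * n for _ in range(n)]  # replicates where i and j share a predicted label
    seen = [[0] * n for _ in range(n)]  # replicates where i and j are both in the test set
    for _ in range(n_reps):
        idx = list(range(n))
        rng.shuffle(idx)
        n_test = round(test_frac * n)
        test, train = idx[:n_test], idx[n_test:]
        model = fit_kmeans([data[i] for i in train], k, rng)  # fit on training subjects only
        labels = {i: predict(model, data[i]) for i in test}   # predict held-out subjects
        for a in test:
            for b in test:
                seen[a][b] += 1
                same[a][b] += int(labels[a] == labels[b])
    # Pairs never tested together get NaN rather than an invented probability.
    return [[same[i][j] / seen[i][j] if seen[i][j] else math.nan for j in range(n)]
            for i in range(n)]


def clusters_from_pccam(m: List[List[float]], threshold: float = 0.5) -> List[List[int]]:
    """Hypothetical read-out: connected components of pairs with co-association > threshold."""
    unassigned = set(range(len(m)))
    groups = []
    while unassigned:
        stack, comp = [unassigned.pop()], []
        while stack:
            i = stack.pop()
            comp.append(i)
            linked = [j for j in unassigned if m[i][j] > threshold]
            for j in linked:
                unassigned.remove(j)
                stack.append(j)
        groups.append(sorted(comp))
    return sorted(groups)


# Toy data (illustrative only): two well-separated groups of 10 subjects each.
data = [(0.1 * i, 0.0) for i in range(10)] + [(10 + 0.1 * i, 10.0) for i in range(10)]
m = pccam(data, k=2)
print(clusters_from_pccam(m))
```

Because only agreement within a single replicate is counted, the arbitrary numbering of cluster labels across replicates does not matter; subject pairs whose co-association stays close to 1 across many resamples form the reproducible groups.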
We applied the proposed methodology to discover subtypes of several diseases, including
Alzheimer's disease, COVID-19, and acute myeloid leukemia, using different data
types. Our findings showed that the proposed unsupervised approach could discover disease
subtypes that differ statistically. Moreover, characterizing the discovered subgroups revealed
additional substantial differences among them in some of the features we studied.