Theses and Dissertations

Date of Award


Document Type


Degree Name

Master of Science (MS)


Applied Statistics and Data Science

First Advisor

Dr. Hansapani Rodrigo

Second Advisor

Dr. George Yanev

Third Advisor

Dr. Tamer Oraby


Gene expression analysis has been of major interest to biostatisticians for many decades. Such studies are necessary for assessing and predicting disease risk, so that medical professionals and scientists alike can design better treatment plans to lessen symptoms and perhaps even find cures. In this study, we investigate several gene expression analysis and machine learning techniques for disease class prediction, assess the predictive validity of the resulting models, and identify differentially expressed (DE) genes in the relevant pathology datasets. Multiple gene expression datasets, all obtained on the Affymetrix U133A platform (GPL96), are used to test model accuracy.

Significance Analysis of Microarrays (SAM) was used to identify potential disease biomarkers, followed by these predictive models: (a) random forest, (b) random forest with Gene eXpression Network Analysis (GXNA), (c) RF++, (d) LASSO, and (e) Bayesian neural networks. One goal of this study is to find clusters of co-expressed genes and to determine how knowledge-based clustering of gene expression (microarray) data affects classification. The other goal is to determine the usefulness of Automatic Relevance Determination (ARD) in Bayesian neural networks.


Copyright 2021 Myrine A. Barreiro-Arevalo. All Rights Reserved.