School of Mathematical and Statistical Sciences Faculty Publications and Presentations
Document Type
Article
Publication Date
2021
Abstract
Background
Rigorous analysis of gene expression data requires accounting for the assumptions behind each statistical test and also relies on information about the network relations that exist among genes. Combining these elements can not only improve statistical power but also provide a better framework for analyzing gene expression.
Material and methods
We propose a novel statistical model that incorporates both test assumptions and gene network information into the analysis. Assumptions matter because every test statistic is valid only when its required assumptions hold. We therefore propose hybrid p-values, which take these assumptions into consideration, and show that they are uniformly distributed under the null hypothesis of primary interest. We incorporate gene network information because neighboring genes share biological functions; this correlation is taken into account by assigning similar prior probabilities to neighboring genes.
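The uniformity property mentioned above can be illustrated with a minimal simulation sketch. It does not implement the hybrid p-values themselves; as a stand-in it uses an ordinary two-sided z-test with known variance, and the function names, sample size, and number of simulated tests are all assumptions chosen for illustration. It checks that p-values computed under a true null hypothesis have an empirical distribution close to Uniform(0, 1).

```python
import math
import random


def z_test_p_value(sample, sigma=1.0):
    """Two-sided p-value for H0: mean = 0, assuming a known standard deviation."""
    n = len(sample)
    z = (sum(sample) / n) / (sigma / math.sqrt(n))
    # P(|Z| >= |z|) for a standard normal Z equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


def simulate_null_p_values(n_tests=5000, n=10, seed=0):
    """Draw n_tests samples of size n under H0 and return their p-values."""
    rng = random.Random(seed)
    return [
        z_test_p_value([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(n_tests)
    ]


def max_ecdf_gap(p_values):
    """Largest gap between the empirical CDF and the Uniform(0, 1) CDF."""
    xs = sorted(p_values)
    m = len(xs)
    return max(max((i + 1) / m - x, x - i / m) for i, x in enumerate(xs))


p_values = simulate_null_p_values()
gap = max_ecdf_gap(p_values)
print(f"max distance from Uniform(0, 1): {gap:.4f}")
```

A small gap is what one would expect from any valid p-value under its null hypothesis; a p-value computed from a test whose assumptions fail would generally show a larger gap.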
Results
We compare our approach with other approaches in a series of simulations. Areas under the ROC curve (AUCs) are computed to compare the methodologies, and the AUC of our methodology is larger than the others. For regression analysis, the AUC of our proposed method contains the AUCs of the Spearman and Pearson tests. In addition, true negative rates (TNRs), also known as specificities, are higher with our approach than with the other approaches. For two-group comparison analysis with a sample size of n=10, for instance, the specificity of our proposed methodology is 0.716146, whereas the specificities of the t-test and the rank-sum test are 0.689223 and 0.69797, respectively. Our method, which combines assumptions and network information in the analysis, is shown to be more powerful.
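For readers unfamiliar with the two evaluation metrics, the sketch below shows one common way to compute them from a vector of p-values when the truth is known, as it is in a simulation. The toy p-values, the truth labels, the function names, and the 0.05 threshold are hypothetical and are not taken from the article.

```python
def specificity(p_values, is_null, alpha=0.05):
    """True negative rate: share of true nulls that are not rejected at level alpha."""
    null_p = [p for p, null in zip(p_values, is_null) if null]
    return sum(p >= alpha for p in null_p) / len(null_p)


def auc(p_values, is_null):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen non-null gene has a smaller p-value than a randomly chosen null gene
    (ties count as one half)."""
    alt = [p for p, null in zip(p_values, is_null) if not null]
    nul = [p for p, null in zip(p_values, is_null) if null]
    wins = sum((a < b) + 0.5 * (a == b) for a in alt for b in nul)
    return wins / (len(alt) * len(nul))


# Hypothetical toy data: three non-null genes followed by five null genes.
p_values = [0.001, 0.020, 0.300, 0.040, 0.500, 0.700, 0.900, 0.060]
is_null = [False, False, False, True, True, True, True, True]

tnr = specificity(p_values, is_null)
area = auc(p_values, is_null)
print(f"specificity = {tnr:.3f}, AUC = {area:.3f}")
```

Applying the same two functions to the p-values produced by each competing method on the same simulated data gives directly comparable specificities and AUCs.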
Conclusions
These procedures are introduced as a general class of methods that can incorporate procedure selection, account for multiple testing, and incorporate graphical network information into the analysis. We obtain very good performance both in simulations and in real data analysis.
Recommended Citation
Fofana, D., George, E.O. & Bowman, D. Combining assumptions and graphical network into gene expression data analysis. J Stat Distrib App 8, 9 (2021). https://doi.org/10.1186/s40488-021-00126-z
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.
Publication Title
Journal of Statistical Distributions and Applications
DOI
10.1186/s40488-021-00126-z
Comments
Copyright © 2021, The Author(s)