
Posters
Presenting Author Academic/Professional Position
Faculty
Academic Level (Author 1)
Faculty
Discipline/Specialty (Author 1)
Population Health and Biostatistics
Presentation Type
Poster
Discipline Track
Community/Public Health
Statistics
Abstract Type
Research/Clinical
Abstract
Background: With the rapid development of data collection technology, high-dimensional data, in which the model dimension k may grow with or greatly exceed the sample size n, are becoming increasingly prevalent in fields such as ecology and genetics. This data deluge poses new challenges to traditional statistical procedures and theories and has renewed interest in the problems of variable selection and classification in high-dimensional regression models. In large-k, small-n settings, variable selection is usually the first step of dimension reduction, used to identify the significant covariates that contribute to the response variable. The difficulty of high-dimensional data analysis stems mainly from its computational burden and the inherent limitations of model complexity.
Methods: We propose a sparse Bayesian procedure for variable selection and classification in high-dimensional logistic regression models, based on the global-local (GL) shrinkage prior framework. Specifically, we first consider two types of GL shrinkage priors for the regression coefficients, the horseshoe (HS) prior and the normal-gamma (NG) prior, and then specify a correlated prior for the binary vector to distinguish models of the same size. Using mixture representations of the logistic distribution and of the GL shrinkage priors, we construct a Bayesian hierarchical model that permits an effective MCMC-based algorithm for generating posterior samples and carrying out posterior inference.
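To illustrate the two prior families named in the Methods, the following is a minimal sketch that draws regression coefficients from a horseshoe prior and from a normal-gamma prior, each written as a scale mixture of normals. It is not the author's implementation: the function names, the global scale `tau`, and the gamma `shape`/`scale` values are assumptions chosen for illustration, and the sketch covers only the priors, not the logistic likelihood, the correlated prior on the binary vector, or the MCMC sampler.

```python
import numpy as np


def sample_horseshoe(k, tau=1.0, rng=None):
    """Draw k coefficients from a horseshoe prior (illustrative sketch).

    beta_j | lambda_j, tau ~ N(0, (tau * lambda_j)^2), lambda_j ~ half-Cauchy(0, 1).
    tau is a fixed global scale here; treating it as fixed is an assumption.
    """
    rng = np.random.default_rng() if rng is None else rng
    # The absolute value of a standard Cauchy draw is a half-Cauchy draw.
    local_scale = np.abs(rng.standard_cauchy(k))
    return rng.normal(0.0, tau * local_scale)


def sample_normal_gamma(k, shape=0.5, scale=2.0, rng=None):
    """Draw k coefficients from a normal-gamma prior (illustrative sketch).

    beta_j | psi_j ~ N(0, psi_j), psi_j ~ Gamma(shape, scale).
    The shape and scale defaults are hypothetical, not taken from the poster.
    """
    rng = np.random.default_rng() if rng is None else rng
    local_variance = rng.gamma(shape, scale, size=k)
    return rng.normal(0.0, np.sqrt(local_variance))


if __name__ == "__main__":
    rng = np.random.default_rng(2025)
    hs = sample_horseshoe(10_000, rng=rng)
    ng = sample_normal_gamma(10_000, rng=rng)
    # Both priors place substantial mass near zero while allowing large values.
    print("horseshoe    share with |beta| < 0.1:", np.mean(np.abs(hs) < 0.1))
    print("normal-gamma share with |beta| < 0.1:", np.mean(np.abs(ng) < 0.1))
```

The local scales shrink most coefficients toward zero, while the heavy tails of the mixing distributions leave large signals comparatively unshrunk, which is the behavior that makes GL priors suitable for sparse settings.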
Results: Numerical results from simulation studies and real-data applications demonstrate the effectiveness of the proposed methods in terms of variable selection and prediction.
Conclusions: The proposed HS and NG approaches are competitive tools for variable selection and prediction in high-dimensional logistic regression models.
Recommended Citation
Ma, Zhuanzhuan, "Sparse Bayesian variable selection using global-local shrinkage priors for the analysis of cancer datasets" (2025). Research Symposium. 104.
https://scholarworks.utrgv.edu/somrs/2025/posters/104
Included in
Applied Statistics Commons, Biostatistics Commons, Statistical Methodology Commons, Statistical Models Commons