Theses and Dissertations

Date of Award


Document Type


Degree Name

Master of Science (MS)


Applied Statistics and Data Science

First Advisor

Dr. Kristina Vatcheva

Second Advisor

Dr. Santanu Chakraborty

Third Advisor

Dr. Tamer Oraby


Hospital length of stay (LOS) is a key indicator of hospital care management efficiency, cost of care, and hospital planning. Understanding the variability in hospital LOS therefore remains an important focus in healthcare. Hospital LOS data are count data: discrete, nonnegative, typically right-skewed, and often containing excess zeros. Numerous studies have modeled hospital LOS to identify significant predictors of its variability. Many researchers have used linear regression, with or without a logarithmic transformation of the outcome variable LOS, or logistic regression on a dichotomized LOS. These regression methods usually violate model assumptions and have been criticized as inadequate for modeling count data. Problems that may occur include biased parameter estimates, loss of precision in inferences, prediction of meaningless negative values, and loss of important information about the underlying counts. Common statistical methods for the analysis of count data are Poisson, negative binomial (NB), zero-inflated Poisson (ZIP), and zero-inflated negative binomial (ZINB) regressions. Many studies have compared the performance of regression models for count data, but their results, whether based on empirical or simulated count data, disagree considerably. In this study, we compared the performance of Poisson, NB, ZIP, and ZINB regression models using simulated data under different scenarios with varying sample sizes, proportions of zeros, and levels of overdispersion. To illustrate these regression methods, an analysis of hospital LOS was conducted using empirical data from the MIMIC-III database.
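The kind of simulated scenario described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's simulation code: the function names, the parameter values (n, mu, size, pi_zero), and the seed are all assumptions chosen for illustration. The sketch draws zero-inflated negative binomial counts as a gamma-Poisson mixture with extra structural zeros, then reports two simple diagnostics, the variance-to-mean ratio and the observed share of zeros against the share a Poisson distribution with the same mean would imply.

```python
import math
import random
from statistics import mean, variance


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) count with Knuth's multiplication method.

    Adequate for the small-to-moderate means used in this sketch.
    """
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_zinb(n, mu, size, pi_zero, seed=0):
    """Simulate n zero-inflated negative binomial counts (hypothetical helper).

    mu      : mean of the NB component
    size    : NB dispersion parameter (smaller means more overdispersion)
    pi_zero : probability of a structural (excess) zero
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n):
        if rng.random() < pi_zero:
            counts.append(0)  # structural zero
        else:
            # NB as a gamma-Poisson mixture: rate ~ Gamma(shape=size, scale=mu/size)
            rate = rng.gammavariate(size, mu / size)
            counts.append(poisson_draw(rng, rate))
    return counts


def summarize(counts):
    """Return simple diagnostics for overdispersion and excess zeros."""
    m = mean(counts)
    return {
        "mean": m,
        "dispersion_index": variance(counts) / m,  # equals 1 under a Poisson model
        "zero_fraction": counts.count(0) / len(counts),
        "poisson_zero_fraction": math.exp(-m),  # zeros implied by Poisson(mean)
    }


# Illustrative scenario only; these values are assumptions, not the thesis's settings.
summary = summarize(simulate_zinb(n=5000, mu=4.0, size=2.0, pi_zero=0.3, seed=1))
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

A dispersion index well above 1 and an observed zero fraction well above the Poisson-implied one are the features that motivate the NB, ZIP, and ZINB alternatives; varying n, size, and pi_zero mimics the sample-size, overdispersion, and zero-proportion dimensions mentioned in the abstract.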


Copyright 2021 Gustavo A. Fernandez. All Rights Reserved.