Theses and Dissertations

Date of Award


Document Type


Degree Name

Master of Science (MS)


Applied Statistics and Data Science

First Advisor

Dr. Hansapani Rodrigo

Second Advisor

Dr. George Yanev

Third Advisor

Dr. Tamer Oraby


In 2020, COVID-19 became the first pandemic in history to bring the entire world to an abrupt and unexpected halt. Since the first reported case, the novel coronavirus has wreaked havoc in literally every corner of the globe and left an ever-growing number of fatalities. Normal life has been disrupted, and uncertainty about when the pandemic will end persists. Given the urgency of bringing the pandemic under control, medical officers have recommended actions that people can take voluntarily to help slow the spread of the disease. This study focuses on COVID-19 testing as an essential measure for monitoring and controlling the spread of the virus. It investigates factors that can predict whether a person has higher odds of taking a coronavirus test. The fight against the spread of coronavirus is a collective responsibility that requires socially responsible behavior. This study used a portion of the data collected by the Understanding America Study (UAS) in its national longitudinal survey of attitudes and behaviors around COVID-19 in the USA. Survey participants were randomly sampled using Address-Based Sampling (ABS) from postal records drawn from across the country. The targeted sample size was 8,900 participants, of whom 6,067 completed the questionnaire; this study used 440 of the completed responses that had no missing values. Both descriptive and inferential statistics were computed. For the descriptive analysis, frequencies were obtained because the majority of the variables were categorical. The bivariate analysis was performed using Chi-square and Wilcoxon Sign tests.
Further analysis was performed using machine learning models, including classification and regression trees (CART), gradient boosting, random forest, and artificial neural networks, followed by logistic regression models. The findings showed significantly higher odds of taking a coronavirus test for persons of Black ethnicity (OR: 3.26, 95% CI: 1.36, 7.86) and for those obtaining news about COVID-19 from physicians (OR: 4.17, 95% CI: 2.13, 8.30). Each one-unit increase in age was associated with 3% lower odds (OR: 0.97, 95% CI: 0.95, 0.99), while obtaining coronavirus news from social media (OR: 0.41, 95% CI: 0.18, 0.88), never feeling unable to control life during the pandemic (OR: 0.28, 95% CI: 0.12, 0.63), and sometimes feeling unable to control life during the pandemic (OR: 0.30, 95% CI: 0.10, 0.87) were also associated with lower odds of an individual taking a COVID-19 test. The random forest (RF) model yielded the best average area under the receiver operating characteristic curve (AUC) of 78.15% (SD: 1.05) with an accuracy of 76.34% (SD: 2.13), followed by the CART decision tree model with an average AUC of 60.91% (SD: 4.51) and an accuracy of 72.39% (SD: 2.89). The SHAP analysis based on the optimal RF model revealed that the most influential factors were the use of social media to obtain coronavirus information, the feeling of things not going one's way during the pandemic, constant worrying, age, and feeling unable to control one's life situation during the pandemic. The study recommends that health care authorities consider these factors when conducting awareness programs on the importance of testing during COVID-19 and future pandemics.
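
As a reading aid for the odds ratios reported above, the following minimal sketch shows how an odds ratio maps to a percent change in odds and to a logistic regression coefficient. The function names are hypothetical and the code is not taken from the thesis. Compounding the age effect over several units assumes that the effect is linear on the log-odds scale, which is the standard logistic regression assumption but is not stated in the abstract.

```python
import math

# Point estimates of odds ratios as reported in the abstract.
reported_or = {
    "Black ethnicity": 3.26,
    "News from physicians": 4.17,
    "Age (per one-unit increase)": 0.97,
    "News from social media": 0.41,
}

def percent_change_in_odds(odds_ratio):
    """Percent change in the odds implied by an odds ratio (hypothetical helper)."""
    return (odds_ratio - 1.0) * 100.0

def scaled_odds_ratio(odds_ratio, units):
    """Odds ratio for a change of `units`, assuming a log-linear effect."""
    return odds_ratio ** units

def log_odds_coefficient(odds_ratio):
    """Logistic regression coefficient corresponding to an odds ratio."""
    return math.log(odds_ratio)

for name, value in reported_or.items():
    print(f"{name}: OR={value}, change in odds={percent_change_in_odds(value):+.0f}%")
```

For example, `percent_change_in_odds(0.97)` recovers the 3% lower odds per one-unit increase in age, and `scaled_odds_ratio(0.97, 10)` gives the implied odds ratio for a ten-unit increase under the log-linear assumption.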


Copyright 2021 Sheila Rutto. All Rights Reserved.