Theses and Dissertations

Document Type

Thesis


Degree Name

Master of Science (MS)


Applied Statistics and Data Science

First Advisor

Dr. Kristina Vatcheva

Second Advisor

Dr. Mrinal Roychowdhury

Third Advisor

Dr. Santanu Chakraborty


Obesity is the abnormal or excessive accumulation of fat in the body, which can harm overall health. This excess storage of macronutrients in adipose tissue can trigger the release of inflammatory mediators, leading to a proinflammatory state. Inflammation is a known risk factor for various health conditions, including cardiovascular disease, metabolic syndrome, and diabetes. This study examined the use of data mining methods, particularly clustering algorithms, to identify inflammatory biomarker phenotypes and their association with obesity in a local adolescent population. The methods evaluated were k-means, Ward's hierarchical agglomerative clustering, fuzzy c-means, Gaussian mixture models, and principal component analysis (PCA). The algorithms were assessed using several validation indices, graphical diagnostics, and clinical interpretation of the resulting clusters. The k-means algorithm with k = 3 produced the best clusters under these criteria. Based on their characteristics, the clusters were labeled severe risk for metabolic dysfunction, moderate risk for metabolic dysfunction, and normal metabolic function. Adolescents with higher body mass index (BMI) and waist circumference had higher odds of being classified in the severe metabolic risk cluster. Although PCA is a dimensionality-reduction technique rather than a clustering algorithm, it supported the resulting clusters by grouping each cluster's dominant inflammatory biomarkers into separate principal components. These findings suggest a strong relationship between the inflammatory biomarkers C-reactive protein (CRP) and leptin, on one hand, and higher BMI and waist circumference, on the other, in the local adolescent study population.
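The abstract describes the validation approach only in general terms. The Python below is a minimal sketch that shows how such a comparison can work. It assumes z-scored biomarker measurements and uses the mean silhouette width as the validation index, which is one common choice; the thesis's own indices are not listed here. The code runs k-means (Lloyd's algorithm with k-means++ seeding) for k = 2 to 5 and keeps the k with the highest silhouette. The helper names (`standardize`, `kmeans`, `silhouette`) and the three made-up markers are hypothetical. The data are synthetic and do not reproduce the study's data or code.

```python
import math
import random
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each column so markers measured on different scales weigh equally."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against zero variance
    return [[(x - m) / s for x, m, s in zip(r, mus, sds)] for r in rows]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, rng, n_init=10, max_iter=100):
    """Lloyd's k-means with k-means++ seeding; returns labels of the lowest-inertia run."""
    best_labels, best_inertia = None, math.inf
    for _ in range(n_init):
        # k-means++ seeding: each new center is drawn with probability
        # proportional to its squared distance from the nearest chosen center.
        centers = [list(rng.choice(points))]
        while len(centers) < k:
            weights = [min(sq_dist(p, c) for c in centers) for p in points]
            centers.append(list(rng.choices(points, weights=weights)[0]))
        labels = None
        for _ in range(max_iter):
            # Assignment step: nearest center by squared Euclidean distance.
            new = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
            if new == labels:
                break
            labels = new
            # Update step: move each center to the mean of its members.
            for j in range(k):
                members = [p for p, lab in zip(points, labels) if lab == j]
                if members:  # keep the old center if a cluster empties
                    centers[j] = [mean(col) for col in zip(*members)]
        inertia = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels


def silhouette(points, labels):
    """Mean silhouette width: (b - a) / max(a, b) averaged over all points."""
    clusters = set(labels)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            continue  # a singleton cluster contributes 0 by convention
        a = mean(own)  # mean distance to the point's own cluster
        b = min(  # mean distance to the nearest other cluster
            mean(math.dist(p, q) for q, lab in zip(points, labels) if lab == c)
            for c in clusters if c != labels[i]
        )
        total += (b - a) / max(a, b)
    return total / len(points)


rng = random.Random(42)
# Synthetic, hypothetical data: three well-separated profiles over three
# made-up markers on different scales (NOT the thesis data).
profiles = [(1.0, 10.0, 20.0), (6.0, 40.0, 10.0), (12.0, 25.0, 2.0)]
spread = (0.4, 2.0, 1.0)
raw = [[rng.gauss(mu, sd) for mu, sd in zip(prof, spread)]
       for prof in profiles for _ in range(30)]
X = standardize(raw)

scores = {k: silhouette(X, kmeans(X, k, rng)) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

Standardizing first matters because biomarkers are reported on different scales. Without it, k-means on raw values would be dominated by whichever marker has the largest variance. In practice, scikit-learn's `KMeans` and `sklearn.metrics.silhouette_score` would usually replace these hand-written helpers. The other methods named in the abstract (Ward's method, fuzzy c-means, Gaussian mixtures) could be scored with the same silhouette loop.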


Copyright 2023 Tania Mayleth Vargas. All Rights Reserved.