
Posters
Presenting Author Academic/Professional Position
High School Student
Academic Level (Author 1)
High School Student
Discipline/Specialty (Author 1)
Population Health and Biostatistics
Academic Level (Author 2)
Faculty
Discipline/Specialty (Author 2)
Population Health and Biostatistics
Presentation Type
Poster
Discipline Track
Biomedical ENGR/Technology/Computation
Abstract Type
Research/Clinical
Abstract
Background: Diabetic heart failure (DHF) is a chronic, progressive disease associated with both diabetes and heart failure (HF). Although knowledge of both diseases has advanced considerably, much remains to be learned about the genetic overlap between them. In this study, we used gene expression data from patients with DHF, patients with HF, and a control group of individuals who died of natural causes to identify genes associated with DHF and HF. We sought genes with altered expression levels that could play a role in the disease pathways.
Methods: The data set was formatted as a matrix comprising three groups: 5 control samples (individuals who died of natural causes), 7 DHF samples, and 12 non-DHF samples. All data analyses were performed in R and RStudio using CRAN packages. Volcano plots were used to identify differentially expressed genes based on log2 fold change and t-test results. Machine learning models, including Naive Bayes, Random Forest, and Logistic Regression, were trained to classify control and diseased groups and were evaluated using 5-fold cross-validation and confusion matrices. Functional pathway enrichment tools, such as EnrichR and StringPPI, were used to identify biological processes and genetic pathways associated with the identified genes. Data visualization methods such as UMAP (Uniform Manifold Approximation and Projection) and heatmaps were used to examine gene expression clustering and the relationships between the sample groups.
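The volcano-plot selection described in the Methods combines an effect size (log2 fold change) with a significance measure from a t-test. The following is a minimal Python sketch of that idea, not the authors' R code. It assumes positive, non-log-transformed expression values, uses Welch's t statistic directly in place of a p-value (the Python standard library has no t-distribution), and the function names, cutoffs, and toy values are all hypothetical.

```python
import math
from statistics import mean, variance


def log2_fold_change(case, control):
    """log2 of the ratio of group means (assumes positive, non-log values)."""
    return math.log2(mean(case) / mean(control))


def welch_t(case, control):
    """Welch's t statistic for two independent groups with unequal variances."""
    se = math.sqrt(variance(case) / len(case) + variance(control) / len(control))
    return (mean(case) - mean(control)) / se


def volcano_hits(expr, case_idx, control_idx, min_abs_lfc=1.0, min_abs_t=2.0):
    """Return (gene, log2FC, t) for genes passing both illustrative cutoffs."""
    hits = []
    for gene, values in expr.items():
        case = [values[i] for i in case_idx]
        control = [values[i] for i in control_idx]
        lfc = log2_fold_change(case, control)
        t = welch_t(case, control)
        if abs(lfc) >= min_abs_lfc and abs(t) >= min_abs_t:
            hits.append((gene, lfc, t))
    return hits


# Toy expression matrix: gene -> one value per sample (made-up numbers).
expr = {
    "GENE_A": [10.0, 11.0, 9.0, 40.0, 42.0, 38.0],   # strongly up in cases
    "GENE_B": [20.0, 21.0, 19.0, 20.5, 19.5, 20.0],  # unchanged
}
control_idx = [0, 1, 2]
case_idx = [3, 4, 5]

hits = volcano_hits(expr, case_idx, control_idx)
for gene, lfc, t in hits:
    print(f"{gene}: log2FC={lfc:.2f}, t={t:.2f}")
```

In a real analysis the t statistic would be converted to a p-value and corrected for multiple testing before plotting; the sketch only shows how the two axes of a volcano plot jointly filter genes.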
Results: The volcano plots identified 149 genes that were differentially expressed in the diseased groups. EnrichR analysis linked these genes to lipid and atherosclerosis, insulin resistance, and several signaling pathways. The UMAP analysis showed a clear distinction between the control and diseased groups, with some overlap between the DHF and HF groups. The heatmaps supported the UMAP results: similarly expressed genes clustered together within each group. The accuracy of the predictive models was 100% for Naive Bayes, 80% for Random Forest, and 60% for Logistic Regression. Further analysis of the regulated pathways showed close interactions among the highly expressed genes, which may help explain the mechanisms of DHF and HF.
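When reading accuracy figures from a small cohort, it helps to see how coarse the measurement is. With the 24 samples reported in the Methods, a 5-fold split holds out only four or five samples per fold, so accuracy on any single fold can only change in steps of 20 or 25 percentage points. The Python sketch below illustrates that arithmetic and how accuracy is read off a confusion matrix. It is not the authors' R code, and the function names and the toy confusion matrix are hypothetical.

```python
def fold_sizes(n_samples, k):
    """Sizes of k near-equal folds; the first n_samples % k folds get one extra."""
    base, extra = divmod(n_samples, k)
    return [base + 1 if i < extra else base for i in range(k)]


def accuracy(confusion):
    """Accuracy from a square confusion matrix (rows = true, columns = predicted)."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Group sizes as reported in the Methods: 5 control, 7 DHF, 12 non-DHF.
n_samples = 5 + 7 + 12
sizes = fold_sizes(n_samples, 5)
print(sizes)

# Hypothetical confusion matrix for one held-out fold of five samples:
# rows are true (control, diseased), columns are predicted (control, diseased).
toy_confusion = [
    [1, 0],
    [1, 3],
]
acc = accuracy(toy_confusion)
print(f"accuracy = {acc:.0%}")
```

One misclassified sample in a five-sample fold already moves accuracy by 20 points, which is why results from small folds are usually averaged across folds or reported with an interval.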
Conclusion: Our study identified 149 genes with altered expression in DHF and HF and related them to important biological processes. The machine learning models showed high potential to differentiate healthy from diseased conditions, with the Naive Bayes model achieving the highest accuracy. Together, the bioinformatics, pathway, and gene mapping analyses identified genetic relationships and disease mechanisms that can form a basis for future studies.
Recommended Citation
Sahoo, Sunakhi and Ayati, Marzieh, "Unraveling Genetic Links Between Diabetes and Heart Failure-A Machine Learning Approach" (2025). Research Symposium. 161.
https://scholarworks.utrgv.edu/somrs/2025/posters/161
Included in
Artificial Intelligence and Robotics Commons, Biochemical Phenomena, Metabolism, and Nutrition Commons, Cardiovascular Diseases Commons, Databases and Information Systems Commons, Data Science Commons, Endocrine System Diseases Commons, Numerical Analysis and Scientific Computing Commons, Theory and Algorithms Commons