Theses and Dissertations

Date of Award

7-1-2025

Document Type

Thesis

Degree Name

Master of Science (MS)

Department

Applied Statistics and Data Science

First Advisor

Marzieh Ayati

Second Advisor

Hansapani Rodrigo

Third Advisor

Li Zhang

Abstract

Phosphorylation and dephosphorylation are dynamic processes that control many aspects of cellular activity, such as metabolic pathways, cell cycle progression, and signal transduction. Protein activity and interactions are modulated by the reversible addition or removal of phosphate groups, which allows cells to respond rapidly to changing conditions. Although kinase-specific phosphorylation site prediction has advanced, computational prediction of phosphatase-specific dephosphorylation sites remains a major challenge, which limits our understanding of the extent of cellular regulation. In this study, we constructed a knowledge graph for the prediction of enzymes (kinases and phosphatases) and their associated substrates with specific phosphosites. We first preprocessed data from different sources to construct the knowledge graph. Next, we applied the Node2Vec embedding method to obtain a vector representation of each node in the graph and used these embeddings to train models for link prediction in three settings: between phosphatases and their substrates, between kinases and their substrates, and between enzymes and their substrates. We compared five machine learning models (Logistic Regression, Random Forest, Support Vector Machine, K-Nearest Neighbors, and Naïve Bayes). Random Forest outperformed the other models, achieving the highest accuracy on all three prediction tasks using our knowledge graph.
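
Node2Vec learns node vectors by sampling biased random walks over the graph and then training a skip-gram model on those walks. As a rough illustration of the walk-sampling step only, the following minimal sketch runs on a toy graph. The node names, the edge list, the walk length, and the return and in-out parameters `p` and `q` are all hypothetical placeholders, not the data or settings of this thesis; skip-gram training and the downstream classifiers are omitted.

```python
import random

# Toy knowledge graph edges. All node names are hypothetical placeholders,
# not entries from the thesis data.
EDGES = [
    ("kinase_A", "substrate_X"),
    ("kinase_A", "substrate_Y"),
    ("phosphatase_B", "substrate_X"),
    ("substrate_X", "site_X_S10"),
    ("substrate_Y", "site_Y_T5"),
    ("phosphatase_B", "site_X_S10"),
]


def build_graph(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def node2vec_walk(graph, start, length, p=1.0, q=1.0, rng=random):
    """Sample one second-order biased walk of at most `length` nodes."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = sorted(graph[cur])
        if not neighbors:
            break
        if len(walk) == 1:
            # The first step has no previous node, so it is uniform.
            walk.append(rng.choice(neighbors))
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbors:
            if nxt == prev:
                weights.append(1.0 / p)  # step back to the previous node
            elif nxt in graph[prev]:
                weights.append(1.0)      # stay within one hop of the previous node
            else:
                weights.append(1.0 / q)  # move further away from the previous node
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


def generate_walks(graph, num_walks, length, p=1.0, q=1.0, seed=0):
    """Start `num_walks` walks from every node, in shuffled order."""
    rng = random.Random(seed)
    nodes = sorted(graph)
    walks = []
    for _ in range(num_walks):
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(node2vec_walk(graph, node, length, p, q, rng))
    return walks


graph = build_graph(EDGES)
walks = generate_walks(graph, num_walks=2, length=5)
```

In a full pipeline, walks like these would be passed to a skip-gram model to produce the node embeddings, and pairs of node embeddings would then be combined into edge features for a link-prediction classifier such as Random Forest.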

Comments

Copyright 2025 Jannatul Ferdaus. All Rights Reserved. https://proquest.com/docview/3274800276
