
Posters
Presenting Author Academic/Professional Position
Graduate Student
Academic Level (Author 1)
Graduate Student
Academic Level (Author 2)
Faculty
Academic Level (Author 3)
Faculty
Presentation Type
Poster
Discipline Track
Other
Bioinformatics
Abstract Type
Research/Clinical
Abstract
Background: Phosphatase-substrate interactions may serve as disease biomarkers and offer insights that support early diagnosis. Analyzing these substrate-specific interactions is also important for drug recommendation. In our research, we constructed a knowledge graph that provides a network representation of the diverse relationships among biological entities relevant to phosphatases.
Methods: Our knowledge graph contains five node types: kinase, phosphatase, substrate, drug, and disease. The edges between nodes are drawn from DrugMap (drugs and their target proteins; diseases and their associated proteins), STRING (protein-protein interactions), DEPOD 2019 (phosphatases and their substrates), PhosphoSitePlus (kinases and their substrates), and UniProt. We first preprocessed the data from each source, then applied the Node2Vec embedding method to obtain a vector representation of each node, and used these embeddings to train models for link prediction between phosphatases and their substrates. We compared five machine learning models: Logistic Regression, Random Forest, Support Vector Machine, K-Nearest Neighbors, and Naïve Bayes.
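To illustrate the embedding step, here is a minimal sketch of the second-order biased random walk that Node2Vec uses to sample node sequences. It is not the authors' code: the toy edges, node names, and the values of p, q, walk length, and number of walks are all assumptions chosen for illustration, since the abstract does not report them.

```python
import random
from collections import defaultdict

# Hypothetical toy edges; node names are made up for illustration only.
# Real edges would come from sources such as DEPOD or PhosphoSitePlus.
EDGES = [
    ("phosphatase:P1", "substrate:S1"),
    ("phosphatase:P1", "substrate:S2"),
    ("kinase:K1", "substrate:S1"),
    ("drug:D1", "phosphatase:P1"),
    ("disease:X1", "substrate:S2"),
]


def build_adjacency(edges):
    """Build an undirected adjacency map with sorted neighbor lists."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return {node: sorted(nbrs) for node, nbrs in adj.items()}


def biased_walk(adj, start, length, p=1.0, q=1.0, rng=None):
    """Sample one Node2Vec-style walk of up to `length` nodes.

    p is the return parameter (weight 1/p for stepping back to the
    previous node) and q is the in-out parameter (weight 1/q for moving
    to a node that is not adjacent to the previous node).
    """
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = adj[cur]
        if not nbrs:
            break
        if len(walk) == 1:
            # First step has no previous node, so sample uniformly.
            walk.append(rng.choice(nbrs))
            continue
        prev = walk[-2]
        weights = []
        for nxt in nbrs:
            if nxt == prev:
                weights.append(1.0 / p)   # return to previous node
            elif nxt in adj[prev]:
                weights.append(1.0)       # stay close to previous node
            else:
                weights.append(1.0 / q)   # move further away
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk


def generate_walks(adj, num_walks, length, p=1.0, q=1.0, seed=0):
    """Start `num_walks` walks from every node, in shuffled order."""
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        nodes = list(adj)
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(biased_walk(adj, node, length, p, q, rng))
    return walks


adj = build_adjacency(EDGES)
walks = generate_walks(adj, num_walks=2, length=5, p=1.0, q=0.5)
```

In a full pipeline, these walks would be passed to a skip-gram model to learn one vector per node, and pairs of phosphatase and substrate vectors would then be combined into features for a classifier.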
Results: Random Forest performed best on our knowledge graph, with accuracy, F1-score, and AUC of 95.23%, 95.18%, and 95.23%, respectively. Naïve Bayes performed worst among the five models. An embedding dimension of 16 gave the best fit for our knowledge graph.
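For readers unfamiliar with the three reported metrics, the sketch below shows how accuracy, F1-score, and AUC can be computed for a binary link-prediction task. The labels and scores are made-up toy values, not the study's data, and the 0.5 decision threshold is an assumption.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random
    negative, counting ties as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if sp > sn else 0.5 if sp == sn else 0.0
        for sp in pos
        for sn in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example: 1 = true phosphatase-substrate link, 0 = no link.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]          # hypothetical model scores
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

acc = accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Accuracy and F1 depend on the chosen threshold, while AUC is computed from the raw scores and does not.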
Conclusion: By integrating a variety of data sources, we captured the functional relationships and associations between phosphatases and their substrates. Overall, our research on phosphatase-substrate interactions and their link prediction may help in the early diagnosis of different diseases.
Recommended Citation
Ferdaus, Jannatul; Ayati, Marzieh; and Rodrigo, Hansapani, "Phosphatase-Substrate Prediction using Heterogeneous Knowledge Graph" (2025). Research Symposium. 188.
https://scholarworks.utrgv.edu/somrs/2025/posters/188