Computer Science Faculty Publications

Inferred global dense residue transition graphs from primary structure sequences enable protein interaction prediction via directed graph convolutional neural networks

Islam A. Ebeid, Texas Woman's University
Haoteng Tang, The University of Texas Rio Grande Valley
Pengfei Gu, The University of Texas Rio Grande Valley

Document Type

Article

Publication Date

10-21-2025

Abstract

Introduction: Accurate prediction of protein-protein interactions (PPIs) is crucial for understanding cellular functions and advancing drug development. Existing in silico methods either use sequence embeddings taken directly from Protein Language Models (PLMs) or apply Graph Neural Networks (GNNs) to 3D protein structures; this study investigates less computationally intensive alternatives. It introduces a novel framework that treats the downstream task of PPI prediction as link prediction.

Methods: We introduce a two-stage graph representation learning framework, ProtGram-DirectGCN. First, we develop ProtGram, a novel approach that models a protein's primary structure as a hierarchy of globally inferred n-gram graphs. In these graphs, residue transition probabilities, aggregated from a large sequence corpus, define the edge weights of a directed graph of paired residues. Second, we propose DirectGCN, a custom directed graph convolutional neural network whose convolutional layer processes information through separate path-specific (incoming, outgoing, undirected) and shared transformations, which are combined via a learnable gating mechanism. DirectGCN is applied to the ProtGram graphs to learn residue-level embeddings, which are then pooled via an attention mechanism to generate protein-level embeddings for the prediction task.
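
To make the first stage concrete, the following is a minimal sketch of how a hierarchy of n-gram transition graphs could be inferred from a sequence corpus. It is an illustration under stated assumptions, not the authors' implementation: the function names (`ngram_transition_graph`, `protgram_hierarchy`), the choice of overlapping n-grams shifted by one residue, the row-wise normalization of counts into probabilities, and the toy corpus are all hypothetical. It does not cover DirectGCN or the attention pooling step.

```python
from collections import Counter, defaultdict


def ngram_transition_graph(sequences, n):
    """Build a directed graph of n-gram transitions over a whole corpus.

    Assumption: nodes are overlapping n-grams, and an edge links the n-gram
    starting at position i to the one starting at position i + 1.
    Returns {source_ngram: {target_ngram: probability}}.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        grams = [seq[i:i + n] for i in range(len(seq) - n + 1)]
        # Count each consecutive pair of n-grams across all sequences.
        for src, dst in zip(grams, grams[1:]):
            counts[src][dst] += 1

    graph = {}
    for src, targets in counts.items():
        total = sum(targets.values())
        # Normalize outgoing counts so each source's edge weights sum to 1.
        graph[src] = {dst: c / total for dst, c in targets.items()}
    return graph


def protgram_hierarchy(sequences, max_n):
    """Hypothetical helper: one transition graph per n-gram level 1..max_n."""
    return {n: ngram_transition_graph(sequences, n) for n in range(1, max_n + 1)}


# Toy corpus of made-up sequences, for illustration only.
corpus = ["MKTAYIAK", "MKTLLVAK", "AKTAYK"]
hierarchy = protgram_hierarchy(corpus, max_n=2)

for level, graph in hierarchy.items():
    edge_count = sum(len(targets) for targets in graph.values())
    print(f"n={level}: {len(graph)} source nodes, {edge_count} directed edges")
```

Because the counts are pooled over every sequence before normalization, the resulting edge weights describe the corpus as a whole rather than any single protein, which is one reading of "globally inferred" in the abstract. The graph is directed by construction: the weight from one n-gram to another generally differs from the weight in the reverse direction.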

Results: The efficacy of the DirectGCN model was first established on standard node classification benchmarks, where it performed comparably to established methods on general datasets and showed specialization for complex, directed, and dense heterophilic graph structures. When applied to PPI prediction, the full ProtGram-DirectGCN framework achieves robust predictive power despite being trained on limited data.

Discussion: Our results suggest that a globally inferred, directed graph-based representation of sequence transitions offers a potent and computationally distinct alternative to resource-intensive PLMs for the task of PPI prediction. Future work will involve testing ProtGram-DirectGCN on a wider range of bioinformatics tasks.

Comments

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Recommended Citation

Ebeid, I. A., Tang, H., & Gu, P. (2025). Inferred global dense residue transition graphs from primary structure sequences enable protein interaction prediction via directed graph convolutional neural networks. Frontiers in Bioinformatics, 5, 1651623. https://doi.org/10.3389/fbinf.2025.1651623

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Publication Title

Frontiers in Bioinformatics

DOI

10.3389/fbinf.2025.1651623

Included in

Biomedical Informatics Commons, Computer Sciences Commons
