Standard Dataset
The dataset of GIF-PLA
- Submitted by: Zhu Yan
- Last updated: Tue, 12/12/2023 - 03:38
- DOI: 10.21227/ggd7-a357
Abstract
Accurate prediction of protein-ligand binding affinities (PLAs) is essential for drug discovery, repositioning, and design. Deep learning (DL) techniques have shown promise in PLA prediction owing to their high expressive power. However, existing approaches do not fully characterize the relationships among ligand and protein entities, and the resulting simplistic representations may limit accuracy and generalization. In this study, we propose GIF-PLA (Graph-based heterogeneous Information Fusion framework for Protein-Ligand binding Affinity prediction), a structure-aware approach that estimates the binding affinity scores of protein-ligand pairs with the assistance of protein-ligand binding complexes. The binding complexes are transformed into heterogeneous graphs with meta-paths; these graphs, together with protein sequences and ligand SMILES strings, are fed in parallel into separate cascaded deep neural networks. GIF-PLA thereby acquires structure-oriented information, encompassing topological interactions and high-order nonlinear relationships, as well as sequence-oriented 1D grammatical information. Finally, a fusion module integrates the multi-level information, and a visualization technique demonstrates the biological significance of the predictions. The resulting GIF-PLA model substantially outperforms state-of-the-art methods on well-known datasets.
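To make the idea of a heterogeneous graph with meta-paths concrete, the following is a minimal sketch in plain Python. It is not the authors' graph construction: the node types ("L" for a ligand atom, "P" for a protein residue), the toy edges, and the function names `build_adjacency` and `metapath_instances` are all assumptions chosen for illustration. The sketch enumerates every path whose sequence of node types matches a given meta-path, such as ligand-protein-ligand.

```python
from collections import defaultdict

# Hypothetical toy complex (assumption, not taken from the dataset):
# node id -> node type, "L" = ligand atom, "P" = protein residue.
node_type = {"l1": "L", "l2": "L", "p1": "P", "p2": "P"}

# Undirected edges: one bond inside the ligand and three ligand-protein contacts.
edges = [("l1", "l2"), ("l1", "p1"), ("l2", "p1"), ("l2", "p2")]


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def metapath_instances(adj, node_type, metapath):
    """Return all simple paths whose node types match `metapath` in order."""
    # Start from every node of the first type in the meta-path.
    paths = [[n] for n, t in node_type.items() if t == metapath[0]]
    # Extend each partial path by one hop per remaining type.
    for wanted in metapath[1:]:
        paths = [
            p + [nxt]
            for p in paths
            for nxt in sorted(adj[p[-1]])
            if node_type[nxt] == wanted and nxt not in p
        ]
    return paths


adj = build_adjacency(edges)
lpl = metapath_instances(adj, node_type, ["L", "P", "L"])
pl = metapath_instances(adj, node_type, ["P", "L"])
```

In this toy graph, `lpl` holds the two ligand atoms that share the contact residue `p1`, once in each direction. A real pipeline would derive nodes and edges from the 3D complex and would typically use a graph library rather than hand-rolled enumeration.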
csv files