A dataset for protein-ligand affinity prediction

Citation Author(s): xi wenyu
Submitted by: xi wenyu
Last updated: Thu, 02/20/2025 - 08:13
DOI: 10.21227/a89m-av02


Categories:

Artificial Intelligence

Keywords:

protein-ligand affinity prediction


Abstract

**PDBBind v2016** | Binding Affinity Regression | Benchmark Evaluation (Effectiveness)

Each sample in the PDBBind v2016 dataset is a protein-ligand complex. We extracted only the sequence data, which loses substantial information, to yield one protein-ligand sequence pair per sample. We kept the split used in a previous study: the refined set (excluding the core set) provides the training (train.csv) and validation (valid.csv) sets, and the core set (complexes with the highest resolution) serves as the test set (test.csv). In addition to 'Protein', 'Ligand', and 'regression_label', the CSV files contain an 'ID' column holding the PDB ID in the form 'id_' + PDB ID, and a 'Target_Chain' column indicating the chain to which the amino acid position belongs.
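The column layout described above can be read with the Python standard library alone. The following is a minimal sketch, not an official loader: the helper names `read_split` and `pdb_id` are hypothetical, the label is assumed to be stored as a plain decimal number, and the single sample row is made up purely to exercise the code.

```python
import csv
import io

# Column names as listed in the dataset description.
EXPECTED_COLUMNS = {"ID", "Protein", "Ligand", "regression_label", "Target_Chain"}


def read_split(handle):
    """Parse one split (train/valid/test) from an open CSV file handle."""
    reader = csv.DictReader(handle)
    missing = EXPECTED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    rows = []
    for row in reader:
        # Assumption: the label is stored as a plain decimal number.
        row["regression_label"] = float(row["regression_label"])
        rows.append(row)
    return rows


def pdb_id(row_id):
    """Strip the 'id_' prefix described for the ID column."""
    prefix = "id_"
    return row_id[len(prefix):] if row_id.startswith(prefix) else row_id


# Tiny in-memory stand-in for train.csv; every value here is made up.
sample = io.StringIO(
    "ID,Protein,Ligand,regression_label,Target_Chain\n"
    "id_0abc,MKTAYIAK,CCO,5.0,A\n"
)
rows = read_split(sample)
print(pdb_id(rows[0]["ID"]), rows[0]["regression_label"])
```

For the real files, the same function should work on a handle opened with `open("train.csv", newline="")`, assuming the files are comma-separated with a header row.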