A dataset for protein-ligand affinity prediction

Citation Author(s):
xi wenyu
Submitted by:
xi wenyu
Last updated:
Thu, 02/20/2025 - 03:13
DOI:
10.21227/a89m-av02
License:
0

Abstract 

 **PDBBind v2016** | Binding Affinity Regression | Benchmark Evaluation (Effectiveness)

Each sample in the PDBBind v2016 dataset is a protein-ligand complex. We extracted only the sequence data from each complex, which discards a substantial amount of information, to yield a protein-ligand sequence pair. We kept the split used in a previous study: the refined set (excluding the core set) provides the training (train.csv) and validation (valid.csv) sets, while the core set (the complexes with the highest resolution) serves as the test set (test.csv). Each CSV file has the columns 'Protein', 'Ligand', and 'regression_label', plus a column 'ID' that holds the PDB ID in the form 'id_' + PDB ID, and a column 'Target_Chain' that identifies the chain to which the amino acid positions belong.
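As a rough illustration of the file layout described above, the following sketch reads one split with Python's standard `csv` module and strips the 'id_' prefix from the 'ID' column. The helper name `load_split`, the output field names, and the two sample rows are hypothetical and are not taken from the dataset. The sketch also assumes that 'regression_label' parses as a floating-point number and that each field is a plain string.

```python
import csv
import io

# Column names as listed in the dataset description.
EXPECTED_COLUMNS = {"ID", "Protein", "Ligand", "Target_Chain", "regression_label"}


def load_split(handle):
    """Read one split (train/valid/test) from an open CSV file handle.

    Hypothetical helper: returns a list of dicts with the 'id_' prefix
    removed from the ID and the label converted to float.
    """
    reader = csv.DictReader(handle)
    missing = EXPECTED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")

    rows = []
    for row in reader:
        raw_id = row["ID"]
        # The description gives the ID as 'id_' + PDB ID; strip the prefix if present.
        pdb_id = raw_id[3:] if raw_id.startswith("id_") else raw_id
        rows.append(
            {
                "pdb_id": pdb_id,
                "protein": row["Protein"],
                "ligand": row["Ligand"],
                "target_chain": row["Target_Chain"],
                # Assumption: the label is numeric.
                "affinity": float(row["regression_label"]),
            }
        )
    return rows


# Made-up two-row sample for illustration only; NOT real dataset values.
SAMPLE = (
    "ID,Protein,Ligand,Target_Chain,regression_label\n"
    "id_1abc,MKTAYIAK,CCO,A,6.5\n"
    "id_2xyz,GSHMLE,c1ccccc1,B,4.2\n"
)

rows = load_split(io.StringIO(SAMPLE))
print(rows[0]["pdb_id"], rows[0]["affinity"])
```

With the real files, the same helper could be called as `with open("train.csv", newline="") as f: rows = load_split(f)`, and likewise for valid.csv and test.csv. Reading columns by name rather than by position avoids depending on the column order, which the description does not specify.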


Dataset Files

    Files have not been uploaded for this dataset