Standard Dataset

A dataset for protein-ligand affinity prediction

Citation Author(s):
xi wenyu
Submitted by:
xi wenyu
Last updated:
DOI:
10.21227/a89m-av02

Abstract

**PDBBind v2016** | Binding Affinity Regression | Benchmark Evaluation (Effectiveness)

Each sample in the PDBBind v2016 dataset is a protein-ligand complex. We extracted only the sequence data, which loses substantial information, to yield one protein-ligand sequence pair per complex. We kept the same split used in a previous study: the refined set (excluding the core set) provides the training (train.csv) and validation (valid.csv) sets, while the core set (complexes with the highest resolution) serves as the test set (test.csv). In addition to 'Protein', 'Ligand', and 'regression_label', the CSV files contain an 'ID' column holding the PDB ID in the form 'id_' + PDB ID, and a 'Target_Chain' column identifying the chain to which the amino acid positions belong.
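The column layout described above could be read along the following lines. This is only a minimal sketch: the five column names and the 'id_' prefix come from the description, while the column order, the `load_split` helper, and the sample rows are assumptions made for illustration. The sample rows are placeholders, not real PDBBind entries, and the ligand format (shown here as SMILES-like strings) is assumed rather than stated by the dataset.

```python
import csv
import io

# Column names taken from the dataset description; their order is an assumption.
EXPECTED_COLUMNS = {"ID", "Protein", "Ligand", "regression_label", "Target_Chain"}

# Placeholder rows for illustration only; these are NOT real PDBBind entries.
SAMPLE_CSV = """ID,Protein,Ligand,regression_label,Target_Chain
id_1abc,MKTAYIAK,CCO,5.00,A
id_2xyz,GSHMLEDP,c1ccccc1,6.25,B
"""


def load_split(handle):
    """Read one split (train, valid, or test) into a list of dicts.

    Hypothetical helper, not part of the dataset itself.
    """
    reader = csv.DictReader(handle)
    missing = EXPECTED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")

    rows = []
    for row in reader:
        raw_id = row["ID"]
        # The description gives ID as 'id_' + PDB ID, so strip the prefix if present.
        pdb_id = raw_id[3:] if raw_id.startswith("id_") else raw_id
        rows.append({
            "pdb_id": pdb_id,
            "protein": row["Protein"],
            "ligand": row["Ligand"],
            "label": float(row["regression_label"]),
            "chain": row["Target_Chain"],
        })
    return rows


rows = load_split(io.StringIO(SAMPLE_CSV))
print(rows[0]["pdb_id"], rows[0]["label"])
```

With the actual files, the same helper would presumably be called on an open file handle, for example `with open("train.csv", newline="") as f: train = load_split(f)`.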

Dataset Files

Files have not been uploaded for this dataset