Protein Tertiary Structures Zaman_Molecules20
- Submitted by: Ahmed Bin Zaman
- Last updated: Mon, 04/20/2020 - 09:40
- DOI: 10.21227/gq2v-8k24
Abstract
Controlling the quality of tertiary structures computed for a protein molecule remains a central challenge in de novo protein structure prediction. The rule of thumb is to generate as many structures as can be afforded, on the premise that more structures increase the likelihood that some will lie near the sought biologically active structure. A major drawback of this approach is that computing a large number of structures imposes time and space costs. This dataset accompanies our paper, "Reducing Ensembles of Protein Tertiary Structures Generated De Novo via Clustering", in which we propose a novel clustering-based approach that we demonstrate significantly reduces an ensemble of generated structures without sacrificing quality. The dataset provides the biologically active structures of the protein targets used for evaluation, the data needed to generate structure ensembles (sequence and fragment files), the generated ensembles, the reduced ensembles, and the data needed to generate and plot the results in the paper. The paper is under review, and we will update the link to the paper once it is published. The code associated with this dataset is available at https://github.com/psp-codes/reduced-decoy-ensemble
Dataset Files
- Data.zip (597.03 MB)