Protein Tertiary Structures Zaman_Molecules20

Citation Author(s):
Ahmed Bin Zaman, George Mason University
Parastoo Kamranfar, George Mason University
Carlotta Domeniconi, George Mason University
Amarda Shehu, George Mason University
Submitted by: Ahmed Bin Zaman
Last updated: Mon, 04/20/2020 - 09:40
DOI: 10.21227/gq2v-8k24

Abstract

Controlling the quality of the tertiary structures computed for a protein molecule remains a central challenge in de novo protein structure prediction. The rule of thumb is to generate as many structures as computational resources allow, on the premise that a larger ensemble is more likely to contain some structures near the sought biologically active (native) structure. A major drawback of this approach is that computing and storing a large number of structures imposes substantial time and space costs.

This dataset accompanies our paper, "Reducing Ensembles of Protein Tertiary Structures Generated De Novo via Clustering", in which we propose a novel clustering-based approach and demonstrate that it significantly reduces an ensemble of generated structures without sacrificing quality. The dataset provides the biologically active structures of the protein targets used for evaluation, the data needed to generate structure ensembles (sequences and fragment files), the generated ensembles, the reduced ensembles, and the data needed to generate and plot the results reported in the paper.

The paper is under review, and we will add a link to it once it is published. The code associated with this dataset is available at https://github.com/psp-codes/reduced-decoy-ensemble
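To make the idea of "reducing an ensemble without sacrificing quality" concrete, here is a minimal sketch under stated assumptions. It is NOT the clustering method from the paper or the linked repository. It swaps in simple greedy leader clustering as an illustration. Assumptions: each structure is a list of (x, y, z) coordinates (e.g., one CA atom per residue), all structures are already superimposed (no Kabsch alignment is performed), and quality is measured as the lowest RMSD to the native structure. The functions `rmsd`, `leader_cluster`, `best_to_native` and the `radius` parameter are hypothetical names chosen for this example.

```python
import math
from typing import Iterable, List, Sequence, Tuple

# A structure is a sequence of (x, y, z) coordinates, e.g., one CA atom per residue.
Coords = Sequence[Tuple[float, float, float]]


def rmsd(a: Coords, b: Coords) -> float:
    """Coordinate RMSD between two structures of equal length.

    Assumption: the structures are already superimposed; no optimal
    (e.g., Kabsch) alignment is performed here.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("structures must be non-empty and of equal length")
    sq = sum(
        (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
        for (x1, y1, z1), (x2, y2, z2) in zip(a, b)
    )
    return math.sqrt(sq / len(a))


def leader_cluster(ensemble: Sequence[Coords], radius: float) -> List[int]:
    """Greedy leader clustering (hypothetical stand-in, not the paper's method).

    A structure becomes a new cluster leader if it lies farther than
    `radius` from every existing leader; otherwise it is absorbed by an
    existing cluster. The leader indices form the reduced ensemble.
    """
    leaders: List[int] = []
    for i, s in enumerate(ensemble):
        if all(rmsd(s, ensemble[j]) > radius for j in leaders):
            leaders.append(i)
    return leaders


def best_to_native(ensemble: Sequence[Coords], native: Coords,
                   indices: Iterable[int]) -> float:
    """Lowest RMSD to the native structure among the selected structures."""
    return min(rmsd(ensemble[i], native) for i in indices)


if __name__ == "__main__":
    # Toy data: a 3-atom "native" and four decoys displaced along z.
    native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    decoys = [[(x, 0.0, z) for x, _, _ in native] for z in (0.1, 0.2, 3.0, 3.1)]

    kept = leader_cluster(decoys, radius=0.5)
    print(kept)  # [0, 2]
    print(round(best_to_native(decoys, native, range(len(decoys))), 3))  # 0.1
    print(round(best_to_native(decoys, native, kept), 3))  # 0.1
```

In this toy run the ensemble shrinks from four structures to two, while the best RMSD to the native structure is unchanged, which is the kind of trade-off the abstract describes. Real decoy ensembles would need proper superposition before computing RMSD, and the clustering radius would have to be tuned to the protein and ensemble size.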
