Dataset for Generative Adversarial Learning of Protein Tertiary Structures. Molecules, 2021.

Citation Author(s):
TASEEF RAHMAN
Department of Computer Science, George Mason University
YUANQI DU
Department of Computer Science, George Mason University
LIANG ZHAO
Department of Computer Science, George Mason University
AMARDA SHEHU
Department of Computer Science, George Mason University
Submitted by:
TASEEF RAHMAN
Last updated:
Tue, 02/02/2021 - 02:58
DOI:
10.21227/m8sa-cz14
License:

Abstract 

Protein molecules are inherently dynamic and modulate their interactions with different molecular partners by accessing different tertiary structures under physiological conditions. Elucidating such structures remains challenging. Current momentum in deep learning and the powerful performance of generative adversarial networks (GANs) in complex domains, such as computer vision, inspire us to investigate the ability of GANs to generate physically realistic protein tertiary structures. The analysis presented here shows that several GAN models fail to capture complex, distal structural patterns present in protein tertiary structures. The study additionally reveals that mechanisms touted as effective in stabilizing the training of a GAN model are not all effective, and that performance based on loss alone may be orthogonal to performance based on the quality of generated datasets. A novel contribution of this study is the demonstration that Wasserstein GAN strikes a good balance and manages to capture both local and distal patterns, thus presenting a first step towards more powerful deep generative models for exploring a possibly very diverse set of structures supporting diverse activities of a protein molecule in the cell.

Instructions: 

Generated data, input data, and saved models are made available for the publication
Taseef Rahman, Yuanqi Du, Liang Zhao, and Amarda Shehu. Generative Adversarial Learning of Protein Tertiary Structures. Molecules, 2021.
Each folder contains a ReadMe.txt with instructions for ease of use.