Open Access
Dataset for Generative Adversarial Learning of Protein Tertiary Structures. Molecules, 2021.
- Citation Author(s):
- Submitted by: TASEEF RAHMAN
- Last updated: Tue, 02/02/2021 - 02:58
- DOI: 10.21227/m8sa-cz14
- License:
- Categories:
Abstract
Protein molecules are inherently dynamic and modulate their interactions with different molecular partners by accessing different tertiary structures under physiological conditions. Elucidating such structures remains challenging. Current momentum in deep learning and the powerful performance of generative adversarial networks (GANs) in complex domains, such as computer vision, inspire us to investigate the ability of GANs to generate physically realistic protein tertiary structures. The analysis presented here shows that several GAN models fail to capture complex, distal structural patterns present in protein tertiary structures. The study additionally reveals that mechanisms touted as effective in stabilizing the training of a GAN model are not all effective, and that performance based on loss alone may be orthogonal to performance based on the quality of generated datasets. A novel contribution of this study is the demonstration that Wasserstein GAN strikes a good balance and manages to capture both local and distal patterns, thus presenting a first step towards more powerful deep generative models for exploring a possibly very diverse set of structures supporting diverse activities of a protein molecule in the cell.
Generated data, input data, and saved models are made available for the publication
Taseef Rahman, Yuanqi Du, Liang Zhao, and Amarda Shehu. Generative Adversarial Learning of Protein Tertiary Structures. Molecules, 2021.
For ease of use, each folder includes a ReadMe.txt with instructions for its contents.
Dataset Files
- RahmanDuZhaoShehu_Molecules_DLSI20.zip (18.53 GB)