Proteins that can be secreted into bronchoalveolar lavage fluid

Citation Author(s):
Dan Shao
Submitted by:
Guangzhao Zhang
Last updated:
Thu, 10/10/2024 - 21:01
DOI:
10.21227/pypf-en48
License:

Abstract 

The positive dataset, derived from the HBFP database, comprised 3,434 proteins. The initial negative dataset was constructed by selecting proteins from Pfam families with no overlap with the positive dataset, totaling 8,029 proteins. This set was further refined using protein-protein interaction (PPI) networks across various databases, leading to an expanded collection of 13,912 proteins, which was later narrowed down to 6,740 after exclusions. Following a curation process to remove sequence redundancy, the datasets were finalized with 3,319 positive and 6,599 negative proteins. Given the scarcity of available structural data from the Protein Data Bank (PDB), AlphaFold v2.0 was utilized to predict high-quality 3D structures, thereby enriching the dataset with structural details for 9,702 proteins.
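The first negative-selection step described above (keeping only proteins from Pfam families that have no overlap with the positive dataset) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `select_negative_candidates`, the input mapping from protein ID to Pfam family IDs, and the toy identifiers are all hypothetical, and treating proteins without any family annotation as excluded is an assumption made here for illustration.

```python
def select_negative_candidates(protein_to_families, positive_ids):
    """Return proteins whose Pfam families contain no positive protein.

    protein_to_families: dict mapping protein ID -> set of Pfam family IDs
    (hypothetical input format; the dataset page does not specify one).
    positive_ids: iterable of protein IDs in the positive dataset.
    """
    positive_ids = set(positive_ids)

    # Collect every family that contains at least one positive protein.
    positive_families = set()
    for pid in positive_ids:
        positive_families.update(protein_to_families.get(pid, ()))

    negatives = set()
    for pid, families in protein_to_families.items():
        # Skip positives, and (assumption) proteins with no family annotation.
        if pid in positive_ids or not families:
            continue
        # Keep the protein only if none of its families overlaps the positives.
        if positive_families.isdisjoint(families):
            negatives.add(pid)
    return negatives


# Toy example with made-up protein and family identifiers.
toy_annotations = {
    "P1": {"PF00001"},             # positive protein
    "P2": {"PF00001", "PF00002"},  # shares a family with P1, so excluded
    "P3": {"PF00003"},             # no overlap, so kept
    "P4": {"PF00002"},             # no overlap with positive families, so kept
    "P5": set(),                   # unannotated, so excluded by assumption
}
negatives = select_negative_candidates(toy_annotations, {"P1"})
print(sorted(negatives))
```

The later steps in the abstract (PPI-based refinement, exclusions, and redundancy removal) depend on databases and thresholds that are not stated here, so they are not sketched.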

Instructions: 

This document describes the data used in SecProGNN.

Comments

For test purposes.

Submitted by Ekrim Ali on Thu, 10/10/2024 - 05:21

Documentation

Attachment: File Format.txt
Size: 735 bytes