term_name_generation
- Submitted by: Bingxuan Li
- Last updated: Sat, 02/11/2023 - 12:30
- DOI: 10.21227/b5y6-7147
Abstract
We built a large-scale dataset for term name generation. It contains Gene Ontology (GO) terms for two organisms: Homo sapiens (human) and yeast. For each term, we collected the term ID, the term name and the IDs of its associated genes from the Gene Ontology Consortium (http://geneontology.org/). In addition, gene aliases and descriptions were crawled from GeneCards (https://www.genecards.org/), which aggregates information from the site itself, Entrez (the online retrieval system provided by the National Center for Biotechnology Information), UniProt (https://www.uniprot.org/) and SWISS-PROT (http://www.gpmaw.com/html/swiss-prot.html), a database of annotated protein sequences maintained by the European Bioinformatics Institute (EBI).
Instructions:
Download the data files listed below.
Dataset Files
- idName_mix.json (2.57 MB)
- all_geneDe_mix.json (8.87 MB)
- shuffle_Onto2Gene_mix.json (2.01 MB)
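The page does not document the internal layout of the three JSON files, so the following is only a minimal loading sketch under an assumed schema: idName_mix.json is taken to map each GO term ID to its name, shuffle_Onto2Gene_mix.json to map each GO term ID to a list of gene IDs, and all_geneDe_mix.json to map each gene ID to its alias/description text. These mappings, and the helper names load_json, build_examples and load_dataset, are hypothetical; inspect the actual files before relying on them. The sketch joins the three files into (gene descriptions -> term name) examples, which is the input/output pairing the term name generation task implies.

```python
import json
from pathlib import Path
from typing import Dict, List

# ASSUMPTION (hypothetical schema, not confirmed by the dataset page):
#   idName_mix.json            -> {"<GO term ID>": "<term name>", ...}
#   shuffle_Onto2Gene_mix.json -> {"<GO term ID>": ["<gene ID>", ...], ...}
#   all_geneDe_mix.json        -> {"<gene ID>": "<alias/description text>", ...}


def load_json(path: Path) -> dict:
    """Read one UTF-8 JSON file into a Python object."""
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def build_examples(
    id_name: Dict[str, str],
    onto2gene: Dict[str, List[str]],
    gene_desc: Dict[str, str],
) -> List[dict]:
    """Join the three mappings into one example per GO term."""
    examples = []
    for term_id, gene_ids in onto2gene.items():
        if term_id not in id_name:
            continue  # skip terms that have no name to generate
        # Keep only genes that have a description; order follows gene_ids.
        descs = [gene_desc[g] for g in gene_ids if g in gene_desc]
        examples.append(
            {"term_id": term_id, "name": id_name[term_id], "gene_descriptions": descs}
        )
    return examples


def load_dataset(data_dir: str) -> List[dict]:
    """Load the three files from data_dir and build the joined examples."""
    d = Path(data_dir)
    return build_examples(
        load_json(d / "idName_mix.json"),
        load_json(d / "shuffle_Onto2Gene_mix.json"),
        load_json(d / "all_geneDe_mix.json"),
    )


if __name__ == "__main__":
    # Toy in-memory data that follows the assumed schema (placeholder values).
    toy = build_examples(
        {"GO:0000001": "example term name"},
        {"GO:0000001": ["G1", "G2"]},
        {"G1": "description of G1"},
    )
    print(toy)
```

Terms without a name and genes without a description are dropped rather than raising an error; whether that is the right choice depends on how complete the real files are.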