Standard Dataset
A Hair Follicle Growth Association Gene Dataset
- Submitted by: Tao Zhang
- Last updated: Thu, 12/14/2023 - 02:08
- DOI: 10.21227/3ajx-4z98
Abstract
Accurate knowledge of the key genes that promote hair follicle growth and development is of great value in hair research and dermatology. Compared with traditional experimental methods for identifying key genes, which are time-consuming and laborious, literature mining can extract proven key genes for hair follicle growth from the vast body of literature more quickly and comprehensively, by performing Named Entity Recognition (NER) and Relation Extraction (RE) on the relevant entities. However, this approach has received little attention in the field of hair follicle growth, which lacks both standardized annotated datasets for training models and targeted gene extraction models. To address these issues, this paper creates the first labeled corpus, containing 500 literature abstracts, and proposes a NER model based on the fusion of contextual features (HFNER-FCF) and a hair follicle-gene RE model based on a modified fine-tuning mechanism (HFGRE-MFM). These models are designed, respectively, to solve the entity similarity problem in hair follicle NER and to learn relational features comprehensively in RE, thereby effectively extracting hair follicle growth-associated genes. Our models obtained F-scores of 88.71% and 89.95% on the two tasks, respectively. In addition, by applying the models to unlabeled literature, we obtained the first hair follicle growth-associated gene dataset (HFGAG) generated by automated literature mining. This dataset will provide a valuable resource for the study of hair follicle development and regulation and for the treatment of related diseases. More importantly, the dataset construction method in this paper is general and can serve as a reference for other biomedical research fields.
Unpublished article