FSLP datasets for DR

Citation Author(s): Jincai Huang
Submitted by: Jincai Huang
Last updated: Mon, 07/08/2024 - 15:58
DOI: 10.21227/9g0b-er61

Abstract 

BIOKG is a medical knowledge graph (KG) built from data in numerous biomedical repositories. Its entities include diseases, proteins, drugs, side effects, and protein functions. The KG contains 51 types of directed relations, each connecting a particular pair of entity types. These include drug-drug interactions (39 types), protein-protein interactions (8 types), and drug-protein, drug-side effect, and drug-protein function relations. The dataset is a useful resource for both basic and biomedical machine learning research. From a biological perspective, it lets researchers study human biology in more depth and make predictions that can guide future biomedical investigations.
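To illustrate the schema described above, the sketch below stores KG triples as (head, relation, tail) records alongside a map from each entity to its type. It then checks that each directed relation type links a single (head type, tail type) pair. The entity IDs, relation names, and the `check_relation_schema` helper are hypothetical toy values chosen for illustration. They are not BIOKG identifiers.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


def check_relation_schema(
    triples: List[Triple], entity_type: Dict[str, str]
) -> Dict[str, Set[Tuple[str, str]]]:
    """Collect the (head type, tail type) pairs observed for each relation.

    A relation whose set holds more than one pair does not connect
    a single pair of entity types.
    """
    signatures: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for head, rel, tail in triples:
        signatures[rel].add((entity_type[head], entity_type[tail]))
    return dict(signatures)


# Hypothetical toy entities and relations (not real BIOKG identifiers).
entity_type = {
    "drug_A": "drug", "drug_B": "drug",
    "prot_X": "protein", "prot_Y": "protein",
    "se_1": "side_effect",
}
triples = [
    ("drug_A", "drug_drug_interaction_1", "drug_B"),
    ("prot_X", "protein_protein_interaction_1", "prot_Y"),
    ("drug_A", "drug_protein", "prot_X"),
    ("drug_B", "drug_sideeffect", "se_1"),
]

schema = check_relation_schema(triples, entity_type)
for rel, pairs in sorted(schema.items()):
    print(rel, sorted(pairs))
```

The same check, run over the full triple list, would show whether any relation type is used between more than one pair of entity types.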

COVID19-One: To demonstrate that our model can also run on small-scale KGs, we followed a similar process to build a smaller dataset from a public COVID-19 KG (http://openkg.cn/dataset/covid-19-research). This knowledge graph contains 7 node types and 9 semantic relations, two of which are derived from genome-level knowledge databases.

Instructions: 

Xiong et al. constructed datasets for one-shot learning, such as NELL-One and Wiki-One, but no comparable dataset was available for evaluation in the medical domain. To fill this gap, we returned to the original medical knowledge graphs (KGs) and randomly selected a subset of relations to serve as one-shot task relations. The remaining relations are called background relations, because their triples provide the background knowledge needed for matching entity pairs.
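The relation split described above can be sketched as follows. This is a minimal illustration that assumes task relations are drawn uniformly at random with a fixed seed. The `task_fraction` parameter, the helper names, and the toy triples are all hypothetical, and the number of task relations actually selected for these datasets is not stated here. The second helper shows one common way to turn a task relation into a one-shot episode: a single triple serves as the reference pair, and the remaining triples become queries.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def split_relations(
    triples: List[Triple], task_fraction: float, seed: int = 0
) -> Tuple[Dict[str, List[Triple]], List[Triple]]:
    """Randomly pick task relations; all other triples form the background graph.

    Hypothetical helper: task_fraction (0 < task_fraction <= 1) is an assumed knob.
    """
    by_rel: Dict[str, List[Triple]] = defaultdict(list)
    for t in triples:
        by_rel[t[1]].append(t)
    relations = sorted(by_rel)  # sorted so the seeded choice ignores input order
    rng = random.Random(seed)
    n_task = max(1, round(task_fraction * len(relations)))
    task_rels = set(rng.sample(relations, n_task))
    tasks = {r: by_rel[r] for r in task_rels}
    background = [t for r in relations if r not in task_rels for t in by_rel[r]]
    return tasks, background


def one_shot_episode(
    task_triples: List[Triple], seed: int = 0
) -> Tuple[Triple, List[Triple]]:
    """Use one randomly chosen triple as the reference pair and the rest as queries."""
    rng = random.Random(seed)
    shuffled = task_triples[:]
    rng.shuffle(shuffled)
    return shuffled[0], shuffled[1:]


# Hypothetical toy triples: 4 relations, half of them become task relations.
triples = [
    ("d1", "r1", "d2"), ("d2", "r1", "d3"),
    ("p1", "r2", "p2"), ("p2", "r2", "p3"), ("p3", "r2", "p1"),
    ("d1", "r3", "p1"), ("d2", "r3", "p2"),
    ("d3", "r4", "s1"),
]
tasks, background = split_relations(triples, task_fraction=0.5, seed=42)
for rel, rel_triples in tasks.items():
    reference, queries = one_shot_episode(rel_triples)
    print(rel, "reference:", reference, "queries:", len(queries))
```

Splitting at the relation level, not the triple level, ensures that no task relation appears in the background graph. The model therefore sees each task relation only through its single reference pair.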