NCBI; BC5CDR; i2b2 2010; HPRD50; AIMed; MedNLI

Citation Author(s):
Rezarta Islamaj Dogan
Jiao Li
Özlem Uzuner
Katrin Fundel
Razvan C. Bunescu
Alexey Romanov
Soumya Sanyal
Submitted by:
chen peng
Last updated:
Tue, 04/02/2024 - 01:16
DOI:
10.21227/ardx-5f55

Abstract 

NCBI: The NCBI dataset is a biomedical corpus of 793 PubMed abstracts, each manually annotated with disease mentions and their corresponding concepts. It provides a high-quality gold standard for disease name recognition and normalization research.

BC5CDR-disease: The BioCreative V Chemical-Disease Relation (BC5CDR) corpus is annotated for biomedical named entity recognition and relation extraction tasks. It consists of 1500 PubMed articles annotated with disease and chemical entities and the interactions between them. In this paper, we consider only the disease entities for the named entity recognition task.

i2b2 2010: The i2b2 2010 dataset was sourced from three distinct medical institutions and annotated by medical professionals with eight types of relations among medical problems, treatments, and tests: TrIP, TrWP, TrCP, TrAP, TrNAP, PIP, TeRP, and TeCP.
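The eight relation codes can be read from their prefixes: `Tr` codes link a treatment to a problem, `Te` codes link a test to a problem, and `PIP` links two problems. The following is a minimal sketch of that label set in Python. The glosses follow the i2b2 2010 challenge definitions; the names `I2B2_2010_RELATIONS` and `entity_pair` are hypothetical and are not taken from the authors' code.

```python
from typing import Dict, Tuple

# Relation codes from the dataset description, with short glosses.
# The dictionary name is illustrative, not part of any released code.
I2B2_2010_RELATIONS: Dict[str, str] = {
    "TrIP": "treatment improves problem",
    "TrWP": "treatment worsens problem",
    "TrCP": "treatment causes problem",
    "TrAP": "treatment is administered for problem",
    "TrNAP": "treatment is not administered because of problem",
    "PIP": "problem indicates problem",
    "TeRP": "test reveals problem",
    "TeCP": "test conducted to investigate problem",
}


def entity_pair(code: str) -> Tuple[str, str]:
    """Return the two entity types implied by a relation code's prefix."""
    if code not in I2B2_2010_RELATIONS:
        raise ValueError(f"unknown relation code: {code}")
    if code.startswith("Tr"):
        return ("treatment", "problem")
    if code.startswith("Te"):
        return ("test", "problem")
    return ("problem", "problem")  # only PIP remains
```

A relation classifier over this corpus would typically map these eight codes to integer class indices.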

HPRD50: The HPRD50 dataset is sourced from the HPRD database and is used for studying human protein-protein interactions (PPI). The HPRD50 corpus consists of 43 documents annotated with true and false PPI relations.

AIMed: The AIMed dataset was developed to evaluate protein name recognition and protein-protein interaction (PPI) extraction. The AIMed corpus consists of 225 documents annotated with true and false PPI relations.

MedNLI: MedNLI is derived from MIMIC-III and takes the form of premise-hypothesis pairs. Each pair was annotated by radiologists as entailment, contradiction, or neutral, according to whether the premise entails the hypothesis.
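As a rough illustration of the premise-hypothesis format, the sketch below parses one record and checks its label against the three classes. It assumes SNLI-style JSON lines with the fields `sentence1`, `sentence2`, and `gold_label`; those field names, the `NLIExample` class, and the sample sentence pair are assumptions for illustration, not details given in this description.

```python
import json
from dataclasses import dataclass

# The three classes named in the description.
LABELS = ("entailment", "contradiction", "neutral")


@dataclass(frozen=True)
class NLIExample:
    premise: str
    hypothesis: str
    label: str


def parse_example(line: str) -> NLIExample:
    """Parse one JSON line; the field names are assumed, SNLI-style."""
    record = json.loads(line)
    label = record["gold_label"]
    if label not in LABELS:
        raise ValueError(f"unexpected label: {label!r}")
    return NLIExample(record["sentence1"], record["sentence2"], label)


# Invented sample record, not taken from the dataset.
sample = json.dumps({
    "sentence1": "The patient denies chest pain.",
    "sentence2": "The patient has chest pain.",
    "gold_label": "contradiction",
})
example = parse_example(sample)
```

Rejecting unknown labels at parse time keeps malformed records out of training and evaluation.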

Instructions: 

Specify the file path, then run the run.sh script to start the program. The code has been uploaded to GitHub.

Comments

Dataset

Submitted by chen peng on Tue, 04/02/2024 - 01:18

data

Submitted by Bulut Ozler on Thu, 06/20/2024 - 20:19

data

Submitted by Kejun Zou on Mon, 11/04/2024 - 11:58

Dataset Files

    Files have not been uploaded for this dataset