PICO-DS

Citation Author(s):
Xiang Zhang, Southeast University
Jiaxin Hu, Southeast University
Qian Lu, Southeast University
Submitted by:
Xiang Zhang
Last updated:
Mon, 06/19/2023 - 03:27
DOI:
10.21227/1ykk-5s61

Abstract 

Automatic extraction of valuable, structured evidence from the exponentially growing clinical trial literature can help physicians practice evidence-based medicine quickly and accurately. However, current research on evidence extraction is limited by poor generalization across clinical topics and by the high cost of manual annotation. In this work, we address these challenges by constructing PICO-DS, a PICO-based evidence dataset covering five clinical topics. The dataset was labeled automatically by a distant supervision method based on our proposed textual similarity algorithm, ROUGE-Hybrid. PICO-DS contains 24,909 samples across the five topics, and each sample has a corresponding PICO label. Following the PICO framework, we define four tag types: P for Patient/Population/Problem, I for Intervention/Comparison, O for Outcome, and N for NA, meaning the sample belongs to none of the other three categories.
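
To illustrate the general idea of similarity-based distant labeling, here is a minimal sketch. It is not the ROUGE-Hybrid algorithm, whose details are not given on this page; it substitutes a plain ROUGE-1 F-score (unigram overlap) as a stand-in. The function names, the `references` mapping, and the `threshold` value are all assumptions made for illustration.

```python
from collections import Counter


def rouge1_f(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between two whitespace-tokenized strings."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Multiset intersection counts each shared token at its minimum frequency.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def distant_label(sentence: str, references: dict, threshold: float = 0.3) -> str:
    """Pick the tag whose reference text is most similar to the sentence.

    `references` maps a tag ("P", "I", "O") to a reference text for that
    element (hypothetical input). Sentences scoring below `threshold`
    (an arbitrary illustrative value) against every reference get "N".
    """
    scores = {tag: rouge1_f(sentence, ref) for tag, ref in references.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else "N"
```

In a setup like this, the reference texts would come from an existing structured source, so that no sentence needs to be annotated by hand; the threshold then controls how many sentences fall back to the N tag.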

Instructions: 

The PICO-DS dataset contains three folders: meta, test, and train. Each folder contains samples in CSV format for the four categories (P, I, O, N). Samples in meta and test were annotated manually, while samples in train were generated by the distant supervision method.
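
The folder layout described above could be inspected with a short script along these lines. This is a sketch under assumptions: the root path passed in, the `*.csv` file pattern, and UTF-8 encoding are guesses, and the script makes no assumption about file names, column names, or header rows, since none are documented here.

```python
import csv
from pathlib import Path

# Folder names as listed in the instructions.
SPLITS = ("meta", "test", "train")


def load_split(root, split):
    """Read every CSV file under root/split; return {file stem: list of rows}."""
    data = {}
    for path in sorted(Path(root, split).glob("*.csv")):
        # newline="" lets the csv module handle embedded line breaks itself.
        with path.open(newline="", encoding="utf-8") as f:
            data[path.stem] = list(csv.reader(f))
    return data


def summarize(root):
    """Return {split: {file stem: row count}} for the split folders that exist."""
    summary = {}
    for split in SPLITS:
        if Path(root, split).is_dir():
            files = load_split(root, split)
            summary[split] = {name: len(rows) for name, rows in files.items()}
    return summary
```

Calling `summarize` on the unpacked dataset directory gives a quick check that all three folders are present and shows how many rows each file holds; the counts include a header row if the files have one.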