Q&A Text in Chinese Online Medical Community

Citation Author(s):: Yushan Deng, Man Li
Submitted by:: Man Li
Last updated:: Mon, 07/08/2024 - 15:59
DOI:: 10.21227/sr2c-k812
Data Format:: *.csv
License:: Creative Commons Attribution

Categories:: Machine Learning
Keywords:: artificial intelligence; machine learning; natural language processing; sentiment analysis; named entity recognition; relation extraction

Abstract

Using Python, we crawled a total of 18,793 diabetes-related Q&A records posted between Jun. 1, 2016 and Sept. 1, 2020 on xywy.com, a well-known Chinese online medical community. Each record contains four fields from the question detail page (Title, Problem Description, User ID, and Question Time) and three fields from the doctor's answer page (Doctor ID, Answer Content, and Answer Time). After preprocessing steps such as cleaning and deduplication, we obtained 18,521 valid records. Because the Problem Description supplies the background for the doctor's answer, we merged the two fields into the Answer Content, which is later used as the source text for knowledge graph construction.
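The preprocessing described above can be sketched with the Python standard library. This is a minimal illustration, not the authors' pipeline. The field names come from the abstract, but several details are assumptions: the actual CSV headers, the cleaning rules (trimming and collapsing whitespace, dropping records with an empty answer), the exact-match deduplication key over all seven fields, and the newline separator used when merging Problem Description into Answer Content. The function names `clean` and `preprocess` are hypothetical.

```python
import re

# Field names as listed in the abstract; the real CSV headers are an
# assumption and may differ (they could, for example, be in Chinese).
FIELDS = ("Title", "Problem Description", "User ID", "Question Time",
          "Doctor ID", "Answer Content", "Answer Time")


def clean(record):
    """Trim each field and collapse internal whitespace runs to one space."""
    return {f: re.sub(r"\s+", " ", record.get(f) or "").strip() for f in FIELDS}


def preprocess(records, sep="\n"):
    """Clean, drop empty answers, deduplicate, then merge description into answer."""
    seen = set()
    out = []
    for raw in records:
        rec = clean(raw)
        if not rec["Answer Content"]:
            continue  # hypothetical validity rule: an answer must be present
        key = tuple(rec[f] for f in FIELDS)  # exact-duplicate key over all fields
        if key in seen:
            continue
        seen.add(key)
        # Prepend the question background so the answer text is self-contained.
        if rec["Problem Description"]:
            rec["Answer Content"] = rec["Problem Description"] + sep + rec["Answer Content"]
        out.append(rec)
    return out


if __name__ == "__main__":
    # Toy record invented for illustration; not taken from the dataset.
    row = {"Title": "血糖高怎么办", "Problem Description": "空腹血糖 8.0",
           "User ID": "u1", "Question Time": "2019-03-01 10:00:00",
           "Doctor ID": "d1", "Answer Content": " 建议控制饮食 ",
           "Answer Time": "2019-03-01 11:00:00"}
    empty_answer = {**row, "Answer Content": "   "}
    result = preprocess([row, dict(row), empty_answer])
    print(len(result))  # 1: the copy is a duplicate, the blank answer is dropped
    print(result[0]["Answer Content"])
```

Deduplication runs on the cleaned values and before the merge, so two records that differ only in surrounding whitespace count as duplicates.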

Dataset Files

ori_data_final.csv (20.81 MB)
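A minimal loading sketch for ori_data_final.csv follows. The page does not state the file's encoding or header names, so this assumes the file is UTF-8 (possibly with a byte-order mark) or GB18030, a superset of GBK that Chinese Windows tools often produce. Check both assumptions against the actual file. The helper names `decode`, `parse_rows`, and `load_rows` are hypothetical.

```python
import csv
import io

# Candidate encodings, tried in order. Which one the real file uses is an assumption.
ENCODINGS = ("utf-8-sig", "gb18030")


def decode(data: bytes, encodings=ENCODINGS):
    """Return (text, encoding) for the first encoding that decodes cleanly."""
    for enc in encodings:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError(f"none of {encodings} could decode the data")


def parse_rows(text: str):
    """Parse CSV text into dicts keyed by the header row.

    newline="" leaves line endings untouched so the csv module can handle
    quoted fields that contain line breaks, such as multi-paragraph answers.
    """
    return list(csv.DictReader(io.StringIO(text, newline="")))


def load_rows(path="ori_data_final.csv"):
    """Read the file once as bytes, decode it, and return (rows, encoding)."""
    with open(path, "rb") as f:
        text, enc = decode(f.read())
    return parse_rows(text), enc
```

Reading the raw bytes once and trying each encoding in turn avoids reopening the file. Returning the encoding that succeeded makes it easy to confirm which assumption held.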
