
Standard Dataset

Q&A Text in Chinese Online Medical Community

Citation Author(s):
Yushan Deng
Man Li
Submitted by:
Man Li
Last updated:
DOI:
10.21227/sr2c-k812
Data Format:

Abstract

Using Python, we crawled a total of 18,793 diabetes-related Q&A pairs posted between Jun. 1, 2016 and Sept. 1, 2020 on xywy.com, a well-known Chinese online medical community. Each record contains four fields from the question detail page (Title, Problem Description, User ID, and Question Time) and three fields from the doctor's answer page (Doctor ID, Answer Content, and Answer Time). After preprocessing, including cleaning and deduplication, we obtained 18,521 valid records. Because the Problem Description provides the background for the doctor's answer, we merge the two into the Answer Content field, which is later used as the text for knowledge graph construction.
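The record layout and preprocessing steps described above can be sketched as follows. This is a minimal illustration, not the authors' code: the `QARecord` class, the helpers `clean_text`, `preprocess`, and `merge_for_kg`, the whitespace-collapsing cleaning rule, the deduplication key, and the choice to place the Problem Description before the answer text are all hypothetical assumptions made for the example. The sample records are made-up placeholders, not data from the dataset.

```python
import re
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class QARecord:
    # Four fields from the question detail page
    title: str
    problem_description: str
    user_id: str
    question_time: str
    # Three fields from the doctor's answer page
    doctor_id: str
    answer_content: str
    answer_time: str


def clean_text(text: str) -> str:
    """Collapse runs of whitespace and trim (assumed cleaning rule)."""
    return re.sub(r"\s+", " ", text).strip()


def preprocess(records: list[QARecord]) -> list[QARecord]:
    """Clean text fields, drop empty answers, and deduplicate.

    The deduplication key below is an assumption for illustration.
    """
    seen: set[tuple[str, str, str, str]] = set()
    result: list[QARecord] = []
    for rec in records:
        rec = replace(
            rec,
            title=clean_text(rec.title),
            problem_description=clean_text(rec.problem_description),
            answer_content=clean_text(rec.answer_content),
        )
        if not rec.answer_content:
            continue  # discard records without a usable answer
        key = (rec.user_id, rec.question_time, rec.doctor_id, rec.answer_content)
        if key in seen:
            continue  # duplicate, e.g. the same page crawled twice
        seen.add(key)
        result.append(rec)
    return result


def merge_for_kg(rec: QARecord, sep: str = " ") -> QARecord:
    """Merge the Problem Description into the Answer Content (order assumed)."""
    merged = f"{rec.problem_description}{sep}{rec.answer_content}".strip()
    return replace(rec, answer_content=merged)


# Usage with two made-up records that differ only in whitespace
raw = [
    QARecord("Blood sugar", "  Fasting glucose 7.8 ", "u1", "2019-05-01",
             "d9", "Please retest.", "2019-05-02"),
    QARecord("Blood sugar", "Fasting glucose 7.8", "u1", "2019-05-01",
             "d9", " Please  retest. ", "2019-05-02"),
]
valid = preprocess(raw)
texts = [merge_for_kg(r).answer_content for r in valid]
```

Here the second record collapses to the same cleaned answer as the first and is dropped as a duplicate, so `texts` holds a single merged string. For Chinese text, which is normally written without spaces, passing `sep=""` may be the more natural choice.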
