Q&A Text in Chinese Online Medical Community
- Citation Author(s):
- Submitted by: Man Li
- Last updated: Mon, 07/08/2024 - 15:59
- DOI: 10.21227/sr2c-k812
- Data Format:
- License:
Abstract
Using Python, we crawled a total of 18,793 diabetes-related Q&A records posted between Jun. 1, 2016 and Sept. 1, 2020 on xywy.com, a well-known Chinese online medical community. Each record contains four fields from the question detail page (Title, Problem Description, User ID, and Question Time) and three fields from the doctor’s answer page (Doctor ID, Answer Content, and Answer Time). After preprocessing such as cleaning and deduplication, we obtained 18,521 valid records. Because the Problem Description provides the background for the doctor’s answer, we merge it into the Answer Content, and the combined text is used later for knowledge graph construction.
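The preprocessing described in the abstract (cleaning, deduplication, and merging the Problem Description with the Answer Content) could look roughly like the sketch below. This is an illustration under stated assumptions, not the authors' pipeline: the `QARecord` field names, the whitespace-only cleaning rule, the choice of the (question, answer) pair as the deduplication key, and the newline used to join the two texts are all hypothetical, since the abstract does not specify them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QARecord:
    # Hypothetical field names mirroring the seven fields listed in the abstract.
    title: str
    problem_description: str
    user_id: str
    question_time: str
    doctor_id: str
    answer_content: str
    answer_time: str


def clean(text: str) -> str:
    """Collapse runs of whitespace and trim both ends (assumed cleaning rule)."""
    return " ".join(text.split())


def preprocess(records):
    """Return one combined text per valid, unique record.

    Assumptions: a record is invalid if either text is empty after cleaning,
    and two records are duplicates if their cleaned question and answer match.
    """
    seen = set()
    texts = []
    for record in records:
        question = clean(record.problem_description)
        answer = clean(record.answer_content)
        if not question or not answer:
            continue  # drop records with a missing question or answer
        key = (question, answer)
        if key in seen:
            continue  # drop exact duplicates
        seen.add(key)
        # Merge the background (question) with the doctor's answer.
        texts.append(f"{question}\n{answer}")
    return texts
```

Calling `preprocess` on a list of `QARecord` objects yields the combined texts that would feed a later knowledge graph construction step. A real pipeline would likely need site-specific cleaning (for example, stripping HTML remnants) that this sketch does not attempt.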