Q&A Text in Chinese Online Medical Community

Name: Q&A Text in Chinese Online Medical Community
Creator: Man Li
License: https://creativecommons.org/licenses/by/4.0/
Keywords: Machine Learning

Citation Author(s): Yushan Deng, Man Li
Submitted by: Man Li
Last updated: Mon, 07/08/2024 - 19:59
DOI: 10.21227/sr2c-k812
Data Format: *.csv

Categories:

Machine Learning

Keywords:

artificial intelligence; machine learning; natural language processing; sentiment analysis; named entity recognition; relation extraction

Abstract

Using Python, we crawled a total of 18,793 diabetes-related Q&A records posted between June 1, 2016 and September 1, 2020 on xywy.com, a well-known Chinese online medical community. Each record contains four fields from the question detail page (Title, Problem Description, User ID and Question Time) and three fields from the doctor’s answer page (Doctor ID, Answer Content and Answer Time). After preprocessing, including cleaning and deduplication, we obtained 18,521 valid records. Because the Problem Description provides the background for the doctor’s answer, we merge it into the Answer Content; the merged text is later used as the input for knowledge graph construction.
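
The cleaning, deduplication and merging steps described above might look roughly like the following sketch. It is not the authors' actual pipeline: the column names are assumed to match the field names in the abstract, the cleaning rule (whitespace normalization, dropping records with empty text) and the duplicate criterion (identical question and answer text) are illustrative assumptions, and the sample rows are invented placeholders rather than data from the release.

```python
import csv
import io


def clean(text):
    """Collapse runs of whitespace and trim both ends."""
    return " ".join((text or "").split())


def preprocess(rows):
    """Drop records with empty text, remove exact duplicates,
    and merge the Problem Description into the Answer Content."""
    seen = set()
    result = []
    for row in rows:
        problem = clean(row.get("Problem Description"))
        answer = clean(row.get("Answer Content"))
        if not problem or not answer:
            continue  # assumed cleaning rule: both texts must be present
        key = (problem, answer)
        if key in seen:
            continue  # assumed duplicate criterion: same question and answer
        seen.add(key)
        merged = dict(row)
        merged["Problem Description"] = problem
        # Prepend the question background to the doctor's answer.
        merged["Answer Content"] = problem + " " + answer
        result.append(merged)
    return result


# Invented placeholder rows; the header is an assumption about the CSV layout.
SAMPLE_CSV = """Title,Problem Description,User ID,Question Time,Doctor ID,Answer Content,Answer Time
Q1,  Fasting glucose is   7.8 ,u1,2019-01-01,d1,Please see an endocrinologist.,2019-01-02
Q1,  Fasting glucose is   7.8 ,u1,2019-01-01,d1,Please see an endocrinologist.,2019-01-02
Q2,Thirsty all the time,u2,2019-02-01,d2,,2019-02-02
Q3,Can I eat fruit,u3,2019-03-01,d3,Yes in moderation.,2019-03-02
"""

records = preprocess(list(csv.DictReader(io.StringIO(SAMPLE_CSV))))
for record in records:
    print(record["Title"], "|", record["Answer Content"])
```

To run this against the released file, replace `io.StringIO(SAMPLE_CSV)` with an opened file handle, for example `open(path, newline="", encoding="utf-8")`, after checking the actual header names and encoding of the CSV.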