Standard Dataset
Haodf Doctor Recommendation Dataset
- Submitted by: Jiazheng Jing
- Last updated: Wed, 01/22/2025 - 00:17
- DOI: 10.21227/tbax-nf45
Abstract
We collected patient-doctor interaction data from the Haodf online consultation platform for six common diseases, grouped by risk level. The low-risk diseases are Common Cold (Cold) and Pneumonia (Pneu.), the medium-risk diseases are Diabetes (Diab.) and Depression (Depr.), and the high-risk diseases are Coronary Heart Disease (CHD) and Lung Cancer (Lung.). We use only publicly accessible data, and all patients and doctors remain anonymous to protect their privacy. To evaluate how well a method identifies the doctors most relevant to a patient’s symptoms, we also collected a disease tag t for each patient suffering from disease x. These tags describe the patient’s condition in more detail, allowing more precise treatment matching. For instance, the tag Viral Pneumonia is a more specific category under the broader disease Pneumonia. Similarly, Malignant Tumor is a detailed tag used for patients diagnosed with Lung Cancer. Note that the disease tags are used solely for evaluation and are not involved in any training process. Detailed statistics of the dataset are provided in Table I. For the dataset split, we divided each doctor’s consultation records into training, validation, and test sets in a ratio of 8:1:1.
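The per-doctor 8:1:1 split described above could be reproduced roughly as follows. This is a minimal sketch, not the authors' script: the `(doctor_id, case)` pair format, the function name `split_per_doctor`, the random shuffle, and the seed are all assumptions, since the abstract does not say whether records were shuffled or ordered before splitting, or how remainders were assigned.

```python
import random
from collections import defaultdict


def split_per_doctor(records, seed=0):
    """Split (doctor_id, case) pairs 8:1:1 within each doctor's own cases.

    Hypothetical helper: the record format and the shuffling are assumptions.
    """
    # Group consultation cases by the doctor who handled them.
    by_doctor = defaultdict(list)
    for doctor_id, case in records:
        by_doctor[doctor_id].append(case)

    rng = random.Random(seed)
    train, valid, test = [], [], []
    for doctor_id, cases in by_doctor.items():
        rng.shuffle(cases)
        # Integer arithmetic avoids floating-point rounding surprises.
        n_train = len(cases) * 8 // 10
        n_valid = len(cases) // 10
        train += [(doctor_id, c) for c in cases[:n_train]]
        valid += [(doctor_id, c) for c in cases[n_train:n_train + n_valid]]
        # Whatever is left over (including any remainder) goes to the test set.
        test += [(doctor_id, c) for c in cases[n_train + n_valid:]]
    return train, valid, test
```

Splitting inside each doctor's records, rather than over the pooled records, keeps every doctor with enough cases represented in all three sets.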
Each disease has two files. Taking Diabetes (diab) as an example, the files are diab_doctor.csv and diab_inter.csv, which contain the doctor metadata and the patient-doctor interaction records, respectively.
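As a minimal sketch of reading one disease's file pair, the snippet below uses only Python's standard `csv` module. The helper names `read_csv_rows` and `load_disease`, the directory argument, and the UTF-8 encoding are assumptions; the column names are not documented on this page, so the code makes no assumption about them and simply returns each row as a dictionary keyed by the file's header.

```python
import csv
from pathlib import Path


def read_csv_rows(path):
    """Read a CSV file with a header row into a list of dicts."""
    # Assumption: files are UTF-8; "utf-8-sig" also tolerates a leading BOM.
    with open(path, newline="", encoding="utf-8-sig") as f:
        return list(csv.DictReader(f))


def load_disease(data_dir, prefix):
    """Load <prefix>_doctor.csv and <prefix>_inter.csv from data_dir.

    Hypothetical helper: only the "diab" prefix is confirmed by the page.
    """
    data_dir = Path(data_dir)
    doctors = read_csv_rows(data_dir / f"{prefix}_doctor.csv")
    interactions = read_csv_rows(data_dir / f"{prefix}_inter.csv")
    return doctors, interactions
```

For example, `doctors, interactions = load_disease("data", "diab")` would read the Diabetes pair from a hypothetical `data` directory; inspecting `doctors[0].keys()` then shows which columns the files actually provide.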
Comments
Dataset for our TKDE paper