Standard Dataset
Model Performance Results For Distribution-Driven Augmentation of Medical Data
- Submitted by: Stephen Price
- Last updated: Mon, 07/08/2024 - 15:58
- DOI: 10.21227/685w-tx28
Abstract
The data included here are the model training results associated with the paper "Distribution-Driven Augmentation of Real-World Datasets for Improved Cancer Diagnostics With Machine Learning". The paper uses kernel density estimators to curate datasets by balancing classes and filling missing (null) values with synthetically generated data. It also proposes a technique for joining distinct datasets so that a model can be trained on the necessary features from multiple datasets, as a form of transfer learning. The data provided here are the performance results of each model evaluated (Naive Bayes, Logistic Regression, Support Vector Machine, Decision Tree, and a Voting Classifier) under 5-fold cross-validation. Each model was evaluated using DDA (Distribution-Driven Augmentation), our novel solution, and compared against other frequently used techniques.
Balancing_Data.xlsx: All model results from exclusively balancing classes
Null_Filling_Data.xlsx: All model results from exclusively filling null values
Joining_Data.xlsx: All model results from joining two datasets together and training a model
Synthetic_Validation.xlsx: All model results from synthetically growing a dataset with near-identical distributions
Cervical_Data.xlsx: All model results from performing class balancing and null-filling on a single dataset for a case study
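To make the class-balancing idea in the abstract concrete, the following is a minimal sketch of oversampling minority classes by drawing from a Gaussian kernel density estimate. It is not the authors' DDA implementation: the function names (kde_sample, balance_classes), the fixed bandwidth of 0.1, and the toy data are assumptions made for illustration only, and the paper may choose kernels and bandwidths differently.

```python
import random
from collections import defaultdict


def kde_sample(points, bandwidth, rng):
    """Draw one sample from a Gaussian KDE fitted to `points`.

    Sampling from a Gaussian KDE is equivalent to picking one stored point
    uniformly at random and adding zero-mean Gaussian noise with standard
    deviation `bandwidth` to each feature.
    """
    center = rng.choice(points)
    return [x + rng.gauss(0.0, bandwidth) for x in center]


def balance_classes(rows, labels, bandwidth=0.1, seed=0):
    """Add KDE draws to each smaller class until it matches the largest class.

    `bandwidth` is a hypothetical fixed value; in practice it would be tuned.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(list(row))
    target = max(len(members) for members in by_class.values())

    # Keep every original row, then append synthetic rows per class.
    out_rows = [list(row) for row in rows]
    out_labels = list(labels)
    for label, members in by_class.items():
        for _ in range(target - len(members)):
            out_rows.append(kde_sample(members, bandwidth, rng))
            out_labels.append(label)
    return out_rows, out_labels


# Toy data, made up for illustration: 4 "benign" rows and 1 "malignant" row.
rows = [[0.1, 1.0], [0.2, 1.1], [0.0, 0.9], [0.3, 1.2], [2.0, 3.0]]
labels = ["benign", "benign", "benign", "benign", "malignant"]
balanced_rows, balanced_labels = balance_classes(rows, labels)
```

After the call, both classes have the same number of rows, and the original rows are kept unchanged at the start of the output. Filling null values could follow the same pattern by drawing replacement values from a density fitted to the observed entries, although the exact procedure used in the paper is not described here.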
Dataset Files
- Balancing_Data.xlsx (14.25 kB)
- Cervical_Data.xlsx (9.92 kB)
- Joining_Data.xlsx (14.04 kB)
- Null_Filling_Data.xlsx (12.87 kB)
- Synthetic_Validation.xlsx (12.97 kB)