Model Performance Results For Distribution-Driven Augmentation of Medical Data

Citation Author(s):
Stephen Price, Worcester Polytechnic Institute
Winston O. Soboyejo, Worcester Polytechnic Institute
Rodica Neamtu, State University of New York Polytechnic Institute
Submitted by:
Stephen Price
Last updated:
Mon, 07/08/2024 - 15:58
DOI:
10.21227/685w-tx28

Abstract 

The data included herein are the model training results associated with the paper "Distribution-Driven Augmentation of Real-World Datasets for Improved Cancer Diagnostics With Machine Learning". The paper uses kernel density estimators to curate datasets by balancing classes and filling missing (null) values with synthetically generated data. It also proposes a technique, framed as a form of transfer learning, for joining distinct datasets so that a single model can be trained on the features it needs from multiple datasets. The data provided here are the performance results of each model studied (Naive Bayes, Logistic Regression, Support Vector Machine, Decision Tree, and a Voting Classifier) under 5-fold cross-validation. In particular, these models were evaluated with Distribution-Driven Augmentation (DDA), our novel approach, and compared against other frequently used techniques.
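As an illustration only, the following Python sketch shows one way kernel density estimation can generate synthetic rows for class balancing and null filling. It is not the authors' DDA implementation: the product Gaussian kernel, the per-feature Silverman rule-of-thumb bandwidth, the small fallback bandwidth for constant columns, and the helper names (silverman_bandwidth, kde_sample, balance_classes, fill_nulls) are all assumptions made for this example.

```python
import random
import statistics


def silverman_bandwidth(values):
    """Rule-of-thumb Gaussian bandwidth for one feature: 1.06 * sd * n^(-1/5)."""
    n = len(values)
    sd = statistics.stdev(values) if n > 1 else 0.0
    # Assumption: constant or single-value columns get a tiny positive bandwidth instead of zero.
    return 1.06 * sd * n ** (-1 / 5) if sd > 0 else 1e-3


def kde_sample(rows, k, rng):
    """Draw k synthetic rows from a product-Gaussian KDE fitted to `rows`.

    Sampling from a Gaussian KDE amounts to picking a stored row uniformly at
    random and adding Gaussian noise whose std. dev. is the feature's bandwidth.
    """
    n_features = len(rows[0])
    bandwidths = [silverman_bandwidth([r[j] for r in rows]) for j in range(n_features)]
    samples = []
    for _ in range(k):
        base = rng.choice(rows)
        samples.append([base[j] + rng.gauss(0.0, bandwidths[j]) for j in range(n_features)])
    return samples


def balance_classes(X, y, seed=0):
    """Oversample each smaller class with KDE draws until every class matches the largest.

    Assumes X has no missing values (e.g. run fill_nulls first).
    """
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [list(r) for r in X], list(y)
    for label, rows in by_class.items():
        extra = kde_sample(rows, target - len(rows), rng)
        X_out.extend(extra)
        y_out.extend([label] * len(extra))
    return X_out, y_out


def fill_nulls(X, seed=0):
    """Replace None cells with draws from a 1-D KDE of that column's observed values.

    Assumes every column has at least one observed (non-None) value.
    """
    rng = random.Random(seed)
    X_out = [list(r) for r in X]
    for j in range(len(X[0])):
        observed = [r[j] for r in X if r[j] is not None]
        h = silverman_bandwidth(observed)
        for row in X_out:
            if row[j] is None:
                row[j] = rng.choice(observed) + rng.gauss(0.0, h)
    return X_out


if __name__ == "__main__":
    X = [[1.0, 2.0], [1.2, 2.1], [0.9, 1.8], [5.0, 7.0]]
    y = [0, 0, 0, 1]
    Xb, yb = balance_classes(X, y)
    print(yb.count(0), yb.count(1))  # prints: 3 3
    print(fill_nulls([[1.0, None], [2.0, 4.0], [None, 5.0]]))
```

Sampling from the fitted density, rather than duplicating existing rows, lets the synthetic rows follow each class's estimated distribution instead of repeating the same points.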

Instructions: 

Balancing_Data.xlsx: All model results from exclusively balancing classes

Null_Filling_Data.xlsx: All model results from exclusively filling null values 

Joining_Data.xlsx: All model results from joining two datasets together and training a model on the combined data

Synthetic_Data.xlsx: All model results from synthetically growing a dataset while preserving near-identical distributions

Cervical_Data.xlsx: All model results from performing class balancing and null-filling on a single dataset for a case study