data curation

This dataset contains the model training results associated with the paper "Distribution-Driven Augmentation of Real-World Datasets for Improved Cancer Diagnostics With Machine Learning". The paper uses kernel density estimators to curate datasets by balancing classes and filling missing values with synthetically generated data. It also proposes a technique for joining distinct datasets so that a model can be trained on the necessary features drawn from multiple datasets, as a form of transfer learning.
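
As a rough illustration of the idea described above, the following sketch balances classes and fills missing values by sampling from a one-dimensional Gaussian kernel density estimate. It is not the paper's implementation: the function names (`kde_sample`, `balance_classes`, `fill_missing`), the toy data, the single-feature simplification, and the normal-reference bandwidth rule are all assumptions made for this example.

```python
import random
import statistics


def kde_sample(values, n_samples, rng):
    """Draw n_samples from a 1-D Gaussian KDE fitted to values."""
    # Normal-reference rule-of-thumb bandwidth (an assumption for this
    # sketch, not necessarily the bandwidth choice used in the paper).
    bandwidth = 1.06 * statistics.stdev(values) * len(values) ** (-1 / 5)
    # Sampling from a Gaussian KDE: pick an observed point uniformly at
    # random, then add Gaussian noise with the kernel bandwidth.
    return [rng.gauss(rng.choice(values), bandwidth) for _ in range(n_samples)]


def balance_classes(values_by_class, rng):
    """Oversample each smaller class up to the size of the largest class."""
    target = max(len(v) for v in values_by_class.values())
    balanced = {}
    for label, values in values_by_class.items():
        synthetic = kde_sample(values, target - len(values), rng)
        # Real observations come first, synthetic draws are appended.
        balanced[label] = list(values) + synthetic
    return balanced


def fill_missing(values, rng):
    """Replace None entries with draws from a KDE of the observed entries."""
    observed = [v for v in values if v is not None]
    return [v if v is not None else kde_sample(observed, 1, rng)[0]
            for v in values]


# Hypothetical toy data: one numeric feature, two imbalanced classes.
rng = random.Random(0)
toy = {"benign": [1.0, 1.2, 0.9, 1.1, 1.3, 0.8], "malignant": [2.9, 3.1]}
balanced = balance_classes(toy, rng)
filled = fill_missing([1.0, None, 1.2, None, 0.9], rng)
```

Each class needs at least two observed values here, because the bandwidth is computed from the sample standard deviation. A multivariate dataset would need a multivariate estimator in place of this per-feature simplification.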


We conduct a Systematic Mapping Study to examine the fundamentals and techniques used in Data Curation for Big Data. We focus on computational and mathematical techniques and on application scenarios, with the aim of answering the following questions: (i) How has mathematics contributed to Data Curation? (ii) Are classes of optimization algorithms being used in Data Curation and, if so, which ones? (iii) In which application scenarios has the Data Curation process made the greatest contributions? Our search was performed on several well-known bibliographic sites.
