Dataset of article: Synthetic Datasets Generator for Testing Information Visualization and Machine Learning Techniques and Tools

- Citation Author(s): Sandro Mendonça (Universidade Federal do Pará)
- Submitted by: Carlos Santos
- DOI: 10.21227/5aeq-rr34
Abstract
Datasets used in the article entitled 'Synthetic Datasets Generator for Testing Information Visualization and Machine Learning Techniques and Tools'. They can be used to test how machine learning and data processing algorithms respond to specific data characteristics, such as outliers, missing values, class overlap, and class imbalance.
Instructions:
The default dataset has the following characteristics:
- 1,000 entries
- No outliers
- No missing values
- Two dimensions (one relevant feature and one class, no bad features)
- 80% class separation
- Two classes
- No class imbalance
Thus, six types of datasets were generated, one for each of the six characteristics in the default dataset. In each type of dataset, the system generated four datasets with slight differences in the associated characteristic. For instance, to vary the effect of the number of outliers, the system created datasets with 10%, 20%, 30%, and 40% of outliers, without changing the other characteristics. The variations of the characteristics are the following:
- Amount of outliers: [10%, 20%, 30%, 40%, 50%]
- Class separation: [100%, 90%, 80%, 70%, 60%]
- Amount of missing values: [10%, 20%, 30%, 40%, 50%]
- Class imbalance: [50%-50%, 40%-60%, 30%-70%, 20%-80%, 10%-90%]
- Bad features: [1-1, 1-3, 1-5, 1-7, 1-9]
- Amount of classes: [2, 12, 22, 32, 42]
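The one-characteristic-at-a-time design described above can be sketched as a small configuration grid. This is only an illustration, not the generator used in the article: the dictionary keys (`outliers_pct`, `majority_class_pct`, and so on) and the function `one_factor_configs` are hypothetical names, and the class imbalance pairs are encoded here by the share of the larger class.

```python
# Default characteristics, as listed in the description.
# All key names are hypothetical and chosen for this sketch only.
DEFAULT = {
    "entries": 1000,
    "outliers_pct": 0,
    "missing_pct": 0,
    "bad_features": 0,
    "class_separation_pct": 80,
    "classes": 2,
    "majority_class_pct": 50,  # 50%-50% means no class imbalance
}

# Variation values, copied from the list above.
VARIATIONS = {
    "outliers_pct": [10, 20, 30, 40, 50],
    "class_separation_pct": [100, 90, 80, 70, 60],
    "missing_pct": [10, 20, 30, 40, 50],
    "majority_class_pct": [50, 60, 70, 80, 90],
    "bad_features": [1, 3, 5, 7, 9],
    "classes": [2, 12, 22, 32, 42],
}


def one_factor_configs(default, variations):
    """Return one config per variation value, changing a single key each time."""
    configs = []
    for key, values in variations.items():
        for value in values:
            config = dict(default)  # copy so the default stays untouched
            config[key] = value     # vary only this characteristic
            configs.append(config)
    return configs


configs = one_factor_configs(DEFAULT, VARIATIONS)
```

Each resulting configuration differs from the default in at most one characteristic, which is what allows the effect of that characteristic to be studied in isolation.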
Dataset Files
- Dataset with 10% of missing values (Size: 22.53 KB)
- Dataset with 20% of missing values (Size: 22.75 KB)
- Dataset with 30% of missing values (Size: 22.85 KB)
- Dataset with 40% of missing values (Size: 22.46 KB)
- Dataset with balanced classes (Size: 23.48 KB)
- Dataset with imbalance of 60% for one class (Size: 23.54 KB)
- Dataset with imbalance of 70% for one class (Size: 23.57 KB)
- Dataset with imbalance of 80% for one class (Size: 23.62 KB)
- Dataset with 1 good feature and 1 bad feature (Size: 43.32 KB)
- Dataset with 1 good feature and 3 bad features (Size: 80.89 KB)
- Dataset with 1 good feature and 7 bad features (Size: 118.58 KB)
- Dataset with 1 good feature and 9 bad features (Size: 156.27 KB)
- Dataset with 2 classes (Size: 23.49 KB)
- Dataset with 12 classes (Size: 22.98 KB)
- Dataset with 22 classes (Size: 23.5 KB)
- Dataset with 32 classes (Size: 23.7 KB)
- Dataset with 10% of outliers (Size: 23.51 KB)
- Dataset with 20% of outliers (Size: 23.47 KB)
- Dataset with 30% of outliers (Size: 23.44 KB)
- Dataset with 40% of outliers (Size: 23.48 KB)
- Dataset with 10% of overlap between classes (Size: 23.5 KB)
- Dataset with 20% of overlap between classes (Size: 23.47 KB)
- Dataset with 30% of overlap between classes (Size: 23.48 KB)
- Dataset with 40% of overlap between classes (Size: 23.5 KB)
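
Before using one of these files, it can be useful to check that it has the advertised characteristics. The sketch below is a minimal example under stated assumptions: the file format is not given on this page, so it assumes comma-separated text with a header row and a label column named `class`, and it treats empty cells, `NaN`, and `?` as missing. The function name `summarize` and the inline sample are hypothetical.

```python
import csv
import io
from collections import Counter

# Markers treated as "missing" in this sketch (an assumption, not from the dataset page).
MISSING_MARKERS = {"", "NaN", "nan", "?"}


def summarize(csv_text, class_column="class"):
    """Report entry count, fraction of missing feature cells, and class shares."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    feature_columns = [c for c in rows[0] if c != class_column]
    cells = [row[c] for row in rows for c in feature_columns]
    missing = sum(1 for v in cells if v is None or v.strip() in MISSING_MARKERS)
    counts = Counter(row[class_column] for row in rows)
    return {
        "entries": len(rows),
        "missing_fraction": missing / len(cells),
        "class_share": {label: n / len(rows) for label, n in counts.items()},
    }


# Tiny made-up sample: one feature column "x" and the label column "class".
sample = "x,class\n0.1,a\n,a\n0.7,b\n0.9,b\n"
summary = summarize(sample)
```

For a real file, the text could be read with `open(path).read()` and passed to `summarize`; the column name and missing-value markers would need to be adjusted to match the actual files.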