Binary classifiers' outputs for ensemble creation

Name: Binary classifiers' outputs for ensemble creation
Creator: Attila Tiba
License: https://creativecommons.org/licenses/by/4.0/
Keywords: Computational Intelligence

Citation Author(s): Attila Tiba, Andras Hajdu, Henrietta Toman, Gyorgy Terdik
Submitted by: Attila Tiba
Last updated: Fri, 05/31/2019 - 08:47
DOI: 10.21227/7pf8-nq83
Data Format: .zip


Categories: Computational Intelligence

Keywords: ensemble creation methods, binary classification


Abstract

This dataset was created for the paper 'Andras Hajdu, Gyorgy Terdik, Attila Tiba, and Henrietta Toman: A stochastic approach to handle knapsack problems in the creation of ensembles'. Our experimental setup for UCI binary classification problems uses five base classifiers (perceptron, decision tree, Levenberg-Marquardt feedforward neural network, random neural network, and discriminative restricted Boltzmann machine) on five UCI datasets: MAGIC Gamma Telescope, HIGGS, EEG EyeState, Musk (Version 2), and Spambase. Datasets of large cardinality were selected so that synthetic variants of the base classifiers could be trained on different subsets. To check our models for different numbers of possible ensemble members, the pool sizes were set to 30 and 100; the required number of classifiers was reached by training the base classifiers on different subsets of the training part of each dataset.

Instructions:

The folder data_30 contains 5 .csv files corresponding to the 5 UCI datasets.

Each .csv file contains the classification results of the 30 classifiers as follows:

- each row corresponds to a line from the corresponding UCI dataset,

- each column in the range 1-30 represents the class label predicted by a given classifier,

- column 31 represents the ground truth label of the given case,

- column 32 represents the line index of the given case from the corresponding UCI dataset,

- the first 30% of the rows contain the results on the test set, the last 70% the results on the training set.
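The column layout and the 30%/70% row split described above can be read along the following lines. This is only a minimal sketch, not code shipped with the dataset: the function names `split_outputs` and `read_outputs` are hypothetical, and it assumes comma-separated files without a header row, labels that parse as numbers, and a test part made of the first `round(0.3 * n_rows)` rows. Please check these assumptions against the actual files.

```python
import csv
import io


def split_outputs(rows, n_classifiers=30, test_fraction=0.3):
    """Split parsed rows into (test_part, train_part).

    Each row is [pred_1, ..., pred_n, ground_truth, line_index].
    Assumption: the test part is the first round(test_fraction * n_rows) rows.
    """
    parsed = []
    for row in rows:
        values = [int(float(v)) for v in row]
        parsed.append({
            "predictions": values[:n_classifiers],     # columns 1..n
            "label": values[n_classifiers],            # column n+1
            "line_index": values[n_classifiers + 1],   # column n+2
        })
    n_test = round(test_fraction * len(parsed))
    return parsed[:n_test], parsed[n_test:]


def read_outputs(path, n_classifiers=30):
    # Assumes comma-separated values and no header row.
    with open(path, newline="") as handle:
        return split_outputs(csv.reader(handle), n_classifiers)


# Tiny synthetic example with 3 classifiers instead of 30.
demo = io.StringIO("1,0,1,1,17\n0,0,1,0,4\n1,1,1,1,9\n0,1,0,0,2\n")
test_part, train_part = split_outputs(csv.reader(demo), n_classifiers=3)
```

For a file from data_30, `read_outputs(path)` with the default `n_classifiers=30` would apply the same split; the path itself has to be taken from the unpacked archive.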

The folder data_100 contains 5 .csv files corresponding to the 5 UCI datasets.

Each .csv file contains the classification results of the 100 classifiers as follows:

- each row corresponds to a line from the corresponding UCI dataset,

- each column in the range 1-100 represents the class label predicted by a given classifier,

- column 101 represents the ground truth label of the given case,

- column 102 represents the line index of the given case from the corresponding UCI dataset,

- the first 30% of the rows contain the results on the test set, the last 70% the results on the training set.
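Since every file stores the per-classifier predictions next to the ground truth label, the accuracy of single classifiers and of candidate ensembles can be computed directly from the rows. The sketch below uses plain majority voting over an arbitrarily chosen subset of columns; it is not the stochastic knapsack-based selection of the paper, and the helper names `accuracy` and `majority_vote` are hypothetical. It assumes exactly two class labels and resolves ties in favour of the smaller label.

```python
def accuracy(predicted, truth):
    """Fraction of positions where predicted equals truth."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


def majority_vote(predictions, members):
    """Majority label among the selected classifier columns (0-based).

    Assumes two classes; a tie goes to the smaller label.
    """
    votes = [predictions[m] for m in members]
    labels = sorted(set(votes))
    return max(labels, key=votes.count)


# Synthetic rows: (predictions of 5 classifiers, ground truth label).
rows = [
    ([1, 0, 1, 1, 0], 1),
    ([0, 0, 1, 0, 1], 0),
    ([1, 1, 1, 0, 1], 1),
    ([0, 1, 0, 0, 0], 0),
]
truth = [label for _, label in rows]

# Accuracy of each classifier column on its own.
per_classifier = [
    accuracy([preds[j] for preds, _ in rows], truth) for j in range(5)
]

# Accuracy of a 3-member ensemble built from columns 1, 2 and 3.
members = [1, 2, 3]
ensemble_accuracy = accuracy(
    [majority_vote(preds, members) for preds, _ in rows], truth
)
```

In this toy example the three selected classifiers each make at least one mistake, yet their majority vote labels every row correctly, which is the kind of effect ensemble selection aims to exploit.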

Dataset Files

Data.zip (Size: 1.11 MB)
