Imbalanced Data

Citation Author(s):
Blessa Binolin Pepsi M
Submitted by:
Blessa Binolin M
Last updated:
Wed, 08/23/2023 - 06:46
DOI:
10.21227/3271-k121
License:
Categories:
Keywords:

Abstract 

Classification learning on non-stationary data must cope with changes in the data distribution over time. The major problems in this setting are class imbalance and the high cost of labeling instances in the presence of drift. Imbalance arises when the minority class has far fewer samples than the majority class, and it leads to misclassification of data points, particularly those of the minority class. This paper proposes a technique for rebalancing data with an oversampling approach based on imputation methods, and uses a Hybrid Firefly Optimisation algorithm as a novel classifier. The Electricity dataset includes attributes related to power consumption, with the target labelled as electricity up or down. Imputation methods increase the number of minority samples in a data chunk. The Firefly algorithm is adapted as a classification technique, with weights tuned using boosting ensemble classifiers. The proposed system is tested on seven synthetic datasets and five data stream generators. Evaluation metrics such as F-measure, AUC and G-mean are analyzed to investigate the performance.
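To illustrate why metrics such as F-measure and G-mean are preferred over plain accuracy on imbalanced data, the following is a minimal sketch using their standard definitions. It is not the authors' evaluation code: the function names, the toy labels, and the choice of "up" as the minority (positive) class are assumptions made for illustration only. AUC is omitted because it requires classifier scores rather than hard labels.

```python
import math

def confusion_counts(y_true, y_pred, positive):
    """Count TP, FP, TN, FN, treating `positive` as the minority class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def f_measure(tp, fp, fn):
    """F1 score: harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def g_mean(tp, fp, tn, fn):
    """Geometric mean of sensitivity (recall) and specificity."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)

# Toy labels (hypothetical): 8 majority "down" samples, 2 minority "up" samples.
y_true = ["down"] * 8 + ["up"] * 2
# A classifier that always predicts the majority class.
y_pred = ["down"] * 10

tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive="up")
accuracy = (tp + tn) / len(y_true)

print("accuracy :", accuracy)
print("F-measure:", f_measure(tp, fp, fn))
print("G-mean   :", g_mean(tp, fp, tn, fn))
```

In this toy case the always-majority classifier scores high accuracy while both F-measure and G-mean are zero, because no minority sample is recovered.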

Instructions: 

The Electricity dataset includes attributes related to power consumption, with the target labelled as electricity up or down.