Data analytics

To give machine learning and data science practitioners a larger dataset for model training, the well-known Palmer Penguins dataset has been expanded from its original 344 rows to 100,000 rows. The additional rows are synthetic. They were generated with an adversarial random forest (ARF) technique designed to preserve the key patterns and features of the original data. The method achieved an accuracy of 88%, indicating that the expanded dataset remains realistic and suitable for classification tasks.
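In the adversarial random forest approach, generation starts from synthetic rows whose columns are sampled independently from each real column's observed values (the "product of marginals"). A random forest is then trained to tell real rows from synthetic ones, and new rows are drawn within its leaves until the forest can no longer reliably separate them. The sketch below shows only that first initialization step, using the standard library. The function name `sample_product_of_marginals` and the toy rows are hypothetical, and this is not the code used to build the 100,000-row dataset.

```python
import random
from typing import Dict, List, Sequence


def sample_product_of_marginals(
    rows: Sequence[Dict[str, object]], n: int, seed: int = 0
) -> List[Dict[str, object]]:
    """Draw n synthetic rows, sampling every column independently.

    Each value is drawn from that column's observed values, so per-column
    (marginal) distributions match the real data while dependence between
    columns is deliberately destroyed. In ARF this is only the starting
    point that the forest then learns to distinguish from the real rows.
    """
    if not rows:
        raise ValueError("need at least one real row")
    rng = random.Random(seed)  # seeded for reproducible draws
    columns = list(rows[0].keys())
    # Pool of observed values per column (the empirical marginal).
    pools = {col: [row[col] for row in rows] for col in columns}
    return [{col: rng.choice(pools[col]) for col in columns} for _ in range(n)]


# Toy rows using Palmer Penguins-style column names (illustrative values only).
real = [
    {"species": "Adelie", "island": "Torgersen", "flipper_length_mm": 181},
    {"species": "Gentoo", "island": "Biscoe", "flipper_length_mm": 217},
    {"species": "Chinstrap", "island": "Dream", "flipper_length_mm": 195},
]
synthetic = sample_product_of_marginals(real, n=5)
for row in synthetic:
    print(row)
```

Because columns are sampled independently, a row such as a Gentoo on Torgersen can appear; the later forest-training rounds of ARF exist precisely to remove such implausible combinations.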


This is a fictional dataset provided by IBM. It contains at most 30 features of categorical and discrete data. The features mix numerical and text values and support analysis of employee data across the employment lifecycle, from hiring to firing and from onboarding to attrition.
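Because the columns mix numerical and text values, a common first step is to separate them before analysis. The sketch below assumes the data is available as CSV text with a header row and treats a column as numeric only when every non-empty value parses as a number. The column names in the toy sample (`Age`, `Department`, `MonthlyIncome`, `Attrition`) and the function `split_columns` are hypothetical stand-ins, not a confirmed schema.

```python
import csv
import io
from typing import List, Tuple


def _is_number(value: str) -> bool:
    """Return True if the string parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def split_columns(csv_text: str) -> Tuple[List[str], List[str]]:
    """Split CSV columns into (numeric, text) by inspecting their values.

    Empty cells are ignored; a column with no non-empty values is
    treated as text.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    numeric: List[str] = []
    text: List[str] = []
    for col in reader.fieldnames or []:
        # Missing trailing fields come back as None, so normalize to "".
        values = [(row.get(col) or "").strip() for row in rows]
        values = [v for v in values if v]
        if values and all(_is_number(v) for v in values):
            numeric.append(col)
        else:
            text.append(col)
    return numeric, text


# Toy sample (hypothetical columns and values).
sample = """Age,Department,MonthlyIncome,Attrition
30,Sales,4000,Yes
45,Research & Development,5200,No
"""
numeric, text = split_columns(sample)
print(numeric)  # ['Age', 'MonthlyIncome']
print(text)     # ['Department', 'Attrition']
```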


The dataset contains active power measurements for a residential dwelling (an apartment) in Bucharest, Romania, collected at a 1-second reporting rate over several months.
Always-on appliances include the refrigerator and the wireless router. Several other appliances are installed in the unit: a washing machine, lighting fixtures, an electric iron, a vacuum cleaner, various ICT charging devices, and air conditioning (seldom used).
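At a 1-second reporting rate, each active power reading approximates the average power over one second, so energy can be estimated by summing readings and converting joules to kilowatt-hours: E_kWh = (sum of P_i * dt) / 3,600,000, with P_i in watts and dt = 1 s. The sketch below assumes readings in watts with no missing seconds; gaps in a real recording would need to be filled, or the time step taken from the timestamps. The function name `energy_kwh` is hypothetical.

```python
from typing import Iterable

JOULES_PER_KWH = 3_600_000  # 1 kWh = 1000 W * 3600 s


def energy_kwh(power_w: Iterable[float], dt_s: float = 1.0) -> float:
    """Approximate energy in kWh from evenly spaced active power samples.

    Uses a rectangle rule: each sample is assumed to hold for dt_s
    seconds. Assumes values in watts and no gaps between samples.
    """
    joules = sum(p * dt_s for p in power_w)
    return joules / JOULES_PER_KWH


# One hour of a constant 100 W load sampled once per second.
hour = [100.0] * 3600
print(energy_kwh(hour))  # 0.1
```

For example, a refrigerator drawing an average of 100 W for an hour contributes 0.1 kWh; summing per-second readings this way over a full day gives the daily consumption of the dwelling.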


This dataset was extracted from Twitter using keywords related to Dilma Rousseff and Aécio Neves, the candidates in the second round of the 2014 presidential election in Brazil. It contains Portuguese-language texts and their sentiment classifications, produced with the techniques described in the article published in the 2018 IEEE International Conference on Data Mining Workshops - ICDMW (

