Machine Learning

Diabetic Retinopathy is the second largest cause of blindness in diabetic patients. Early diagnosis or screening can prevent vision loss. Several computer-aided algorithms have recently been developed to detect the early signs of Diabetic Retinopathy, i.e., microaneurysms (MAs). The AGAR300 dataset presented here helps researchers benchmark MA detection algorithms on digital fundus images. Currently, we have released the first set of the database, which consists of 28 color fundus images showing signs of microaneurysms.


This heart disease dataset is curated by combining 3 popular heart disease datasets. The first dataset (collected from Kaggle) contains 70,000 records with 11 independent features, which makes it the largest heart disease dataset available so far for research purposes. These data were collected during medical examinations and from information given by the patients. The second and third datasets contain 303 and 293 instances, respectively, with 13 common features. The three datasets used for its curation are:

  1. Cardio Data (Kaggle Dataset)
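Combining several sources "over common features" amounts to keeping only the columns every source shares and then stacking the rows. The sketch below illustrates that idea under stated assumptions: each table is a list of dicts with consistent keys, and the column names in the demo are hypothetical placeholders, not the real schemas of the source datasets.

```python
from typing import Dict, List

# Each record maps a (hypothetical) feature name to its value.
Record = Dict[str, object]


def common_features(tables: List[List[Record]]) -> List[str]:
    """Return the columns present in every table, in the first table's order.

    Assumes every table is non-empty and all rows in a table share the same keys.
    """
    first = list(tables[0][0].keys())
    shared = set(first)
    for table in tables[1:]:
        shared &= set(table[0].keys())
    return [col for col in first if col in shared]


def combine(tables: List[List[Record]]) -> List[Record]:
    """Concatenate all tables, keeping only the columns they all share."""
    cols = common_features(tables)
    return [{c: row[c] for c in cols} for table in tables for row in table]


if __name__ == "__main__":
    # Hypothetical toy tables; "extra_a" and "extra_b" exist in only one source each.
    source_a = [{"age": 50, "sex": 1, "extra_a": 3, "target": 0}]
    source_b = [{"age": 63, "sex": 0, "extra_b": 7, "target": 1}]
    print(combine([source_a, source_b]))
```

Matching column names do not guarantee matching units or encodings, so a real merge would also need to harmonize the values of each shared feature before stacking.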


Data for the study has been retrieved from a publicly available dataset of Bondora, a leading European P2P lending platform. The retrieved data form a pool of both defaulted and non-defaulted loans from the period between 1 March 2009 and 27 January 2020, and comprise demographic and financial information on borrowers and loan transactions. In P2P lending, loans are typically uncollateralized, and lenders seek higher returns as compensation for the financial risk they take.
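Because the pool mixes defaulted and non-defaulted loans, a natural first check is the share of defaults within the stated period. The following is a minimal sketch assuming a hypothetical two-field `Loan` record (a loan date and a default flag); the actual Bondora export uses its own, much larger schema.

```python
from datetime import date
from typing import List, NamedTuple


class Loan(NamedTuple):
    # Hypothetical minimal schema, not the real Bondora field names.
    loan_date: date
    defaulted: bool


# Period covered by the retrieved data, as stated in the description.
START = date(2009, 3, 1)
END = date(2020, 1, 27)


def default_rate(loans: List[Loan]) -> float:
    """Fraction of defaulted loans among loans dated within [START, END]."""
    in_period = [loan for loan in loans if START <= loan.loan_date <= END]
    if not in_period:
        return 0.0
    return sum(loan.defaulted for loan in in_period) / len(in_period)
```

For example, with one defaulted and one non-defaulted loan inside the period plus a loan dated after 27 January 2020, the function returns 0.5 because the out-of-period loan is ignored.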


Recently, Temporal Information Retrieval (TIR) has attracted major attention from the information retrieval community. TIR exploits the temporal dynamics of the information retrieval process and harnesses both textual relevance and temporal relevance to fulfill a user's temporal information needs (Ur Rehman Khan et al., 2018). The focus time of a document is an important temporal aspect, defined as the time to which the content of the document refers (Jatowt et al., 2015; Jatowt et al., 2013; Morbidoni et al., 2018; Khan et al., 2018).


It contains recordings of the four biomarkers we selected for the instrument: the first column holds the heart recordings, the second the temperature recordings, the third muscle activity, and the last column oxygen levels.
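Since the file is organized by column position, one way to work with it is to map each column to a named series. This sketch assumes a headerless, comma-separated text file; the delimiter and the labels in `COLUMNS` are hypothetical, since the description only gives the column order.

```python
import csv
import io
from typing import Dict, List

# Hypothetical labels, in the column order given in the description.
COLUMNS = ["heart", "temperature", "muscle_activity", "oxygen"]


def load_biomarkers(text: str, delimiter: str = ",") -> Dict[str, List[float]]:
    """Parse a headerless 4-column table into one list of floats per biomarker."""
    series: Dict[str, List[float]] = {name: [] for name in COLUMNS}
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        if not row:
            continue  # skip blank lines
        if len(row) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} columns, got {len(row)}")
        for name, value in zip(COLUMNS, row):
            series[name].append(float(value))
    return series
```

Reading a real file would pass `open(path).read()` in place of the `text` argument; raising on a wrong column count catches files whose layout differs from the assumed one.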


This heart disease dataset is curated by combining 5 popular heart disease datasets that were already available independently but had not been combined before. The 5 datasets are combined over 11 common features, which makes this the largest heart disease dataset available so far for research purposes. The five datasets used for its curation are:


This dataset was collected from the force, current, angle (magnetic rotary encoder), and inertial sensors of the NAO humanoid robot while it walked on Vinyl, Gravel, Wood, Concrete, Artificial grass, and Asphalt without a slope, and on Vinyl, Gravel, and Wood with a slope of 2 degrees. In total, counting all axes and components of each sensor, we monitored 27 parameters on board the robot.


The GesHome dataset consists of 18 hand gestures performed by 20 non-professional subjects of various ages and occupations. Each participant performed each gesture 50 times over 5 days, so GesHome contains 18,000 gesture samples in total. Using the embedded accelerometer and gyroscope, we recorded 3-axis linear acceleration and 3-axis angular velocity at a sampling frequency of 25 Hz. The experiments were video-recorded so that the data could be labeled manually with the ELAN tool.
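The sample count follows directly from the protocol: 18 gestures × 20 subjects × 50 repetitions = 18,000. The sketch below reproduces that arithmetic and shows what the 25 Hz rate means for a 6-channel stream (3 acceleration + 3 angular velocity axes) when it is cut into fixed-length windows. Sliding-window segmentation is a common generic preprocessing step for inertial data, not necessarily the authors' method; the `Frame` layout and window parameters are assumptions.

```python
from typing import List, Sequence, Tuple

N_GESTURES = 18
N_SUBJECTS = 20
REPS_PER_GESTURE = 50  # per subject, spread over 5 days
SAMPLING_HZ = 25

# Assumed frame layout: (ax, ay, az, gx, gy, gz).
Frame = Tuple[float, float, float, float, float, float]


def total_samples() -> int:
    """Number of gesture samples implied by the collection protocol."""
    return N_GESTURES * N_SUBJECTS * REPS_PER_GESTURE


def sliding_windows(frames: Sequence[Frame], seconds: float, step: int) -> List[Sequence[Frame]]:
    """Split a 25 Hz frame stream into windows of `seconds` length, advancing by `step` frames."""
    size = int(round(seconds * SAMPLING_HZ))  # e.g. 2 s -> 50 frames
    return [frames[i:i + size] for i in range(0, len(frames) - size + 1, step)]
```

With 100 frames (4 s of data), 2-second windows, and a step of 25 frames (50% overlap), this yields three windows of 50 frames each, starting at frames 0, 25, and 50.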


This is an alarm management dataset based on the Tennessee Eastman Process (TEP). The dataset aims to provide a suitable benchmark for developing and validating alarm management methods for complex industrial processes, using both quantitative data and qualitative information from different sources. Unlike real industrial processes, the TEP simulation allows abnormal situations to be designed, generated, repeated, and varied without risking the loss of equipment or harming the environment.


This dataset consists of 3-phase differential currents for internal faults and 4 other transient cases in Phase Angle Regulators (PARs). The transients other than faults are magnetizing inrush, sympathetic inrush, external faults with CT saturation, and overexcitation. PSCAD/EMTDC software was used to simulate the internal faults and the transients.