Machine Learning

This database contains Synthetic High-Voltage Power Line Insulator Images.

There are two sets of images: one for image segmentation and another for image classification.

The first set contains images with different types of materials and landscapes. The landscape types are Mountains, Forest, Desert, City, Stream, and Plantation. Each landscape type has 2,627 images per insulator type (Ceramic, Polymeric, or Glass), for a total of 47,286 distinct images.
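The stated total follows from the counts given in the description. A quick check, assuming every combination of landscape and insulator type holds exactly 2,627 images (the variable names are illustrative only):

```python
# Landscape and insulator types as listed in the description.
landscapes = ["Mountains", "Forest", "Desert", "City", "Stream", "Plantation"]
insulator_types = ["Ceramic", "Polymeric", "Glass"]
images_per_combination = 2627  # images per landscape, per insulator type

# 6 landscapes x 3 insulator types x 2,627 images each.
total_images = len(landscapes) * len(insulator_types) * images_per_combination
print(total_images)  # 47286
```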


To address the challenges faced by patients with neurodegenerative disorders, Brain-Computer Interface (BCI) solutions are being developed. However, many current datasets do not cover the languages spoken by patients, such as Telugu, which is spoken by over 90 million people in India. To bridge this gap, we have created a dataset of electroencephalography (EEG) signal samples for commonly used Telugu words. The EEG samples were captured with the Open-BCI Cyton device while volunteers pronounced these words.


The popularity of smartphones has also popularized reading content on them. Reading on a smartphone differs considerably from reading on a desktop system, where the mouse and keyboard are the associated peripherals. Studying how those peripherals are handled has made it possible to derive implicit feedback on the content being read. Comparable studies of implicit feedback on smartphones remain a large gap. Reading on a smartphone involves screen gestures such as pinch-to-zoom, tap, scroll, orientation change, and screen capture.


The dataset consists of 4-channel EOG data recorded in two environments. The first category of data was recorded from 21 people using a driving simulator (1976 samples). The second category was recorded from 30 people in real-road conditions (390 samples).

All signals were acquired with JINS MEME ES_R smart glasses equipped with a 3-point EOG sensor. The sampling frequency is 200 Hz.
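As a rough sketch of how the 200 Hz sampling frequency maps sample indices to time, the snippet below builds a time axis for one recording. The recording length and the names `FS_HZ` and `time_axis` are assumptions for illustration; the actual file layout and sample lengths are not described here.

```python
FS_HZ = 200  # sampling frequency stated in the description

def time_axis(n_samples, fs_hz=FS_HZ):
    """Return the time in seconds of each sample index."""
    return [i / fs_hz for i in range(n_samples)]

# Hypothetical recording length; real recordings may differ.
n_samples = 1000
t = time_axis(n_samples)
duration_s = n_samples / FS_HZ

print(duration_s)  # 5.0
```

At 200 Hz, consecutive samples are 5 ms apart, so any window expressed in seconds converts to a sample count by multiplying by `FS_HZ`.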


The dataset involves two groups of participants: twenty skilled drivers aged between 40 and 68, each with a minimum of ten years of driving experience (class 1), and ten novice drivers aged between 18 and 46, who were undergoing driving lessons at a driving school at the time of recording (class 2).

The data was recorded using JINS MEME ES_R smart glasses by JINS, Inc. (Tokyo, Japan).

Each file contains the signals from a single ride.


The data have 16 features and 1 target value.

Scope: Primarily focused on diabetes-related information.

Data Size: Contains a substantial volume of records.

Variables: Likely includes patient demographics, medical history, lab results, medications, treatments, and outcomes.

Temporal Range: Time span covered by the dataset may vary.

Privacy Measures: Anonymized to protect patient identities.

Ethical Considerations: Collected and shared adhering to ethical guidelines.


SeaIceWeather Dataset 

This is the SeaIceWeather dataset, collected for training and evaluation of deep learning based de-weathering models. To the best of our knowledge, this is the first such publicly available dataset for the sea ice domain. This dataset is linked to our paper titled: Deep Learning Strategies for Analysis of Weather-Degraded Optical Sea Ice Images. The paper can be accessed at: 


QuaN is a collection of specially designed datasets for exploring the impact of noise on quantum machine learning and other applications. The presented work focuses on the transformation of clean datasets into noisy counterparts across diverse domains, including the MNIST handwritten digits, Medical MNIST, Iris, and Mobile Health datasets. The datasets are created using noise from both classical and quantum domains.
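To illustrate the general idea of turning a clean sample into a noisy counterpart, here is a minimal sketch using additive Gaussian noise, one common classical noise model. The noise models and parameters actually used to build QuaN are not specified above, so the function name `add_gaussian_noise`, the `sigma` value, and the clipping range are assumptions.

```python
import random

def add_gaussian_noise(values, sigma=0.1, seed=None, lo=0.0, hi=1.0):
    """Return a copy of `values` with zero-mean Gaussian noise added.

    Results are clipped to [lo, hi], which assumes the inputs are
    normalized to that range (an assumption, not a QuaN requirement).
    """
    rng = random.Random(seed)
    return [min(hi, max(lo, v + rng.gauss(0.0, sigma))) for v in values]

# Toy normalized feature vector standing in for one clean sample.
clean = [0.0, 0.25, 0.5, 0.75, 1.0]
noisy = add_gaussian_noise(clean, sigma=0.1, seed=42)
print(noisy)
```

Fixing the seed makes the noisy counterpart reproducible, which helps when comparing models trained on the clean and noisy versions of the same data.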


This paper introduces a dataset capturing brain signals generated by the recognition of 100 Malayalam words, accompanied by their English translations. The dataset encompasses recordings acquired in both vocal and sub-vocal modalities for the Malayalam vocabulary. For the English equivalents, only vocal signals were collected. The dataset was created to help Malayalam-speaking patients with neurodegenerative diseases.


A real-world radio frequency fingerprinting (RFF) dataset for an RFF enhancement strategy that exploits the physical unclonable function (PUF) to tune RF hardware impairments in a unique and secure manner, exemplified here with the power amplifiers (PAs) in RF chains. The enhancement is achieved by intentionally and slightly tuning the PA non-linearity characteristics using the active load-pulling technique. The dataset was collected from both cable-connected and over-the-air measurements.