Machine Learning

The Metaverse Gait Authentication Dataset (MGAD) is a large-scale gait dataset designed for biometric authentication in virtual environments. It contains gait data from 5,000 simulated users, generated in Unity 3D and processed using OpenPose and MediaPipe to extract 16 key features, including stride length, step frequency, joint angles, ground reaction forces, and gait symmetry index.

We curated and released a real-world clinical dataset, MedCD, in the context of building generative artificial intelligence (AI) applications in the clinical setting. The MedCD dataset is one of the outcomes of our longitudinal applied AI research and deployment in a tertiary care hospital in China. First, the dataset is real and comprehensive, in that it was sourced from real-world electronic health records (EHRs), clinical notes, lab examination reports and more.

One of the leading causes of early health decline is the rising level of air pollution in major cities, which eventually reaches indoor spaces. Monitoring air quality effectively in enclosed spaces such as educational institutes and hospitals can improve both the health and the quality of life of the occupants. In this paper, we propose an efficient Indoor Air Quality (IAQ) monitoring and management system, which uses a combination of cutting-edge technologies to monitor and predict major air pollutants like CO2, PM2.5, and TVOCs, as well as other factors like temperature and humidity.

Overview

This dataset contains detailed experimental data from a series of tests evaluating the performance of a pulsed water jet ablation system. The experiments investigate the effects of various parameters on the ablation process when cutting through material composites such as PLA/Bone Cement and Bone. They involve layers of different materials, including metal, plastic, and bone cement. The primary objective is to understand material differentiation.

Dataset Content

A new small aerial flame dataset, the Aerial Fire and Smoke Essential (AFSE) dataset, was created from screenshots of different YouTube wildfire videos as well as images from FLAME2. Two object categories are included in this dataset: smoke and fire. The collection consists mostly of images taken from aerial viewpoints. It contains a total of 282 images with no augmentations and combines images with only smoke, images with both fire and smoke, and images with neither fire nor smoke.

The Explainable Sentiment Analysis Dataset provides annotated sentiment classification data for Amazon Reviews and IMDB Movie Reviews, facilitating the evaluation of sentiment analysis models with a focus on explainability. It includes ground-truth sentiment labels, model-generated predictions, and fine-grained classification results obtained from various large language models (LLMs), including both proprietary (GPT-4o/GPT-4o-mini) and open-source models (DeepSeek-R1 full and distilled models).

The TripAdvisor online airline review dataset, spanning from 2016 to 2023, provides a comprehensive collection of passenger feedback on airline services during the COVID-19 pandemic. This dataset includes user-generated reviews that capture sentiments, preferences, and concerns, allowing for an in-depth analysis of shifting customer priorities in response to pandemic-related disruptions. By examining these reviews, the dataset facilitates the study of evolving passenger expectations, changes in service perceptions, and the airline industry's adaptive strategies.

We created this dataset to study Outdoor-to-Indoor (O2I) signal propagation using four UAV transmitters and 17,485 receivers positioned inside the building. For each receiver location and transmitter, we generated up to 25 multipath components by simulating six transmissions, six reflections, one diffraction, and diffused multipath (comprising two transmissions and one diffraction) using Remcom's Wireless InSite.

The DALHOUSIE NIMS LAB ATTACK IOT DATASET 2025-1 comprises four prevalent types of attacks, namely Portscan, Slowloris, Synflood, and Vulnerability Scan, on nine distinct Internet of Things (IoT) devices. These attacks are very common in IoT ecosystems because they often serve as precursors to more sophisticated attack vectors. By analyzing attack traffic characteristics and IoT device responses, our dataset helps shed light on IoT ecosystem vulnerabilities.
