Machine Learning
Fair Use for Academic Research: If you use this dataset, please cite the following paper to ensure proper attribution:
M. A. Onsu, P. Lohan, B. Kantarci, A. Syed, M. Andrews, S. Kennedy, "Leveraging Multimodal-LLMs Assisted by Instance Segmentation for Intelligent Traffic Monitoring," 30th IEEE Symposium on Computers and Communications (ISCC), July 2025, Bologna, Italy.
Preprint available here: https://arxiv.org/pdf/2502.11304
- Categories:

This dataset includes conjunctival and retinal images collected from both diabetic and healthy individuals to support research on diabetes-related vascular changes. For each subject, eight conjunctival images (four per eye: looking left, right, up, and down) are provided. Subjects with diabetes additionally have corresponding left and right retinal fundus images. Metadata for diabetic participants includes classification into subgroups: diabetes only, diabetes with retinopathy, or diabetes with related complications such as hypertension.
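The per-subject image layout described above can be enumerated in a few lines. This is a minimal sketch only: the tuple layout and the names `EYES`, `GAZES`, and `expected_images` are illustrative assumptions, not the dataset's actual file naming or directory structure.

```python
from itertools import product

# Hypothetical labels; the dataset's real file naming is not given in the description.
EYES = ("left", "right")
GAZES = ("left", "right", "up", "down")


def expected_images(is_diabetic):
    """List the (modality, eye, gaze) entries expected for one subject.

    Every subject has four conjunctival images per eye. Diabetic subjects
    additionally have one retinal fundus image per eye (gaze is None there).
    """
    images = [("conjunctival", eye, gaze) for eye, gaze in product(EYES, GAZES)]
    if is_diabetic:
        images += [("retinal", eye, None) for eye in EYES]
    return images


print(len(expected_images(is_diabetic=False)))  # conjunctival images only
print(len(expected_images(is_diabetic=True)))   # conjunctival plus retinal
```

A check like this can help flag subjects with missing views before training.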
- Categories:

This is the data set used for The Third International Knowledge Discovery and Data Mining Tools Competition, which was held in conjunction with KDD-99, the Fifth International Conference on Knowledge Discovery and Data Mining. The competition task was to build a network intrusion detector, a predictive model capable of distinguishing between "bad" connections, called intrusions or attacks, and "good" normal connections.
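The "bad" versus "good" distinction amounts to a binary target derived from each connection's label. The sketch below assumes the label convention commonly seen in the distributed KDD-99 files, where each record ends in a label such as `normal.` or an attack name with a trailing period. The function name `is_intrusion` is hypothetical.

```python
def is_intrusion(label):
    """Map a raw connection label to True (attack) or False (normal).

    Assumes labels look like "normal." or "<attack name>." with a trailing
    period; verify this against the copy of the data you actually download.
    """
    return label.strip().rstrip(".").lower() != "normal"


# Example labels for illustration only.
for raw in ("normal.", "smurf.", "neptune."):
    print(raw, is_intrusion(raw))
```

Collapsing all attack names into one class matches the two-way task stated above; a multi-class detector would keep the attack names instead.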
- Categories:
The use of technology in cricket has increased significantly in recent years, leading to overlapping computer vision-based research efforts. This study aims to extract front pitch view shots from cricket broadcasts using deep learning. Front pitch view (FPV) shots include the ball delivery by the bowler and the stroke played by the batter. FPV shots are valuable for highlight generation, automatic commentary generation, and analysis of bowling and batting techniques. We classify each broadcast video frame as FPV or non-FPV using deep-learning models.
- Categories:
This dataset presents a curated collection of 9,000 English verbs annotated with normalized fuzzy values across four cognitive-behavioral quadrants of the BEET-M (Behavior Engagement Emotion Trigger Modes) model: Value & Credibility (NW), Relationship & Human Impact (NE), Process & Information (SE), and Time Urgency (SW).
- Categories:

Droidware is an Android malware dataset developed at the Cybersecurity Lab, GLA University, India. It comprises 253,527 applications, including 129,950 benign and 123,577 malicious samples. The dataset captures 68 features extracted from function call graphs, permissions, and Java source code, providing a comprehensive view of Android malware behavior. This up-to-date dataset supports the training of AI-based malware detection models, aiding the development of robust malware classification and threat mitigation strategies for cybersecurity research.
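The sample counts quoted above imply a nearly balanced two-class problem. This short sketch checks the arithmetic and computes the class shares. The constant and function names are illustrative; only the three counts come from the description.

```python
# Counts taken from the dataset description.
BENIGN = 129_950
MALICIOUS = 123_577
TOTAL = 253_527


def class_shares(benign, malicious):
    """Return the (benign, malicious) fractions of the combined sample count."""
    total = benign + malicious
    return benign / total, malicious / total


# The two class counts should add up to the stated total.
assert BENIGN + MALICIOUS == TOTAL

benign_share, malicious_share = class_shares(BENIGN, MALICIOUS)
print(f"benign: {benign_share:.1%}, malicious: {malicious_share:.1%}")
```

With roughly 51% benign and 49% malicious samples, plain accuracy is a less misleading metric here than it would be on a heavily skewed malware corpus.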
- Categories:

The Travel Recommendation Dataset is a comprehensive dataset designed for building and evaluating conversational recommendation systems in the travel domain. It includes detailed information about users, destinations, and ratings, enabling researchers and developers to create personalized travel recommendation models. The dataset supports use cases such as personalizing travel recommendations, analyzing user behavior, and training machine learning models for recommendation tasks.
- Categories:

The source data and code files for the paper "Optical chaos shift keying communication system via neural network-based signal reconstruction." The following are included:
1. Source figure files from the paper;
2. Source code of the proposed scheme, including the simulation code for communication, security analysis, and parameter mismatch range;
3. The source Simulink module for time-delayed chaotic signal generation.
- Categories:

This dataset comprises vibration signals collected from bearing test rigs under both healthy and faulty conditions, designed to support research in fault diagnosis and out-of-distribution (OOD) detection. The data includes:
- CWRU Dataset: Signals from the Case Western Reserve University bearing test platform, sampled at 12 kHz, covering normal operation and three fault types (inner race, outer race, and rolling element faults) with varying severities (0.007–0.021 inches). OOD samples are explicitly labeled for validation.
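Fault-diagnosis models are usually trained on fixed-length windows rather than on whole recordings. The sketch below cuts a signal sampled at 12 kHz into overlapping windows. The window length (0.1 s) and the 50% overlap are assumptions chosen for illustration, not values given by the dataset, and `segment` is a hypothetical helper.

```python
SAMPLE_RATE_HZ = 12_000  # sampling rate stated in the dataset description


def segment(signal, window_s=0.1, overlap=0.5):
    """Split a 1-D sequence of samples into fixed-length, overlapping windows.

    window_s and overlap are illustrative defaults, not dataset settings.
    Trailing samples that do not fill a complete window are dropped.
    """
    size = int(round(window_s * SAMPLE_RATE_HZ))
    step = max(1, int(round(size * (1.0 - overlap))))
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, step)]


# One second of placeholder samples (zeros), standing in for a real recording.
one_second = [0.0] * SAMPLE_RATE_HZ
windows = segment(one_second)
print(len(windows), len(windows[0]))
```

Windows from the same recording overlap, so splitting into train and test sets by recording rather than by window avoids leakage between the two.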
- Categories:

Paper: Assessment of Inference Improvements for Facial Micronutrient Deficiency Detection using Attention-Enhanced YOLOv5
Authors: Amey Agarwal, Shreya Rathod, Riva Rodrigues, Nirmitee Sarode, Dhananjay R. Kalbande
Description
This is a dataset of 7 classes: 6 facial skin problems and 1 null class.
A facial skin problem may be identified in an image and marked using bounding box annotation.
The Acne class indicates a deficiency of Vitamin D.
Blackheads and nodules are types of acne.
- Categories: