Standard Dataset
image

- Submitted by: LI MA
- Last updated: Sun, 04/13/2025 - 09:12
- DOI: 10.21227/45m4-j967
Abstract
The ORL and WarpPIE datasets consist of grayscale face images; each sample shows a single individual captured under varying lighting conditions, facial expressions, and occlusions. The COIL20 dataset contains grayscale images of 20 distinct objects, each represented by 72 images taken from different rotation angles. The MNIST and USPS datasets comprise images of handwritten digits from 0 to 9.
This file contains these five benchmark datasets for evaluating clustering algorithms, with the number of clusters c varying from 2 to 10. In each dataset, 30% of the samples are labeled. To assess the robustness of a clustering algorithm, synthetic outliers can be added to the datasets. These outliers make up 20% of the total dataset and take the form of dummy images in which each pixel is randomly assigned a value of either 0 or 255.
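The outlier construction and the 30% labeling described above can be sketched in Python with NumPy. This is a minimal illustration and not the code used to build the file. The function names `add_dummy_outliers` and `sample_labeled_mask`, the flattened `(n_samples, n_pixels)` array layout, and the random seeds are all hypothetical. The sketch also assumes that "20% of the total dataset" refers to the dataset after the outliers have been added, and that the labeled samples are chosen uniformly at random; the source does not state either point.

```python
import numpy as np


def add_dummy_outliers(X, outlier_fraction=0.2, seed=0):
    """Append dummy images whose pixels are each 0 or 255.

    Assumption: outliers make up `outlier_fraction` of the augmented dataset,
    so m outliers satisfy m / (n + m) = outlier_fraction.
    X is assumed to hold one flattened image per row.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    m = int(round(outlier_fraction * n / (1.0 - outlier_fraction)))
    # Each pixel is drawn independently from {0, 255} with equal probability.
    outliers = rng.choice(np.array([0, 255], dtype=X.dtype), size=(m, d))
    X_aug = np.vstack([X, outliers])
    # Boolean mask marking which rows of X_aug are synthetic outliers.
    is_outlier = np.concatenate([np.zeros(n, dtype=bool), np.ones(m, dtype=bool)])
    return X_aug, is_outlier


def sample_labeled_mask(n_samples, labeled_fraction=0.3, seed=0):
    """Return a boolean mask selecting a random `labeled_fraction` of samples.

    Assumption: labeled samples are picked uniformly at random.
    """
    rng = np.random.default_rng(seed)
    n_labeled = int(round(labeled_fraction * n_samples))
    mask = np.zeros(n_samples, dtype=bool)
    mask[rng.choice(n_samples, size=n_labeled, replace=False)] = True
    return mask


if __name__ == "__main__":
    # Stand-in data: 80 blank 4x4 grayscale images, flattened to 16 pixels.
    X = np.zeros((80, 16), dtype=np.uint8)
    X_aug, is_outlier = add_dummy_outliers(X)
    labeled = sample_labeled_mask(X.shape[0])
    print(X_aug.shape, int(is_outlier.sum()), int(labeled.sum()))
```

If the 20% is instead meant relative to the original sample count, the number of outliers would simply be `round(outlier_fraction * n)`.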