UCI datasets
- Submitted by: Yuan Sun
- Last updated: Thu, 11/30/2023 - 03:57
- DOI: 10.21227/g4y0-sw34
Abstract
The UCI Machine Learning Repository, maintained and made available by the University of California, Irvine, is a collection of datasets widely used in machine learning and data mining research. Its datasets span many fields, including medicine, biology, the social sciences, physics, and engineering. Because the data come from many different domains and sources, researchers can explore and analyze them from different perspectives and in different contexts. This experiment applies the datasets to cluster analysis, using eight synthetic datasets and six real datasets.
All datasets used in this experiment are open-source UCI datasets. In each dataset file, the first column gives the class label (species) and the remaining columns are the features. For each dataset, the algorithm groups the samples based on the features, and its output is then compared with the first column to compute an evaluation metric that measures the clustering quality of the algorithm.
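The workflow described above (first column as label, remaining columns as features, cluster on the features, then score against the labels) can be sketched as follows. This is a minimal illustration, not the experiment's actual code: the comma delimiter, the plain k-means stand-in for "the algorithm", the purity metric, and the helper names `load_labeled_rows`, `kmeans`, and `purity` are all assumptions, and the `SAMPLE` rows are made up rather than taken from any dataset file.

```python
import csv
import io
from collections import Counter


def load_labeled_rows(text, delimiter=","):
    """Split each row into (label, features); the first column is the label."""
    labels, features = [], []
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        if not row:
            continue  # skip blank lines
        labels.append(row[0].strip())
        features.append([float(v) for v in row[1:]])
    return labels, features


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=50):
    """Plain Lloyd's k-means (assumed stand-in) with farthest-first initialization."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from its nearest existing center.
        centers.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centers))
        )
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest center.
        assignment = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        # Recompute each center as the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return assignment


def purity(true_labels, assignment):
    """Fraction of samples that belong to the majority class of their cluster."""
    by_cluster = {}
    for label, cluster in zip(true_labels, assignment):
        by_cluster.setdefault(cluster, Counter())[label] += 1
    majority_total = sum(c.most_common(1)[0][1] for c in by_cluster.values())
    return majority_total / len(true_labels)


# Made-up rows in the assumed layout: label first, then numeric features.
SAMPLE = """a,1.0,1.1
a,0.9,1.0
a,1.1,0.9
b,5.0,5.1
b,5.2,4.9
b,4.9,5.0
"""

labels, features = load_labeled_rows(SAMPLE)
clusters = kmeans(features, k=len(set(labels)))  # labels are not used for clustering
score = purity(labels, clusters)
```

The labels are read only to choose the number of clusters and to score the result; the clustering itself sees the feature columns only. Purity is one of several possible external metrics, and the dataset page does not say which metric the experiment uses.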