First Name: 
Sadiksha
Last Name: 
Sharma

Datasets & Analysis

The dataset contains Gaussian blobs with varying numbers of samples, centers, and features. The number of samples ranges from 500 to 50,000, the number of centers from 2 to 100, and the number of features from 2 to 2048. These sets of Gaussian blobs can be used to test clustering algorithms for scalability and effectiveness. The compressed archives contain two kinds of files: files ending in "_X.csv" hold the datapoints, and files ending in "_y.csv" hold the corresponding class labels.
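To process many archives programmatically, each datapoint file has to be matched with its label file. The sketch below assumes that an "_X.csv" file and its "_y.csv" file share the same prefix (for example, a hypothetical "blobs_1000_5_2_X.csv" and "blobs_1000_5_2_y.csv"); check the file lists in the PDFs to confirm the actual naming. `pair_blob_files` is a hypothetical helper, not part of the dataset.

```python
def pair_blob_files(filenames):
    """Match each '<prefix>_X.csv' with '<prefix>_y.csv'.

    Assumes (hypothetically) that paired files share an identical prefix.
    Returns a dict mapping prefix -> (datapoints_file, labels_file).
    """
    x_suffix, y_suffix = "_X.csv", "_y.csv"
    xs = {f[: -len(x_suffix)]: f for f in filenames if f.endswith(x_suffix)}
    ys = {f[: -len(y_suffix)]: f for f in filenames if f.endswith(y_suffix)}
    # Only prefixes that have both a datapoints file and a labels file are kept
    return {p: (xs[p], ys[p]) for p in sorted(xs.keys() & ys.keys())}


if __name__ == "__main__":
    # Hypothetical file names; files without a partner are skipped
    names = ["a_X.csv", "a_y.csv", "b_X.csv", "readme.pdf"]
    print(pair_blob_files(names))
```

Unmatched files (such as "b_X.csv" above) are silently skipped; raise an error instead if every archive is expected to be complete.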

Instructions: 

Please read the documentation files (PDFs) before downloading the compressed ZIP archives. The PDFs list the files contained in each archive.

The datapoints are real numbers with up to 15 decimal places. Because of this precision, a clustering algorithm may take a long time to converge. If you need to round the values, you can do so with pandas: DataFrameName.round(decimals=decimal_place).
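A minimal pandas sketch of the rounding step follows. It assumes the "_X.csv" files have no header row (`header=None`); adjust this if yours do. The helper name `load_rounded` and its default of 4 decimal places are illustrative choices, not values recommended by the dataset author.

```python
import io

import pandas as pd


def load_rounded(source, decimals=4):
    """Read an *_X.csv file (path or file-like object) and round every value.

    Assumes the CSV has no header row; use header=0 in read_csv if it does.
    """
    df = pd.read_csv(source, header=None)
    # DataFrame.round rounds every numeric column to `decimals` places
    return df.round(decimals=decimals)


if __name__ == "__main__":
    # Small in-memory stand-in for an *_X.csv file (hypothetical values)
    sample = io.StringIO("1.123456789012345,2.987654321098765\n-4.5,0.000012345678901\n")
    print(load_rounded(sample, decimals=3))
```

With a real file, call it as `load_rounded("path/to/file_X.csv", decimals=4)`, where the path is a placeholder. Choose the number of decimals with care: rounding too aggressively can merge nearby points in the smaller-scale blobs.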
