
The dataset contains Gaussian blobs with varying numbers of samples, centers, and features. The number of samples ranges from 500 to 50,000, the number of centers from 2 to 100, and the number of features from 2 to 2048. These sets of Gaussian blobs can be used to test the scalability and effectiveness of clustering algorithms. Each compressed archive contains two kinds of files: files ending with "_X.csv" contain the data points, and files ending with "_y.csv" contain the corresponding class labels.
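A minimal sketch of loading one "_X.csv"/"_y.csv" pair with pandas. It assumes each CSV has a single header row and no index column; the exact file layout is not stated here, so adjust `header` (or pass `index_col` to `pd.read_csv`) if the files differ. `load_blob_pair` is a hypothetical helper, not something shipped with the dataset.

```python
import numpy as np
import pandas as pd


def load_blob_pair(x_path, y_path, header=0):
    """Load one blob set: data points from a *_X.csv file, labels from *_y.csv.

    Hypothetical helper. Assumes one header row per file (header=0);
    pass header=None if the files contain only numeric rows.
    """
    X = pd.read_csv(x_path, header=header)
    y = pd.read_csv(y_path, header=header)
    # Every data point should have exactly one label.
    if len(X) != len(y):
        raise ValueError(f"row mismatch: {len(X)} points vs {len(y)} labels")
    # Return a float feature matrix and a 1-D label vector (first label column).
    return X.to_numpy(dtype=float), y.iloc[:, 0].to_numpy()
```

For example, after `X, y = load_blob_pair("<name>_X.csv", "<name>_y.csv")` (with `<name>` a placeholder for an actual file prefix), `X.shape[1]` is the number of features, and `len(np.unique(y))` is the number of distinct labels, which should match the number of centers if every center contributes at least one point.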


Please read the documentation file before downloading the compressed archives. The PDF lists the files contained in each archive.
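To cross-check a downloaded archive against the PDF listing, its members can be read without extracting anything. This is a minimal sketch using Python's standard `zipfile` module; it assumes the data and label files of one set share a prefix (for example `<prefix>_X.csv` and `<prefix>_y.csv`), and `pair_blob_files` and `list_blob_pairs` are hypothetical helper names.

```python
import zipfile


def pair_blob_files(names):
    """Match each *_X.csv member with its *_y.csv counterpart by shared prefix."""
    xs = {n[: -len("_X.csv")]: n for n in names if n.endswith("_X.csv")}
    ys = {n[: -len("_y.csv")]: n for n in names if n.endswith("_y.csv")}
    # Keep only prefixes that have both a data file and a label file, in sorted order.
    return {p: (xs[p], ys[p]) for p in sorted(xs.keys() & ys.keys())}


def list_blob_pairs(zip_path):
    """Read the member list of an archive (path or file object) without extracting it."""
    with zipfile.ZipFile(zip_path) as zf:
        return pair_blob_files(zf.namelist())
```

Any "_X.csv" file without a matching "_y.csv" file (or vice versa) is left out of the result, which makes an incomplete or corrupted download easy to spot.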

The data points are real numbers with up to 15 decimal places, and this precision may make a clustering algorithm take a long time to converge. If you need to round the values, you can do so with DataFrameName.round(decimals=decimal_place), where DataFrameName is the pandas DataFrame holding the data and decimal_place is the number of decimal places to keep.
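A minimal sketch of the rounding step, assuming an "_X.csv" file has already been loaded into a pandas DataFrame. The column names and values below are invented for illustration, and 4 decimal places is an arbitrary choice.

```python
import pandas as pd

# Hypothetical stand-in for a loaded *_X.csv file with 15-decimal values.
X = pd.DataFrame({"f0": [0.123456789012345, -1.987654321098765],
                  "f1": [2.000000000000001, 3.141592653589793]})

# DataFrame.round returns a new DataFrame; the original X is left unchanged.
X_rounded = X.round(decimals=4)
print(X_rounded)
```

`decimals` also accepts a dict keyed by column name, e.g. `X.round({"f0": 2, "f1": 4})`, if different features need different precision.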