The dataset provides Abilify Oral user reviews, with ratings of the drug's satisfaction, effectiveness, and ease of use across different age groups.


The dataset contains Gaussian blobs with varying numbers of samples, centers, and features. The number of samples ranges from 500 to 50,000, the number of centers from 2 to 100, and the number of features from 2 to 2048. These sets of Gaussian blobs can be used to test the scalability and effectiveness of clustering algorithms. There are two kinds of files inside the compressed sets: files ending with "_X.csv" contain the datapoints, while files ending with "_y.csv" contain the corresponding class labels.


Please go through the documentation file before downloading the compressed zips. The PDF lists the files contained in each compressed file.

The datapoints are real numbers with up to 15 decimal places. An algorithm may take a long time to converge at this precision, so if you need to round the numbers, you can do so with DataFrameName.round(decimals=decimal_place).
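The rounding step mentioned above can be sketched as follows. This is a minimal illustration that assumes pandas is installed; the two-row in-memory CSV is a hypothetical stand-in for one of the "_X.csv" files, and whether the real files carry a header row is not stated here, so the header argument may need adjusting.

```python
import io

import pandas as pd

# Hypothetical stand-in for an "_X.csv" file (two datapoints, two features).
sample_csv = io.StringIO(
    "0.123456789012345,-1.987654321098765\n"
    "2.000000000000001,3.141592653589793\n"
)

# header=None is an assumption; use header=0 if the real files have a header row.
X = pd.read_csv(sample_csv, header=None)

# Round every feature value to the chosen number of decimal places.
decimal_place = 4
X_rounded = X.round(decimals=decimal_place)

print(X_rounded)
```

For a real file, replace sample_csv with the path to the extracted "_X.csv" file.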


Most text-simplification systems require an indicator of word complexity. The prevalent approaches to word difficulty prediction are based on manual feature engineering. The use of deep learning based models has been left largely unexplored because of their comparatively poor performance. We have explored the use of one such model in predicting the difficulty of words, treating the problem as a binary classification problem. We have also trained traditional machine learning models and evaluated their performance on the task.


The data is in CSV format. Please see the research paper for how to obtain the difficulty label from the I_Z score.




Game Building statistical analysis


The data uploaded here shall support the paper 

Decision Tree Analysis of  ...

which has been submitted to IEEE Transactions on Medical Imaging (2020, September 25) by the authors

Julian Mattes, Wolfgang Fenz, Stefan Thumfart, Gerhard Haitchi, Pierre Schmit, Franz A. Fellner

During review, the data shall be visible only to the reviewers of this paper. Afterwards, this abstract will be modified and complemented, and a dataset image will be uploaded.


The data set contains electrical and mechanical signals from experiments on three-phase induction motors. The experimental tests were carried out for different mechanical loads on the induction motor axis and different severities of broken bar defects in the motor rotor, including data regarding the rotor without defects. Ten repetitions were performed for each experimental condition.


Experimental Setup:

The experimental workbench consists of a three-phase induction motor coupled with a direct-current machine, which works as a generator simulating the load torque, connected by a shaft containing a rotary torque wrench.

- Induction motor: 1 hp, 220 V/380 V, 3.02 A/1.75 A, 4 poles, 60 Hz, with a nominal torque of 4.1 Nm and a rated speed of 1715 rpm. The rotor is of the squirrel cage type, composed of 34 bars.

- Load torque: adjusted by varying the field winding voltage of the direct-current generator. A single-phase voltage variator with a filtered full-bridge rectifier is used for this purpose. The induction motor was tested at 12.5, 25, 37.5, 50, 62.5, 75, 87.5 and 100% of full load.

- Broken rotor bar: to simulate the failure in the three-phase induction motor's rotor, the rotor bars were drilled. The broken bars are generally adjacent to the first broken bar. Four rotors were tested: the first with one broken bar, the second with two adjacent broken bars, and so on up to a rotor with four adjacent broken bars.
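The nameplate figures above are enough for a small worked calculation. The sketch below derives the synchronous speed and rated slip from the pole count, supply frequency, and rated speed, and then evaluates the classical (1 ± 2ks)·f sideband pair commonly associated with broken rotor bars in the stator current spectrum. It also converts the load percentages to torque under the assumption that "full load" corresponds to the 4.1 Nm nominal torque. The function names are illustrative, and none of this is claimed to be the analysis method used by the dataset's authors.

```python
# Nameplate values quoted in the setup description.
POLES = 4
SUPPLY_HZ = 60.0
RATED_RPM = 1715.0
NOMINAL_TORQUE_NM = 4.1
LOAD_LEVELS_PCT = [12.5, 25, 37.5, 50, 62.5, 75, 87.5, 100]


def synchronous_speed_rpm(supply_hz, poles):
    """Synchronous speed n_s = 120 * f / p, in rpm."""
    return 120.0 * supply_hz / poles


def slip(sync_rpm, rotor_rpm):
    """Per-unit slip s = (n_s - n) / n_s."""
    return (sync_rpm - rotor_rpm) / sync_rpm


def broken_bar_sidebands(supply_hz, s, k=1):
    """Lower and upper sideband frequencies (1 -/+ 2ks) * f, in Hz."""
    return ((1 - 2 * k * s) * supply_hz, (1 + 2 * k * s) * supply_hz)


def load_torques_nm(nominal_nm, levels_pct):
    """Torque per load level, assuming 100% load equals the nominal torque."""
    return [nominal_nm * pct / 100.0 for pct in levels_pct]


n_sync = synchronous_speed_rpm(SUPPLY_HZ, POLES)   # 1800 rpm
s_rated = slip(n_sync, RATED_RPM)                  # about 0.0472
lower_hz, upper_hz = broken_bar_sidebands(SUPPLY_HZ, s_rated)
torques = load_torques_nm(NOMINAL_TORQUE_NM, LOAD_LEVELS_PCT)

print(f"synchronous speed: {n_sync:.0f} rpm, rated slip: {s_rated:.4f}")
print(f"first sideband pair at rated slip: {lower_hz:.2f} Hz, {upper_hz:.2f} Hz")
print("load torques (Nm):", [round(t, 4) for t in torques])
```

At lighter loads the slip is smaller, so the sidebands move closer to the 60 Hz fundamental; the actual slip at each load level would have to be estimated from the recorded signals.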

Monitoring condition:

All signals were sampled simultaneously for 18 seconds under each loading condition, and ten repetitions were performed, each covering the transient through the steady state of the induction motor.

- mechanical signals: five axial accelerometers were used simultaneously, with a sensitivity of 10 mV/mm/s, a frequency range of 5 to 2000 Hz, and stainless steel housing, allowing vibration measurements on both the drive end (DE) and non-drive end (NDE) sides of the motor, axially or radially, in the horizontal or vertical directions.

- electrical signals: the currents were measured with Yokogawa 96033 alternating-current probes, which are precision meters with a capacity of up to 50 A RMS and an output voltage of 10 mV/A. The voltages were measured directly at the induction motor terminals using the voltage probes of an oscilloscope, also manufactured by Yokogawa.
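The quoted sensitivities translate sensor output voltages into physical units by a simple division. The sketch below illustrates this under the assumption that one is working with raw sensor voltages; the description does not say whether the stored signals are already scaled to amperes and mm/s, so check the data files before applying any conversion. The function names are hypothetical.

```python
# Sensitivities quoted in the monitoring description.
CURRENT_PROBE_V_PER_A = 0.010    # 10 mV/A
VIBRATION_V_PER_MM_S = 0.010     # 10 mV/(mm/s)


def probe_volts_to_amps(volts):
    """Convert current-probe output voltage to current in amperes."""
    return volts / CURRENT_PROBE_V_PER_A


def sensor_volts_to_mm_per_s(volts):
    """Convert vibration-sensor output voltage to velocity in mm/s."""
    return volts / VIBRATION_V_PER_MM_S


# Example: 30.2 mV at the probe output corresponds to 3.02 A,
# the motor's rated current at 220 V.
rated_current_a = probe_volts_to_amps(0.0302)
velocity_mm_s = sensor_volts_to_mm_per_s(0.025)

print(f"{rated_current_a:.2f} A, {velocity_mm_s:.2f} mm/s")
```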

Data Set Overview:

- Three-phase voltage

- Three-phase current

- Five vibration signals



The database was acquired in the Laboratory of Intelligent Automation of Processes and Systems and the Laboratory of Intelligent Control of Electrical Machines, School of Engineering of São Carlos of the University of São Paulo (USP), Brazil.


This dataset was extracted from Twitter using keywords related to Dilma Rousseff and Aécio Neves, who were the candidates in the second round of the 2014 presidential election in Brazil. It contains texts in Portuguese and their sentiment classification, obtained with the techniques described in the article published in the 2018 IEEE International Conference on Data Mining Workshops (ICDMW).



The .zip file is divided into four .csv files with data organized in 11 columns named: date, amount of retweets, amount of favorites, tweet text, mentions, hashtags, id, permalink, a score of classification, label of sentiment.


The VND MO test benchmark problems




We build an original dataset of thermal videos and images that simulate illegal movements around the border and in protected areas, designed for training machine learning and deep learning models. The videos are recorded in areas around the forest, at night, in different weather conditions (clear weather, rain, and fog), with people in different body positions (upright, hunched) and at different movement speeds (regular walking, running), at different ranges from the camera.



About 20 minutes of recorded material from the clear weather scenario, 13 minutes from the fog scenario, and about 15 minutes from rainy weather were processed. The longer videos were cut into sequences and from these sequences individual frames were extracted, resulting in 11,900 images for the clear weather, 4,905 images for the fog, and 7,030 images for the rainy weather scenarios.

A total of 6,111 frames were manually annotated so that they could be used to train a supervised model for person detection. The frames were selected to cover the different weather conditions: the set contains 2,663 frames shot in clear weather, 1,135 frames in fog, and 2,313 frames in rain.

The annotations were made using the open-source Yolo BBox Annotation Tool, which can simultaneously store annotations in the three most popular machine learning annotation formats (YOLO, VOC, and MS COCO), so all three annotation formats are available. Each image annotation consists of the centroid position of the bounding box around each object of interest, the size of the bounding box in terms of width and height, and the corresponding class label (Human or Dog).
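The relationship between the three bounding-box conventions can be sketched as follows. The example assumes the usual YOLO layout (center and size normalized to the image dimensions), VOC-style pixel corners (xmin, ymin, xmax, ymax), and COCO-style pixel boxes (top-left x, top-left y, width, height). The 640x480 image size and the function names are hypothetical, and rounding and pixel-offset conventions used by the annotation tool are left out.

```python
def yolo_to_voc(xc, yc, w, h, img_w, img_h):
    """Normalized center/size box -> pixel corners (xmin, ymin, xmax, ymax)."""
    xmin = (xc - w / 2) * img_w
    ymin = (yc - h / 2) * img_h
    xmax = (xc + w / 2) * img_w
    ymax = (yc + h / 2) * img_h
    return (xmin, ymin, xmax, ymax)


def yolo_to_coco(xc, yc, w, h, img_w, img_h):
    """Normalized center/size box -> pixel box (x, y, width, height)."""
    xmin, ymin, _, _ = yolo_to_voc(xc, yc, w, h, img_w, img_h)
    return (xmin, ymin, w * img_w, h * img_h)


# Hypothetical image size and a box centered in the frame.
IMG_W, IMG_H = 640, 480
yolo_box = (0.5, 0.5, 0.25, 0.5)   # xc, yc, w, h (normalized)

voc_box = yolo_to_voc(*yolo_box, IMG_W, IMG_H)
coco_box = yolo_to_coco(*yolo_box, IMG_W, IMG_H)

print("VOC :", voc_box)    # (240.0, 120.0, 400.0, 360.0)
print("COCO:", coco_box)   # (240.0, 120.0, 160.0, 240.0)
```

Since the dataset already ships all three formats, a conversion like this is mainly useful for checking that the files agree with each other.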