Machine Learning

EmoSurv: A typing biometric (Keystroke dynamics) dataset with emotion labels created using computer keyboards

EmoSurv is a dataset containing keystroke data along with emotion labels. Timing and frequency data are recorded while participants type free and fixed texts, before and after specific emotions are induced. These emotions are: Anger, Happiness, Calmness, Sadness, and Neutral state.

First, data is collected while the participant is in a neutral state. Then, the participant watches an emotion-eliciting video. Once the emotion has been induced, the participant types another fixed text and another free text.

Categories: Artificial Intelligence
Machine Learning
Other

3559 Views

Shoulder Physiotherapy Activity Recognition 9-Axis Dataset

Shoulder Physiotherapy Activity Recognition 9-Axis Dataset (SPARS9x)

Suggested uses of this dataset include supervised classification of physiotherapy exercises and out-of-distribution detection using the unlabeled activities-of-daily-living data.

Categories: Artificial Intelligence
Machine Learning
Wearable Sensing
Biomedical and Health Sciences
Sensors
Health

1737 Views

CMSO CFAR classifier

CMSO CFAR NN classifier

Categories: Machine Learning

96 Views

Abilify Oral user reviews

The dataset provides Abilify Oral user reviews, together with ratings of the drug's satisfaction, effectiveness, and ease of use across different age groups.

Categories: Artificial Intelligence
Machine Learning
Computational Intelligence
Biomedical and Health Sciences

294 Views

Tweets Originating from India During COVID-19 Lockdowns

This India-specific COVID-19 tweets dataset has been curated using the large-scale Coronavirus (COVID-19) Tweets Dataset. This dataset contains tweets originating from India during the first week of each of the four phases of nationwide lockdowns initiated by the Government of India. For more information on filtering keywords, please visit the primary dataset page.

Categories: Machine Learning
COVID-19

5076 Views

PT7 Web, an Annotated Portuguese Language Corpus

PT7 Web is an annotated Portuguese-language corpus built from samples collected from Sep 2018 to Mar 2020 from seven Portuguese-speaking countries: Angola, Brazil, Portugal, Cape Verde, Guinea-Bissau, Macao, and Mozambique. The records were filtered from Common Crawl, a public-domain, petabyte-scale dataset of webpages in many languages, mixed together in temporal snapshots of the web that are available monthly [1]. The Brazilian pages were labeled as the positive class and the others as the negative class (non-Brazilian Portuguese).

Categories: Cloud Computing
Machine Learning

566 Views

Linear Code Sentences English/French

Parallel sentences in English and French, with mathematical expressions tokenized. The French sentences were extracted from course notes on error-correcting codes authored by Dr. Monica Nevins, University of Ottawa.

Categories: Machine Learning

138 Views

Supplementary Material for paper "Straightforward Working Principles Behind Modern Data Visualization Approaches"

From state-of-the-art visualization algorithms, we distill six working principles which are, by hypothesis, sufficient to produce visual projections qualitatively similar to those obtained with these state-of-the-art algorithms. These working principles are presented through the geometrical reasoning of the classical Multidimensional Scaling algorithm, and their effectiveness is illustrated through a novel straightforward algorithm for image visualization.

Categories: Machine Learning

123 Views

Gaussian Blobs of Varying numbers of samples, centers and features

The dataset contains Gaussian blobs with varying numbers of samples, centers, and features. The number of samples ranges from 500 to 50,000, the number of centers from 2 to 100, and the number of features from 2 to 2048. These sets of Gaussian blobs can be used to test the scalability and effectiveness of clustering algorithms. The compressed sets contain two kinds of files: files ending with "_X.csv" hold the data points, while files ending with "_y.csv" hold the corresponding class labels.
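As a rough illustration of the file layout described above, the following Python sketch pairs each "_X.csv" file with its "_y.csv" counterpart and reads both. The file names, the comma delimiter, the absence of a header row, and the single-column integer label format are all assumptions made for this example; only the two suffixes come from the description.

```python
import csv
import io


def pair_blob_files(names):
    """Map each shared prefix to its (points file, labels file) pair."""
    xs = {n[:-len("_X.csv")]: n for n in names if n.endswith("_X.csv")}
    ys = {n[:-len("_y.csv")]: n for n in names if n.endswith("_y.csv")}
    # Keep only prefixes that have both a points file and a labels file.
    return {p: (xs[p], ys[p]) for p in sorted(xs) if p in ys}


def read_points(stream):
    """Read one data point per row (assumes comma-separated floats, no header)."""
    return [[float(v) for v in row] for row in csv.reader(stream) if row]


def read_labels(stream):
    """Read one class label per row (assumes a single integer-valued column)."""
    return [int(float(row[0])) for row in csv.reader(stream) if row]


# Hypothetical file names; the real archive may use a different naming scheme.
names = ["blobs_500_2_2_X.csv", "blobs_500_2_2_y.csv", "readme.txt"]
pairs = pair_blob_files(names)

# Tiny in-memory stand-ins for the contents of one file pair.
X = read_points(io.StringIO("0.1,0.2\n5.0,5.1\n"))
y = read_labels(io.StringIO("0\n1\n"))
```

With real files, the in-memory streams would be replaced by `open(path, newline="")`, and `X` could be passed to any clustering routine while `y` serves as the reference labeling for evaluation.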

Categories: Machine Learning
Computational Intelligence

2871 Views

Dataset for binary classification of digital sensor signals

The dataset is composed of digital signals obtained from capacitive sensor electrodes immersed in water or in oil. Each signal, stored in one row, consists of 10 consecutive intensity values followed by a label in the last column. The label is +1 for a water-immersed sensor electrode and -1 for an oil-immersed sensor electrode. The dataset is intended for training a classifier that infers the material in which an electrode is immersed (water or oil), given a sample signal of 10 consecutive values.
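The row format described above could be consumed along the following lines. This is a minimal sketch, not the dataset authors' method: it assumes comma-separated rows without a header, uses made-up intensity values, and swaps in a simple nearest-centroid baseline as the classifier.

```python
import csv
import io


def parse_rows(stream):
    """Split each row into 10 intensity values and the trailing +1/-1 label."""
    signals, labels = [], []
    for row in csv.reader(stream):
        if not row:
            continue
        values = [float(v) for v in row]
        if len(values) != 11:
            raise ValueError("expected 10 values plus a label per row")
        signals.append(values[:10])
        labels.append(int(values[10]))
    return signals, labels


def fit_centroids(signals, labels):
    """Compute the per-class mean signal (nearest-centroid baseline)."""
    centroids = {}
    for label in set(labels):
        members = [s for s, l in zip(signals, labels) if l == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def predict(centroids, signal):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(signal, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


# Made-up rows for illustration only; real intensity ranges may differ.
sample = (
    "9,9,8,9,9,8,9,9,9,8,1\n"
    "8,9,9,9,8,9,9,8,9,9,1\n"
    "2,1,2,2,1,2,1,2,2,1,-1\n"
    "1,2,1,2,2,1,2,1,1,2,-1\n"
)
signals, labels = parse_rows(io.StringIO(sample))
model = fit_centroids(signals, labels)
prediction = predict(model, [8.5] * 10)  # +1 would mean water, -1 oil
```

A stronger classifier can be substituted for `fit_centroids` and `predict` without changing the parsing step, since the only structure relied on is the 10-values-plus-label row layout.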

Categories: Artificial Intelligence
Digital signal processing
Discrete-time signal processing
Machine Learning
Climate Change/Environmental
Sensors

2333 Views