Speech | IEEE DataPort

Sensitive quantification of cerebellar speech abnormalities: OpenSMILE

Objective, sensitive, and meaningful disease assessments are critical to support clinical trials and clinical care. Speech changes are one of the earliest and most evident manifestations of cerebellar ataxias. This data set contains features that can be used to train models to identify and quantify clinical signs of ataxic speech. Though raw audio or spectrograms cannot be released due to privacy concerns, this data set contains several OpenSMILE feature sets.

WebRTC-QoE: A Dataset of Quality of Experience in Audio-Video Communications

In real-time communications, WebRTC-based multimedia applications are increasingly prevalent because they can be smoothly integrated within Web browsing sessions. The browsing experience is thereby significantly improved compared with scenarios where browser add-ons and/or plug-ins are used; still, the end user's Quality of Experience (QoE) in WebRTC sessions may be affected by network impairments, such as delays and losses.

iNoise Indian Noise Database

Speech processing in noisy conditions allows researchers to build solutions that work in real-world conditions. Environmental noise in Indian conditions is very different from the typical noise seen in most Western countries. This dataset is a collection of various noises, both indoor and outdoor, collected over a period of several months. The audio files are in the format RIFF (little-endian) data, WAVE audio, Microsoft PCM, 8 bit, mono 11025 Hz and were recorded using the Dialogic CTI card.
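As a quick sanity check on the stated format (Microsoft PCM, 8 bit, mono, 11025 Hz), a WAV header can be inspected with Python's standard `wave` module. The sketch below is illustrative only: the helper names and the synthetic in-memory file are assumptions made for the example and are not part of the dataset or its tooling.

```python
import io
import wave


def describe_wav(source):
    """Return (channels, sample_width_bytes, sample_rate_hz, n_frames).

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        return (
            wav.getnchannels(),
            wav.getsampwidth(),
            wav.getframerate(),
            wav.getnframes(),
        )


def matches_stated_format(source):
    """True if the file is mono, 8-bit (1 byte per sample), 11025 Hz."""
    channels, width, rate, _ = describe_wav(source)
    return channels == 1 and width == 1 and rate == 11025


def make_demo_wav(n_frames=11025):
    """Build a hypothetical one-second silent WAV in memory (not dataset audio)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(1)      # 8-bit PCM, stored as unsigned bytes
        wav.setframerate(11025)  # samples per second
        wav.writeframes(bytes([128]) * n_frames)  # 128 is the 8-bit midpoint
    buffer.seek(0)
    return buffer


demo_info = describe_wav(make_demo_wav())
demo_ok = matches_stated_format(make_demo_wav())
print(demo_info, demo_ok)
```

To check a downloaded file, pass its path to `matches_stated_format` in place of the synthetic buffer.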

Categories:

Signal Processing

Deep Xi dataset

The training, validation, and test set used for Deep Xi (https://github.com/anicolson/DeepXi).

Training set:

A DataSet of word sequences through MRI

A composite dataset of eight grayscale, sagittal-plane videos, together covering the pronunciation of seventeen words with intervals, for experiments in computer vision, video processing, and articulation investigation of the vocal tract.

Categories:

Computer Vision

A Time-Scale Modification Dataset with Subjective Quality Labels

Time Scale Modification (TSM) is a well-researched field; however, no effective objective measure of quality exists. This paper details the creation, subjective evaluation, and analysis of a dataset for use in the development of an objective measure of quality for TSM. The dataset has two parts: the training component contains 88 source files processed using six TSM methods at 10 time scales, while the testing component contains 20 source files processed using three additional methods at four time scales.
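The figures above imply the size of each component if every source file is processed by every method at every time scale. That full-factorial design is an assumption made for this short sketch, and the function name is hypothetical:

```python
def processed_file_count(source_files, methods, time_scales):
    """Number of processed files under a full-factorial design (assumed)."""
    return source_files * methods * time_scales


training_files = processed_file_count(88, 6, 10)  # 88 x 6 x 10
testing_files = processed_file_count(20, 3, 4)    # 20 x 3 x 4
print(training_files, testing_files)  # prints "5280 240"
```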

ICASSP 2020 Paper 5581

This work addresses one-shot voice conversion, where the target speaker is unseen in the training dataset, or both source and target speakers are unseen in the training dataset. StarGAN is employed to carry out voice conversion between speakers. An embedding vector is used to represent speaker ID. This work relies on two datasets in English and one dataset in Chinese, involving 38 speakers. A user study is conducted to validate our framework in terms of reconstruction quality and conversion quality.

Categories:

Machine Learning

Electroencephalogram (EEG) recordings obtained when simultaneously presenting audio stimulations

The dataset consists of EEG recordings obtained while subjects listened to different utterances: a, i, u, bed, please, sad. A limited number of EEG recordings were also obtained when the three vowels were corrupted by white and babble noise at an SNR of 0 dB. Recordings were performed on 8 healthy subjects.
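An SNR of 0 dB means the noise has the same average power as the clean signal. The listing does not say how the stimuli were mixed, so the following is only a generic sketch of scaling noise to a target SNR; the sine tone, the Gaussian noise, the 8000 Hz rate, and all function names are assumptions made for illustration.

```python
import math
import random


def mean_power(signal):
    """Average power of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so clean power / noise power equals 10**(snr_db / 10).

    Returns the noisy mixture and the scaled noise.
    """
    target_noise_power = mean_power(clean) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / mean_power(noise))
    scaled = [gain * n for n in noise]
    mixture = [c + n for c, n in zip(clean, scaled)]
    return mixture, scaled


rng = random.Random(0)  # fixed seed so the example is repeatable
# Hypothetical stand-ins: a 220 Hz tone for the vowel, Gaussian white noise.
clean = [math.sin(2 * math.pi * 220 * t / 8000) for t in range(8000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(8000)]

noisy, scaled_noise = mix_at_snr(clean, noise, snr_db=0.0)
achieved_snr_db = 10 * math.log10(mean_power(clean) / mean_power(scaled_noise))
```

At 0 dB the scaled noise and the clean signal have equal power, so `achieved_snr_db` comes out at zero up to floating-point rounding.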

Categories:

Brain

Italian Parkinson's Voice and Speech

I would be grateful if you would cite my following two papers:

Categories:

Biomedical and Health Sciences