Speech

Speech Processing in noisy condition allows researcher to build solutions that work in real world conditions. Environmental noise in Indian conditions are very different from typical noise seen in most western countries. This dataset is a collection of various noises, both indoor and outdoor ollected over a period of several months. The audio files are of the format RIFF (little-endian) data, WAVE audio, Microsoft PCM, 8 bit, mono 11025 Hz and have been recorded using the Dialogic CTI card.

Categories:
828 Views

The training, validation, and test set used for Deep Xi (https://github.com/anicolson/DeepXi). 

 Training set:

Categories:
3411 Views

A composite dataset with eight videos (totaling the pronunciation of seventeen words, with intervals, sagittal plane, and gray scale), for experiments in computer vision, video processing, and articulation investigation of the vocal tract.

Categories:
287 Views

Time Scale Modification (TSM) is a well-researched field; however, no effective objective measure of quality exists.  This paper details the creation, subjective evaluation, and analysis of a dataset for use in the development of an objective measure of quality for TSM. Comprised of two parts, the training component contains 88 source files processed using six TSM methods at 10 time scales, while the testing component contains 20 source files processed using three additional methods at four time scales.

Categories:
568 Views

Our efforts are made on one-shot voice conversion where the target speaker is unseen in training dataset or both source and target speakers are unseen in the training dataset. In our work, StarGAN is employed to carry out voice conversation between speakers. An embedding vector is used to represent speaker ID. This work relies on two datasets in English and one dataset in Chinese, involving 38 speakers. A user study is conducted to validate our framework in terms of reconstruction quality and conversation quality.

Categories:
572 Views

The dataset consists of EEG recordings obtained when subjects are listening to different utterances : a, i, u, bed, please, sad. A limited number of EEG recordings where also obtained when the three vowels were corrupted by white and babble noise at an SNR of 0dB. Recordings were performed on 8 healthy subjects.

Categories:
1117 Views

I would be grateful if you cite my two following papers:

Categories:
3896 Views