Speech processing in noisy conditions allows researchers to build solutions that work in real-world conditions. Environmental noise in Indian conditions is very different from the typical noise seen in most Western countries. This dataset is a collection of various noises, both indoor and outdoor, collected over a period of several months. The audio files are in the format RIFF (little-endian) data, WAVE audio, Microsoft PCM, 8 bit, mono 11025 Hz, and were recorded using the Dialogic CTI card.
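As a minimal sketch, the stated audio format (8-bit, mono, 11025 Hz PCM WAVE) can be checked with Python's standard `wave` module. The helper names `describe_wav` and `make_silent_clip` are hypothetical, and the clip generated here is a silent stand-in built in memory, not audio from the dataset.

```python
import io
import wave

# Format stated in the dataset description: 8-bit (1 byte), mono, 11025 Hz.
EXPECTED = {"channels": 1, "sample_width_bytes": 1, "sample_rate_hz": 11025}


def describe_wav(fileobj):
    """Return channel count, sample width, and sample rate of a PCM WAV file."""
    with wave.open(fileobj, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": wav.getframerate(),
        }


def make_silent_clip(seconds=0.1):
    """Build a stand-in 8-bit mono 11025 Hz clip in memory (not dataset audio)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(1)
        wav.setframerate(11025)
        # 8-bit PCM is unsigned, so silence is the mid-point value 128.
        wav.writeframes(bytes([128]) * int(11025 * seconds))
    buffer.seek(0)
    return buffer


clip_info = describe_wav(make_silent_clip())
matches_expected = clip_info == EXPECTED
```

To check a downloaded file instead, a path or an open binary file can be passed to `describe_wav` in place of the in-memory clip.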


A composite dataset with eight videos (totaling the pronunciation of seventeen words, with intervals, sagittal plane, and gray scale), for experiments in computer vision, video processing, and articulation investigation of the vocal tract.


In this dataset:

  • There is no audio.
  • Sagittal image
  • Grey scale


Time-Scale Modification (TSM) is a well-researched field; however, no effective objective measure of quality exists. This paper details the creation, subjective evaluation, and analysis of a dataset for use in the development of an objective measure of quality for TSM. The dataset comprises two parts: the training component contains 88 source files processed using six TSM methods at 10 time scales, while the testing component contains 20 source files processed using three additional methods at four time scales.


When using this dataset, please use the following citation:

author = {Roberts, Timothy and Paliwal, Kuldip K.},
title = {A time-scale modification dataset with subjective quality labels},
journal = {The Journal of the Acoustical Society of America},
volume = {148},
number = {1},
pages = {201-210},
year = {2020},
doi = {10.1121/10.0001567},
URL = {https://doi.org/10.1121/10.0001567},
eprint = {https://doi.org/10.1121/10.0001567}


Audio files are named using the following structure: SourceName_TSMmethod_TSMratio_per.wav and split into multiple zip files.

For 'TSMmethod':

  • PV: Phase Vocoder algorithm
  • PV_IPL: Identity Phase Locking Phase Vocoder algorithm
  • WSOLA: Waveform Similarity Overlap-Add algorithm
  • FESOLA: Fuzzy Epoch Synchronous Overlap-Add algorithm
  • HPTSM: Harmonic-Percussive Separation Time-Scale Modification algorithm
  • uTVS: Mel-Scale Sub-Band Modelling Filterbank algorithm
  • Elastique: z-Plane Elastique algorithm
  • NMF: Non-Negative Matrix Factorization algorithm
  • FuzzyPV: Phase Vocoder algorithm using Fuzzy Classification of Spectral Bins

TSM ratios range from 33% to 192% for training files, 20% to 200% for testing files, and 22% to 220% for evaluation files.
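The naming structure can be split back into its parts with a short parser. This is a sketch under assumptions: the function `parse_tsm_filename` and the example filename are hypothetical, the method labels are the ones listed in this description, and source names are assumed to possibly contain underscores, which is why the method is matched against a known list rather than by position.

```python
# Method labels taken from the description; longest labels are tried first
# so that "PV_IPL" is matched before the shorter "PV".
KNOWN_METHODS = sorted(
    ["PV", "PV_IPL", "WSOLA", "FESOLA", "HPTSM", "uTVS",
     "Elastique", "NMF", "FuzzyPV"],
    key=len,
    reverse=True,
)

SUFFIX = "_per.wav"


def parse_tsm_filename(filename):
    """Split 'SourceName_TSMmethod_TSMratio_per.wav' into its three parts."""
    if not filename.endswith(SUFFIX):
        raise ValueError(f"unexpected suffix in {filename!r}")
    stem = filename[: -len(SUFFIX)]
    # The ratio is the last underscore-separated token before the suffix.
    head, _, ratio_text = stem.rpartition("_")
    ratio_percent = float(ratio_text)
    for method in KNOWN_METHODS:
        if head.endswith("_" + method):
            source = head[: -(len(method) + 1)]
            return source, method, ratio_percent
    raise ValueError(f"no known TSM method found in {filename!r}")


# Hypothetical filename, used only to illustrate the structure.
example = parse_tsm_filename("Example_Source_PV_IPL_50_per.wav")
```

The evaluation set uses additional labels (for example ESOLA or uTVS_Subj); under the same assumptions they could be appended to `KNOWN_METHODS`.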

  • Train: Contains 5280 processed files for training neural networks
  • Test: Contains 240 processed files for testing neural networks
  • Ref_Train: Contains the 88 reference files for the processed training files
  • Ref_Test: Contains the 20 reference files for the processed testing files
  • Eval: Contains 6000 processed files for evaluating TSM methods.  The 20 reference test files were processed at 20 time-scales using the following methods:
    • Phase Vocoder (PV)
    • Identity Phase-Locking Phase Vocoder (IPL)
    • Scaled Phase-Locking Phase Vocoder (SPL)
    • Phavorit IPL and SPL
    • Phase Vocoder with Fuzzy Classification of Spectral Bins (FuzzyPV)
    • Waveform Similarity Overlap-Add (WSOLA)
    • Epoch Synchronous Overlap-Add (ESOLA)
    • Fuzzy Epoch Synchronous Overlap-Add (FESOLA)
    • Driedger's Identity Phase-Locking Phase Vocoder (DrIPL)
    • Harmonic Percussive Separation Time-Scale Modification (HPTSM)
    • uTVS used in Subjective testing (uTVS_Subj)
    • updated uTVS (uTVS)
    • Non-Negative Matrix Factorization Time-Scale Modification (NMFTSM)
    • Elastique.
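The file counts quoted for each folder follow from multiplying source files, methods, and time scales. The sketch below reproduces that arithmetic; it assumes that "Phavorit IPL and SPL" counts as two separate methods, which gives 15 evaluation methods in total.

```python
# Training: 88 source files x 6 methods x 10 time scales.
train_files = 88 * 6 * 10

# Testing: 20 source files x 3 methods x 4 time scales.
test_files = 20 * 3 * 4

# Evaluation methods from the list above.
# Assumption: "Phavorit IPL and SPL" is counted as two methods.
eval_methods = [
    "PV", "IPL", "SPL", "Phavorit IPL", "Phavorit SPL", "FuzzyPV",
    "WSOLA", "ESOLA", "FESOLA", "DrIPL", "HPTSM", "uTVS_Subj",
    "uTVS", "NMFTSM", "Elastique",
]

# Evaluation: 20 reference files x 20 time scales x 15 methods.
eval_files = 20 * 20 * len(eval_methods)

print(train_files, test_files, eval_files)  # prints: 5280 240 6000
```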


TSM_MOS_Scores.mat is a version 7 MATLAB save file and contains a struct called data that has the following fields:

  • test_loc: Legacy folder location of the test file.
  • test_name: Name of the test file.
  • ref_loc: Legacy folder location of reference file.
  • ref_name: Name of the reference file.
  • method: The method used for processing the file.
  • TSM: The time-scale ratio (in percent) used for processing the file. 100% is unity processing, 50% is half speed, and 200% is double speed.
  • MeanOS: Normalized Mean Opinion Score.
  • MedianOS: Normalized Median Opinion Score.
  • std: Standard Deviation of MeanOS.
  • MeanOS_RAW: Mean Opinion Score before normalization.
  • MedianOS_RAW: Median Opinion Score before normalization.
  • std_RAW: Standard Deviation of MeanOS before normalization.
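Since the TSM field is a speed in percent, the duration of a processed file can be estimated from the duration of its reference. The sketch below assumes that duration scales inversely with speed; the function name `stretched_duration` is hypothetical.

```python
def stretched_duration(reference_seconds, tsm_percent):
    """Estimate processed duration, assuming duration scales as 100 / TSM."""
    if tsm_percent <= 0:
        raise ValueError("TSM ratio must be a positive percentage")
    return reference_seconds * 100.0 / tsm_percent


# A 10-second reference at half speed (50%) lasts about twice as long,
# and at double speed (200%) about half as long.
half_speed = stretched_duration(10.0, 50)
double_speed = stretched_duration(10.0, 200)
```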


TSM_MOS_Scores.csv is a CSV file containing the same fields as columns.
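A minimal sketch of how the CSV could be summarised with the Python standard library, assuming the header row uses the field names listed above (`method`, `MeanOS`, and so on). The rows in `SAMPLE_CSV` are placeholders invented for illustration and are not values from the dataset; `mean_score_by_method` is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Placeholder rows for illustration only; these are NOT dataset values.
SAMPLE_CSV = """test_name,ref_name,method,TSM,MeanOS
clip_a_PV_50_per.wav,clip_a.wav,PV,50,2.0
clip_a_WSOLA_50_per.wav,clip_a.wav,WSOLA,50,3.0
clip_b_PV_200_per.wav,clip_b.wav,PV,200,4.0
"""


def mean_score_by_method(csv_file):
    """Average the MeanOS column for each value of the method column."""
    scores = defaultdict(list)
    for row in csv.DictReader(csv_file):
        scores[row["method"]].append(float(row["MeanOS"]))
    return {method: mean(values) for method, values in scores.items()}


summary = mean_score_by_method(io.StringIO(SAMPLE_CSV))
```

With the real file, the same function could be called on `open("TSM_MOS_Scores.csv", newline="")`, provided the column names match this assumption.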

Source Code and method implementations are available at www.github.com/zygurt/TSM

Please Note: Labels for the files will be uploaded after paper publication.


Our work focuses on one-shot voice conversion, where the target speaker is unseen in the training dataset or both the source and target speakers are unseen in the training dataset. StarGAN is employed to carry out voice conversion between speakers, and an embedding vector is used to represent the speaker ID. This work relies on two datasets in English and one dataset in Chinese, involving 38 speakers. A user study is conducted to validate our framework in terms of reconstruction quality and conversion quality.


This is the supporting content for my ICASSP 2020 paper.

Paper number: 5581.


The dataset consists of EEG recordings obtained while subjects listened to different utterances: a, i, u, bed, please, sad. A limited number of EEG recordings were also obtained when the three vowels were corrupted by white and babble noise at an SNR of 0 dB. Recordings were performed on 8 healthy subjects.


Recordings were performed at the Centre de recherche du Centre hospitalier universitaire de Sherbrooke (CRCHUS), Sherbrooke (Quebec), Canada. The EEG recordings were performed using an actiCAP active electrode system Version I and II (Brain Products GmbH, Germany) that includes 64 Ag/AgCl electrodes. The signal was amplified with BrainAmp MR amplifiers and recorded using the Vision Recorder software. The electrodes were positioned using a standard 10-20 layout. Experiments were performed on 8 healthy subjects without any declared hearing impairment.

Each session lasted approximately 90 minutes and was separated into 2 parts. The first part, lasting 30 minutes, consisted of installing the cap on the subject; an electroconductive gel was placed under each electrode to ensure proper contact between the electrode and the scalp. The second part, the listening and EEG acquisition, lasted approximately 60 minutes. During this part, the subjects had to stay still with their eyes closed while avoiding any facial movement or swallowing, and had to remain concentrated on the audio signals for the full length of the experiment. Audio signals were presented to the subjects through earphones while EEGs were recorded. During the experiment, each trial was repeated randomly at least 80 times. A stimulus was presented randomly within each trial, which lasted approximately 9 seconds. A 2-minute pause was given after 5 minutes of trials, during which the subjects could relax and stretch.

Once the EEG signals were acquired, they were resampled at 500 Hz and band-pass filtered between 0.1 Hz and 45 Hz in order to extract the frequency bands of interest for this study. EEG signals were then separated into 2-second intervals, with the stimulus presented at 0.5 seconds within each interval. If the signal amplitude exceeded a pre-defined 75 µV limit, the trial was marked for rejection.

Sample code is provided to read the dataset and generate ERPs. First run epoch_data.m for the specific subject, then run mean_data.m in the ERP folder. EEGLAB for MATLAB is required.
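The epoching, rejection, and averaging steps described above can be sketched in plain Python. This is an illustration only and not the provided epoch_data.m or mean_data.m code: it handles a single channel, omits resampling and band-pass filtering, assumes the rejection limit is expressed in microvolts, and uses a synthetic signal. All function names are hypothetical.

```python
SAMPLE_RATE_HZ = 500          # sampling rate after resampling
EPOCH_SECONDS = 2.0           # length of each interval
PRE_STIMULUS_SECONDS = 0.5    # stimulus position within the interval
REJECT_LIMIT_UV = 75.0        # assumed to be in microvolts


def extract_epochs(signal_uv, stimulus_samples):
    """Cut one 2-second window per stimulus, starting 0.5 s before onset."""
    pre = int(PRE_STIMULUS_SECONDS * SAMPLE_RATE_HZ)
    length = int(EPOCH_SECONDS * SAMPLE_RATE_HZ)
    epochs = []
    for onset in stimulus_samples:
        start = onset - pre
        if start < 0 or start + length > len(signal_uv):
            continue  # skip windows that run off the recording
        epochs.append(signal_uv[start:start + length])
    return epochs


def reject_artifacts(epochs):
    """Keep only epochs whose absolute amplitude stays within the limit."""
    return [e for e in epochs if max(abs(v) for v in e) <= REJECT_LIMIT_UV]


def average_erp(epochs):
    """Average the accepted epochs sample by sample to obtain the ERP."""
    if not epochs:
        raise ValueError("no epochs left after rejection")
    return [sum(column) / len(epochs) for column in zip(*epochs)]


# Synthetic single-channel signal: a 10 uV response 100 samples after each
# onset, plus a 200 uV artifact inside the second epoch.
onsets = [1000, 3000, 5000]
signal = [0.0] * 6000
for onset in onsets:
    signal[onset + 100] = 10.0
signal[3400] = 200.0

all_epochs = extract_epochs(signal, onsets)
clean_epochs = reject_artifacts(all_epochs)
erp = average_erp(clean_epochs)
```

In this synthetic case the second epoch is rejected, and the averaged response appears 350 samples into the ERP (250 pre-stimulus samples plus the 100-sample delay).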


This dataset is associated with the paper by Giovanni Dimauro et al. (2017), which is open access and can be found here: https://ieeexplore.ieee.org/document/8070308

The DataPort repository contains the data used primarily for generating Figures 2, 3, 4, and 5.


The paper associated with the dataset describes in great detail how the dataset was created, what it contains and how it can be used. In any case, the content can be easily understood by comparing the .xlsx files with the wav files, both included in the zip file. However, the authors are available to provide further details to anyone.