Time-Scale Modification (TSM) is a well-researched field; however, no effective objective measure of quality exists. This paper details the creation, subjective evaluation, and analysis of a dataset for use in the development of an objective measure of quality for TSM. The dataset comprises two parts: the training component contains 88 source files processed using six TSM methods at 10 time scales, and the testing component contains 20 source files processed using three additional methods at four time scales.


When using this dataset, please use the following citation:

author = {Roberts, Timothy and Paliwal, Kuldip K.},
title = {A time-scale modification dataset with subjective quality labels},
journal = {The Journal of the Acoustical Society of America},
volume = {148},
number = {1},
pages = {201-210},
year = {2020},
doi = {10.1121/10.0001567},
URL = {https://doi.org/10.1121/10.0001567},
eprint = {https://doi.org/10.1121/10.0001567}


Audio files are named using the structure SourceName_TSMmethod_TSMratio_per.wav and are split across multiple zip files. The 'TSMmethod' abbreviations are:

  • PV: Phase Vocoder
  • PV_IPL: Identity Phase Locking Phase Vocoder
  • WSOLA: Waveform Similarity Overlap-Add
  • FESOLA: Fuzzy Epoch Synchronous Overlap-Add
  • HPTSM: Harmonic-Percussive Separation Time-Scale Modification
  • uTVS: Mel-Scale Sub-Band Modelling Filterbank
  • Elastique: z-Plane Elastique
  • NMF: Non-Negative Matrix Factorization
  • FuzzyPV: Phase Vocoder using Fuzzy Classification of Spectral Bins

TSM ratios range from 33% to 192% for training files, 20% to 200% for testing files and 22% to 220% for evaluation files.
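Because both source names and method names (for example PV_IPL) can contain underscores, splitting a filename on every underscore is unreliable. The following is a minimal sketch of one way to parse the naming structure described above. The function name `parse_tsm_filename`, the example source name, and the assumption that the ratio is written as a plain number are illustrative and not taken from the dataset's own tooling.

```python
# Method abbreviations listed in this description.
KNOWN_METHODS = [
    "PV", "PV_IPL", "WSOLA", "FESOLA", "HPTSM",
    "uTVS", "Elastique", "NMF", "FuzzyPV",
]


def parse_tsm_filename(name):
    """Split SourceName_TSMmethod_TSMratio_per.wav into its three parts.

    Assumes the ratio is a plain number (e.g. 50 or 38.38) and that the
    method is one of KNOWN_METHODS. Both are assumptions of this sketch.
    """
    suffix = "_per.wav"
    if not name.endswith(suffix):
        raise ValueError(f"unexpected suffix in {name!r}")
    stem = name[: -len(suffix)]

    # The ratio is the last underscore-separated token of the stem.
    head, ratio_text = stem.rsplit("_", 1)

    # Try longer method names first so multi-part names are matched whole.
    for method in sorted(KNOWN_METHODS, key=len, reverse=True):
        if head.endswith("_" + method):
            source = head[: -(len(method) + 1)]
            return source, method, float(ratio_text)
    raise ValueError(f"no known TSM method found in {name!r}")
```

For a hypothetical file named `Jazz_PV_IPL_50_per.wav`, the function returns the source name `Jazz`, the method `PV_IPL`, and the ratio `50.0`.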

  • Train: Contains 5280 processed files for training neural networks
  • Test: Contains 240 processed files for testing neural networks
  • Ref_Train: Contains the 88 reference files for the processed training files
  • Ref_Test: Contains the 20 reference files for the processed testing files
  • Eval: Contains 6000 processed files for evaluating TSM methods. The 20 reference test files were processed at 20 time scales using the following methods:
    • Phase Vocoder (PV)
    • Identity Phase-Locking Phase Vocoder (IPL)
    • Scaled Phase-Locking Phase Vocoder (SPL)
    • Phavorit IPL and SPL
    • Phase Vocoder with Fuzzy Classification of Spectral Bins (FuzzyPV)
    • Waveform Similarity Overlap-Add (WSOLA)
    • Epoch Synchronous Overlap-Add (ESOLA)
    • Fuzzy Epoch Synchronous Overlap-Add (FESOLA)
    • Driedger's Identity Phase-Locking Phase Vocoder (DrIPL)
    • Harmonic Percussive Separation Time-Scale Modification (HPTSM)
    • uTVS as used in subjective testing (uTVS_Subj)
    • updated uTVS (uTVS)
    • Non-Negative Matrix Factorization Time-Scale Modification (NMFTSM)
    • Elastique.
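The file counts in this list follow from multiplying the number of source files by the number of methods and time scales. The short check below is a sketch of that arithmetic. It assumes that the "Phavorit IPL and SPL" entry counts as two methods, which gives 15 evaluation methods in total. The function name is illustrative.

```python
def expected_file_count(sources, methods, time_scales):
    """Number of processed files when every source is processed by
    every method at every time scale."""
    return sources * methods * time_scales


# Train: 88 sources, six methods, 10 time scales.
train_files = expected_file_count(88, 6, 10)

# Test: 20 sources, three methods, four time scales.
test_files = expected_file_count(20, 3, 4)

# Eval: 20 sources, 15 methods (assumption: Phavorit IPL and
# Phavorit SPL counted separately), 20 time scales.
eval_files = expected_file_count(20, 15, 20)
```

These products match the 5280, 240, and 6000 files stated for the Train, Test, and Eval folders.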


TSM_MOS_Scores.mat is a version 7 MATLAB save file and contains a struct called data that has the following fields:

  • test_loc: Legacy folder location of the test file.
  • test_name: Name of the test file.
  • ref_loc: Legacy folder location of reference file.
  • ref_name: Name of the reference file.
  • method: The method used for processing the file.
  • TSM: The time-scale ratio (in percent) used for processing the file. 100% is unity processing (no change in speed), 50% is half speed, and 200% is double speed.
  • MeanOS: Normalized Mean Opinion Score.
  • MedianOS: Normalized Median Opinion Score.
  • std: Standard Deviation of MeanOS.
  • MeanOS_RAW: Mean Opinion Score before normalization.
  • MedianOS_RAW: Median Opinion Score before normalization.
  • std_RAW: Standard Deviation of MeanOS before normalization.


TSM_MOS_Scores.csv is a CSV file containing the same fields as columns.
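As an illustration of how the CSV might be used, the sketch below averages the normalized MeanOS per TSM method using only the Python standard library. It assumes the CSV header row uses the field names listed above exactly (`method`, `MeanOS`). The function names are illustrative and not part of the dataset's code.

```python
import csv
from collections import defaultdict


def load_rows(path):
    """Read the CSV into a list of dicts keyed by the header names."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


def mean_mos_by_method(rows):
    """Average the normalized MeanOS over all files for each method.

    Assumes each row has 'method' and 'MeanOS' keys, matching the
    field names described for TSM_MOS_Scores.
    """
    totals = defaultdict(lambda: [0.0, 0])  # method -> [sum, count]
    for row in rows:
        entry = totals[row["method"]]
        entry[0] += float(row["MeanOS"])
        entry[1] += 1
    return {method: total / count for method, (total, count) in totals.items()}
```

A typical call would be `mean_mos_by_method(load_rows("TSM_MOS_Scores.csv"))`, which returns a dictionary mapping each method name to its average score.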

Source code and method implementations are available at www.github.com/zygurt/TSM

Please Note: Labels for the files will be uploaded after paper publication.


This dataset contains the measured sensor values and calculated process variables from a winder station in a paper mill. Several process variables change over time as the rewind diameter changes. Process data are currently provided for two sets; more data will be added in the future. Advanced time series forecasting techniques can be used to estimate many process variables, treating the rewind diameter as the time axis.


Urban flooding is a common problem across the world. In India, it leads to casualties every year and to financial losses on the order of tens of billions of rupees. The damage caused by flooding can be mitigated if the locations deserving attention are known. This enables an effective emergency response and provides enough information for the construction of appropriate storm water drains to mitigate the effect of floods. In this work, a new technique to detect flooding level is introduced, which requires no additional equipment and therefore incurs no installation or maintenance costs.


Typically, a paper mill comprises three main stations: the Paper machine, the Winder station, and the Wrapping station. The Paper machine produces paper with a particular grammage in gsm (grams per square meter). The typical grammage classes in our paper mill are 48 gsm, 50 gsm, 58 gsm, 60 gsm, 68 gsm, and 70 gsm. The Winder station takes a paper spool about 6 m wide as its input and converts it into customized paper rolls with a particular diameter and width.


This dataset shows the amount of water used by a company in southern China from 2016 to 2017.


The dataset has 150 three-second motor current signal samples from each of the synthetically prepared motors. There are five motors, each with its own fault condition: bearing axis deviation (F1), stator coil inter-turn short circuit (F2), rotor broken strip (F3), outer bearing ring damage (F4), and healthy (H). The motors are run under five coupling loads: 0, 25, 50, 75, and 100%. The sampled signals are collected and processed into frequency occurrence plots (FOPs). Each image has a label, for example F2_L50_130, where F2 is the fault condition and L50 is the coupling load condition.
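The label convention can be decoded with a small helper. The sketch below is one possible approach and is not the dataset authors' code. The description explains only the first two fields of a label, so the trailing field (130 in the example) is returned unchanged as text rather than interpreted. The function name is illustrative.

```python
# Fault codes as given in the dataset description.
FAULTS = {
    "F1": "bearing axis deviation",
    "F2": "stator coil inter-turn short circuit",
    "F3": "rotor broken strip",
    "F4": "outer bearing ring damage",
    "H": "healthy",
}

# Coupling loads (percent) as given in the dataset description.
LOADS = {0, 25, 50, 75, 100}


def parse_fop_label(label):
    """Split a label such as 'F2_L50_130' into (fault, load_percent, rest).

    'rest' is the trailing field, whose meaning is not stated in the
    description, so it is kept as an uninterpreted string.
    """
    fault, load, rest = label.split("_", 2)
    if fault not in FAULTS:
        raise ValueError(f"unknown fault code {fault!r}")
    if not load.startswith("L"):
        raise ValueError(f"unexpected load field {load!r}")
    load_percent = int(load[1:])
    if load_percent not in LOADS:
        raise ValueError(f"unexpected load value {load_percent}")
    return fault, load_percent, rest
```

For the example label F2_L50_130, this returns the fault code `F2`, the load `50`, and the trailing field `"130"`; `FAULTS["F2"]` then gives the fault description.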


A VOR receiver based on Software-Defined Radio is presented. Experiments showed that the system indicated the radials of the VOR station of São José dos Campos with an average error rate of less than 1% and a standard deviation of less than 2.14% in relation to those calculated cartographically. The results suggest that low volume and weight SDR-based VOR receivers can be developed with processing on microcontrollers or FPGAs to equip drones that need to operate in aerodrome environments.



Audio dataset for Household Multimodal Environment (HoME). It is a collection of audio samples from the Freesound.org collaborative database of Creative Commons-licensed sounds.


Extract the audio samples in the HoME root directory.


This dataset includes the channel switch sequences of 300 IPTV viewers in Guangzhou, P.R. China, in August 2014. The file has 4 columns: the viewer ID, the current channel number, the next channel number, and the day of the month. The viewer ID in the first column is ordered by how often the viewer watched TV channels: the more often a viewer watches, the larger the ID. Within each day, the rows form a time series, generated step by step in the order of the viewer's actual channel switches.
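One simple way to work with these rows is to count how often each viewer switches from one channel to another. The sketch below assumes the rows have already been read into 4-item sequences in the column order described above; the file's delimiter and name are not stated here, so reading the file is left out. The function name is illustrative.

```python
from collections import Counter, defaultdict


def transition_counts(rows):
    """Count (current channel -> next channel) switches per viewer.

    Each row is (viewer_id, current_channel, next_channel, day_of_month),
    following the column order in the description. Values may be ints
    or numeric strings.
    """
    counts = defaultdict(Counter)
    for viewer_id, current, following, _day in rows:
        counts[int(viewer_id)][(int(current), int(following))] += 1
    return counts
```

The result maps each viewer ID to a `Counter` keyed by `(current, next)` channel pairs, which can serve as the starting point for a per-viewer transition model.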