Signal Processing
This dataset consists of 1,000 studio-quality audio recordings and their transcriptions in the northern Vietnamese accent.
Each utterance is 14 to 18 words long and is spoken by a single speaker.
The corpus can be used to build a Vietnamese speech synthesis system. A tutorial is also available at https://vais.vn/vi/tai-ve/hts_for_vietnamese.
A new methodology for measuring coded image/video quality using the just-noticeable-difference (JND) idea was proposed in [1]. Several small JND-based image/video quality datasets were released by the Media Communications Lab at the University of Southern California in [2, 3]. In this work, we present an effort to build a large-scale JND-based coded video quality dataset. The dataset consists of 220 5-second sequences in four resolutions (i.e., 1920x1080, 1280x720, 960x540, and 640x360).
At the intersection of signal processing and information forensics, the Signal Processing Cup 2016 global competition explored a time-varying, location-dependent signature of power grids that can be intrinsically captured in media recordings. This signature is known as the Electric Network Frequency (ENF) signal. Throughout the SP Cup 2016 competition, participants were provided with multiple training, practice, and testing datasets consisting of recordings that were made in different grids and contain ENF traces.
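To illustrate what an ENF trace looks like in practice, the following is a minimal sketch of one common way to estimate it: tracking the spectral peak near the nominal grid frequency in short overlapping frames. It is not the method prescribed by the competition. The function name `estimate_enf`, its parameters, the 50 Hz nominal frequency, and the synthetic test recording are all assumptions made for illustration.

```python
import numpy as np

def estimate_enf(signal, fs, nominal_hz=50.0, band_hz=1.0,
                 frame_s=2.0, hop_s=1.0, nfft_factor=8):
    """Return one ENF estimate (in Hz) per frame via spectral peak picking.

    All parameter defaults are illustrative assumptions, not values
    taken from the SP Cup 2016 datasets.
    """
    frame = int(frame_s * fs)
    hop = int(hop_s * fs)
    nfft = frame * nfft_factor  # zero-padding gives a finer frequency grid
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    # Only search a narrow band around the nominal grid frequency.
    band = (freqs >= nominal_hz - band_hz) & (freqs <= nominal_hz + band_hz)
    window = np.hanning(frame)
    estimates = []
    for start in range(0, len(signal) - frame + 1, hop):
        segment = signal[start:start + frame] * window
        spectrum = np.abs(np.fft.rfft(segment, n=nfft))
        estimates.append(freqs[band][np.argmax(spectrum[band])])
    return np.array(estimates)

# Synthetic example: a weak 50 Hz hum with a slow frequency drift plus noise.
fs = 1000
t = np.arange(20 * fs) / fs
true_enf = 50.0 + 0.05 * np.sin(2 * np.pi * 0.05 * t)
phase = 2 * np.pi * np.cumsum(true_enf) / fs
rng = np.random.default_rng(0)
recording = 0.1 * np.sin(phase) + 0.05 * rng.standard_normal(t.size)

enf = estimate_enf(recording, fs)
print(enf.round(3))
```

Grids with a 60 Hz nominal frequency would need `nominal_hz=60.0`, and real recordings often carry stronger ENF traces at harmonics, so the search band is a design choice rather than a fixed rule.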
Several established parameters and metrics have been used to characterize the acoustics of a room. The most important are the Direct-To-Reverberant Ratio (DRR), the Reverberation Time (T60) and the reflection coefficient. The acoustic characteristics of a room based on such parameters can be used to predict the quality and intelligibility of speech signals in that room.
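As a rough illustration of two of these parameters, the sketch below estimates T60 and DRR from a room impulse response. It assumes the standard textbook definitions: T60 from a line fit to the Schroeder backward-integrated energy decay curve, and DRR as the ratio of energy in a short window around the direct-path peak to the energy after it. The function names, the 2.5 ms direct-path window, the -5 dB to -25 dB fitting range, and the synthetic impulse response are assumptions for illustration and are not taken from the dataset.

```python
import numpy as np

def schroeder_decay_db(rir):
    """Backward-integrated energy decay curve in dB (0 dB at t = 0)."""
    energy = np.cumsum(rir[::-1] ** 2)[::-1]
    return 10.0 * np.log10(energy / energy[0])

def estimate_t60(rir, fs, lo_db=-5.0, hi_db=-25.0):
    """Fit a line to the decay curve between lo_db and hi_db and
    extrapolate the slope to a 60 dB drop."""
    edc = schroeder_decay_db(rir)
    t = np.arange(len(rir)) / fs
    mask = (edc <= lo_db) & (edc >= hi_db)
    slope, _ = np.polyfit(t[mask], edc[mask], 1)  # dB per second
    return -60.0 / slope

def estimate_drr_db(rir, fs, direct_window_ms=2.5):
    """Direct-to-reverberant ratio in dB, using a short window around
    the strongest peak as the direct path (window length is an assumption)."""
    peak = int(np.argmax(np.abs(rir)))
    half = int(round(direct_window_ms * 1e-3 * fs))
    start = max(peak - half, 0)
    end = peak + half + 1
    direct = np.sum(rir[start:end] ** 2)
    reverberant = np.sum(rir[end:] ** 2)
    return 10.0 * np.log10(direct / reverberant)

# Synthetic impulse response: unit direct pulse plus exponentially
# decaying noise whose energy falls by 60 dB after true_t60 seconds.
fs = 16000
true_t60 = 0.5
rng = np.random.default_rng(0)
t = np.arange(fs) / fs
rir = 0.1 * rng.standard_normal(t.size) * 10 ** (-3.0 * t / true_t60)
rir[0] = 1.0

t60 = estimate_t60(rir, fs)
drr = estimate_drr_db(rir, fs)
print(f"T60 ~ {t60:.2f} s, DRR ~ {drr:.1f} dB")
```

With measured impulse responses, the fitting range usually has to stay above the noise floor, which is why a limited range is extrapolated instead of reading the -60 dB point directly.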