This dataset consists of 1,000 studio-quality audio recordings and their transcriptions in the northern Vietnamese accent.
Each utterance is 14 to 18 words long and is spoken by a single speaker.
The corpus can be used to build a Vietnamese speech synthesis system. A tutorial is also available at https://vais.vn/vi/tai-ve/hts_for_vietnamese.

A new methodology for measuring the quality of coded images and videos based on the just-noticeable-difference (JND) concept was proposed in [1]. Several small JND-based image and video quality datasets were released by the Media Communications Lab at the University of Southern California [2, 3]. In this work, we present an effort to build a large-scale JND-based coded video quality dataset. The dataset consists of 220 five-second sequences in four resolutions (1920x1080, 1280x720, 960x540, and 640x360).
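A JND point can be understood as the position, along increasing compression, at which a viewer first notices a difference from the reference. The sketch below assumes, for illustration only, that a test compares a reference clip against versions encoded at increasing quantization parameter (QP) values and that visibility grows monotonically with QP, so one viewer's first JND point can be located by binary search. `find_jnd_qp` and the `notices` callback (standing in for the viewer's answer) are hypothetical names; this is not claimed to be the exact protocol of [1].

```python
from typing import Callable, Optional, Sequence


def find_jnd_qp(qps: Sequence[int], notices: Callable[[int], bool]) -> Optional[int]:
    """Binary-search the smallest QP at which a viewer notices a difference.

    Assumptions (illustrative): `qps` is sorted ascending (higher QP means
    stronger compression), and `notices(qp)` is monotonic, i.e. once a
    difference is visible at some QP it stays visible at every higher QP.
    Returns None if the viewer never notices a difference.
    """
    lo, hi = 0, len(qps) - 1
    jnd: Optional[int] = None
    while lo <= hi:
        mid = (lo + hi) // 2
        if notices(qps[mid]):
            jnd = qps[mid]  # candidate JND point; keep searching lower QPs
            hi = mid - 1
        else:
            lo = mid + 1    # still indistinguishable; try stronger compression
    return jnd


# Hypothetical simulated viewer who starts noticing artifacts at QP 34.
print(find_jnd_qp(list(range(52)), lambda qp: qp >= 34))  # 34
```

Binary search needs only about log2(52), roughly six, comparisons per viewer instead of one per QP, which matters when each comparison is a human judgment.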

This task evaluates the performance of sound event detection systems in multisource conditions similar to everyday life, where sound sources are rarely heard in isolation. Unlike task 2, there is no control over the number of overlapping sound events at any given time, in either the training or the testing audio data.

Last Updated On: Tue, 01/10/2017 - 15:56
Citation Author(s): Annamaria Mesaros, Toni Heittola, and Tuomas Virtanen
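Because events overlap freely in this task, systems are usually scored per time segment rather than per isolated event. The sketch below is an assumption-labelled illustration, not the official evaluation code: it measures the maximum polyphony of an annotation and computes a segment-based F-score and error rate over fixed segments (1 second here, an assumed value), counting substitutions, deletions, and insertions per segment. Events are hypothetical `(onset_s, offset_s, label)` tuples, and all function names are illustrative.

```python
import math
from typing import List, Sequence, Set, Tuple

Event = Tuple[float, float, str]  # (onset in s, offset in s, label); hypothetical format


def max_polyphony(events: Sequence[Event]) -> int:
    """Largest number of simultaneously active events (sweep over boundaries)."""
    # (t, -1) sorts before (t, 1), so an event ending exactly when another starts
    # is not counted as overlapping.
    boundaries = sorted([(on, 1) for on, _, _ in events] + [(off, -1) for _, off, _ in events])
    active = peak = 0
    for _, delta in boundaries:
        active += delta
        peak = max(peak, active)
    return peak


def activity(events: Sequence[Event], seg_len: float, n_segments: int) -> List[Set[str]]:
    """Set of labels active in each segment [k*seg_len, (k+1)*seg_len)."""
    segs: List[Set[str]] = [set() for _ in range(n_segments)]
    for onset, offset, label in events:
        first = int(math.floor(onset / seg_len))
        last = int(math.ceil(offset / seg_len))  # exclusive upper segment index
        for k in range(max(first, 0), min(last, n_segments)):
            segs[k].add(label)
    return segs


def segment_metrics(reference: Sequence[Event], estimated: Sequence[Event],
                    seg_len: float = 1.0) -> Tuple[float, float]:
    """Return (F-score, error rate) computed over fixed-length segments."""
    end = max((off for _, off, _ in list(reference) + list(estimated)), default=0.0)
    n = int(math.ceil(end / seg_len))
    ref, est = activity(reference, seg_len, n), activity(estimated, seg_len, n)
    tp = fp = fn = subs = dels = ins = n_ref = 0
    for r, e in zip(ref, est):
        seg_fp, seg_fn = len(e - r), len(r - e)
        tp += len(r & e)
        fp += seg_fp
        fn += seg_fn
        subs += min(seg_fn, seg_fp)       # a miss paired with a false alarm
        dels += max(0, seg_fn - seg_fp)   # unpaired misses
        ins += max(0, seg_fp - seg_fn)    # unpaired false alarms
        n_ref += len(r)
    f_score = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 1.0
    error_rate = (subs + dels + ins) / n_ref if n_ref else float("nan")
    return f_score, error_rate


ref = [(0.0, 2.0, "speech"), (1.0, 3.0, "car")]
est = [(0.0, 2.0, "speech"), (2.0, 3.0, "car")]
print(max_polyphony(ref))          # 2
print(segment_metrics(ref, est))   # F = 6/7, ER = 0.25 (one missed "car" segment)
```

Scoring per segment means a system is penalized for every segment where it misses one of several overlapping sources, which is exactly the situation this task does not control.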

The dataset contains depth frames collected using Microsoft Kinect v1 during the execution of food and drink intake movements.

The dataset contains depth frames and skeleton joints collected using Microsoft Kinect v2, together with acceleration samples provided by an IMU, during the execution of the Timed Up and Go (TUG) test.

The dataset contains depth frames and skeleton joints collected using Microsoft Kinect v2, together with acceleration samples provided by an IMU, during the simulation of activities of daily living (ADLs) and falls.

The dataset contains depth frames collected using Microsoft Kinect v1 in a top-view configuration and can be used for fall detection.

At the intersection of signal processing and information forensics, the Signal Processing Cup 2016 global competition explored a time-varying, location-dependent signature of power grids that can be intrinsically captured in media recordings. This signature is called the Electric Network Frequency (ENF) signal. Throughout the competition, participants were provided with multiple training, practice, and testing datasets consisting of recordings made in different grids and containing ENF traces.
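To illustrate how an ENF trace might be extracted from such a recording, the sketch below assumes a mono signal sampled at `fs` Hz and a nominal grid frequency of 50 Hz or 60 Hz. It slides a Hann-windowed frame over the signal and, for each frame, picks the frequency with the largest DFT magnitude in a narrow band around the nominal value. The function names, frame length, hop, band width, and 0.01 Hz search step are illustrative assumptions, not the SP Cup's reference method; practical systems may also work on harmonics of the nominal frequency and refine the peak by interpolation.

```python
import cmath
import math
from typing import List, Sequence


def dominant_freq(frame: Sequence[float], fs: float, f_lo: float, f_hi: float,
                  step: float = 0.01) -> float:
    """Frequency in [f_lo, f_hi] with the largest DFT magnitude for one frame."""
    n = len(frame)
    if n < 2:
        raise ValueError("frame too short")
    # Hann window to suppress spectral leakage from other components.
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    best_f, best_mag = f_lo, -1.0
    for k in range(int(round((f_hi - f_lo) / step)) + 1):
        f = f_lo + k * step
        w = -2j * math.pi * f / fs
        # Direct DFT evaluation at frequency f: O(n) per frequency, dependency-free.
        mag = abs(sum(x * cmath.exp(w * i) for i, x in enumerate(windowed)))
        if mag > best_mag:
            best_f, best_mag = f, mag
    return best_f


def enf_track(samples: Sequence[float], fs: float, nominal: float = 50.0,
              frame_s: float = 2.0, hop_s: float = 1.0, band: float = 0.5) -> List[float]:
    """Per-frame ENF estimates (Hz) searched within nominal +/- band."""
    frame_len, hop = int(frame_s * fs), int(hop_s * fs)
    return [dominant_freq(samples[s:s + frame_len], fs, nominal - band, nominal + band)
            for s in range(0, len(samples) - frame_len + 1, hop)]


# Synthetic example: a 5 s hum at 50.05 Hz sampled at 400 Hz.
fs = 400
hum = [math.sin(2 * math.pi * 50.05 * i / fs) for i in range(5 * fs)]
print([round(f, 2) for f in enf_track(hum, fs)])
```

On a real recording, the resulting sequence of per-frame estimates is the ENF trace that can be compared against reference grid measurements.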

Several established parameters and metrics are used to characterize the acoustics of a room. The most important are the direct-to-reverberant ratio (DRR), the reverberation time (T60), and the reflection coefficient. The acoustic characteristics described by these parameters can be used to predict the quality and intelligibility of speech signals in that room.
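The first two parameters can be estimated from a measured room impulse response `h` sampled at `fs` Hz. The sketch below assumes the direct sound occupies a window of ±2.5 ms around the strongest peak when computing the DRR (a common but not universal choice), and estimates T60 by Schroeder backward integration with a straight-line fit between -5 and -25 dB, extrapolated to a 60 dB decay (the T20 procedure). The function names are hypothetical, and the reflection coefficient is not covered.

```python
import math
from typing import List, Sequence


def drr_db(h: Sequence[float], fs: float, half_window_s: float = 0.0025) -> float:
    """Direct-to-reverberant ratio in dB; direct part = +/- half_window_s around the peak."""
    peak = max(range(len(h)), key=lambda i: abs(h[i]))
    half = int(round(half_window_s * fs))
    start, stop = max(0, peak - half), min(len(h), peak + half + 1)
    direct = sum(x * x for x in h[start:stop])
    # Everything outside the direct window counts as reverberant energy.
    reverb = sum(x * x for x in h[:start]) + sum(x * x for x in h[stop:])
    return 10.0 * math.log10(direct / reverb)


def schroeder_edc_db(h: Sequence[float]) -> List[float]:
    """Energy decay curve via backward integration, normalized to 0 dB at t = 0."""
    edc = [0.0] * len(h)
    total = 0.0
    for i in range(len(h) - 1, -1, -1):
        total += h[i] * h[i]
        edc[i] = total
    return [10.0 * math.log10(e / edc[0]) if e > 0 else float("-inf") for e in edc]


def t60_schroeder(h: Sequence[float], fs: float,
                  hi_db: float = -5.0, lo_db: float = -25.0) -> float:
    """Least-squares fit of the EDC between hi_db and lo_db, extrapolated to -60 dB."""
    pts = [(i / fs, d) for i, d in enumerate(schroeder_edc_db(h)) if lo_db <= d <= hi_db]
    if len(pts) < 2:
        raise ValueError("decay range not covered by the impulse response")
    mt = sum(t for t, _ in pts) / len(pts)
    md = sum(d for _, d in pts) / len(pts)
    slope = (sum((t - mt) * (d - md) for t, d in pts)
             / sum((t - mt) ** 2 for t, _ in pts))  # dB per second (negative)
    return -60.0 / slope


# Synthetic exponential decay whose amplitude drops by 60 dB in 0.5 s.
fs = 8000
h = [10 ** (-3 * (i / fs) / 0.5) for i in range(fs)]
print(round(t60_schroeder(h, fs), 3))  # 0.5
```

Backward integration smooths the fluctuations of a single impulse response, so the straight-line fit over a limited decay range is more stable than reading the time to a 60 dB drop directly.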
