SDU-Haier-ND (Shandong University-Haier-Noise Detection) is a sound dataset jointly constructed by Shandong University and Haier. It contains operating sounds of the internal air conditioner, collected during product quality inspection. We collected and labeled a batch of air-conditioner quality-inspection sounds in real production environments to form this dataset, which includes both normal and abnormal sound samples.

Categories:
42 Views

Crowds express emotions as a collective individual, which is evident from the sounds that a crowd produces in particular events, e.g., collective booing, laughing or cheering in sports matches, movies, theaters, concerts, political demonstrations, and riots.

Instructions: 

Extract the zip files locally, then read the readme file.
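The extraction step can be sketched in Python with the standard library. This is a minimal sketch, not part of the dataset's own tooling: the function name `extract_and_find_readme` and the assumption that the readme's file name starts with "readme" (in any letter case) are hypothetical.

```python
import zipfile
from pathlib import Path


def extract_and_find_readme(zip_path, dest_dir):
    """Extract a zip archive and return the paths of any readme files found.

    Assumption: the readme file name starts with "readme" (case-insensitive).
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
    # Search the extracted tree recursively, since the readme may sit in a subfolder.
    return sorted(
        p for p in dest.rglob("*")
        if p.is_file() and p.name.lower().startswith("readme")
    )
```

Calling the function once per downloaded archive, each with its own destination folder, keeps the extracted files of different archives apart.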

Instructions for dataset usage are included in the open access paper: Franzoni, V., Biondi, G., Milani, A., Emotional sounds of crowds: spectrogram-based analysis using deep learning (2020) Multimedia Tools and Applications, 79 (47-48), pp. 36063-36075. https://doi.org/10.1007/s11042-020-09428-x

Files are released under the Creative Commons Attribution-ShareAlike 4.0 International License.

Categories:
165 Views

We introduce HUMAN4D, a large and multimodal 4D dataset that contains a variety of human activities simultaneously captured by a professional marker-based MoCap, a volumetric capture and an audio recording system. By capturing 2 female and 2 male professional actors performing various full-body movements and expressions, HUMAN4D provides a diverse set of motions and poses encountered as part of single- and multi-person daily, physical and social activities (jumping, dancing, etc.), along with multi-RGBD (mRGBD), volumetric and audio data. Despite the existence of multi-view color datasets c

Instructions: 

* The paper describing this dataset is currently under review. The dataset will be fully published together with the paper; in the meantime, more parts of the dataset will be uploaded.

The dataset includes multi-view RGBD, 3D/2D pose, volumetric (mesh/point-cloud/3D character) and audio data along with metadata for spatiotemporal alignment.

The full dataset is split per subject, per activity, and per modality.

There are also two benchmarking subsets, H4D1 for single-person and H4D2 for two-person sequences, respectively.

The formats are:

  • mRGBD: *.png
  • 3D/2D poses: *.npy
  • volumetric (mesh/point-cloud): *.ply
  • 3D character: *.fbx
  • metadata: *.txt, *.json
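The format list above can be turned into a small lookup for sorting downloaded files by modality. This is a minimal sketch using only the extensions listed; the names `EXTENSION_TO_MODALITY`, `modality_for`, and `group_by_modality` are hypothetical and do not come from the dataset's tooling.

```python
from collections import defaultdict
from pathlib import Path

# Extension-to-modality mapping taken from the format list above.
EXTENSION_TO_MODALITY = {
    ".png": "mRGBD",
    ".npy": "3D/2D poses",
    ".ply": "volumetric (mesh/point-cloud)",
    ".fbx": "3D character",
    ".txt": "metadata",
    ".json": "metadata",
}


def modality_for(path):
    """Return the modality for a file path, or None if the extension is not listed."""
    return EXTENSION_TO_MODALITY.get(Path(path).suffix.lower())


def group_by_modality(paths):
    """Group file paths by modality; unlisted extensions go under 'unknown'."""
    groups = defaultdict(list)
    for p in paths:
        groups[modality_for(p) or "unknown"].append(str(p))
    return dict(groups)
```

Because the mapping relies on the extension alone, it cannot tell a mesh from a point cloud; both are stored as *.ply.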

 

Categories:
622 Views

Noisy speech and ideal binary mask estimates for the SPN-ASI repository.

Instructions: 

The noisy speech and a priori SNR estimates for: https://github.com/anicolson/SPN-ASI.

Categories:
69 Views

The training, validation, and test set used for Deep Xi (https://github.com/anicolson/DeepXi). 

 Training set:

Instructions: 

The directories are pre-configured for Deep Xi, as seen here: https://github.com/anicolson/DeepXi/tree/master/set.

Categories:
1653 Views

This is the noisy-speech test set used in the original Deep Xi paper: https://doi.org/10.1016/j.specom.2019.06.002. The clean speech and noise used to create the noisy-speech set are also available.

Instructions: 

  

The directories are pre-configured for Deep Xi, as seen here: https://github.com/anicolson/DeepXi/tree/master/set.

Categories:
347 Views

Noisy-speech set used to test Deep Xi (https://github.com/anicolson/DeepXi). The clean speech and noise used to create the noisy-speech set are also available. The clean-speech recordings are from LibriSpeech test-clean (http://www.openslr.org/12/).

Instructions: 

The directories are pre-configured for Deep Xi, as seen here: https://github.com/anicolson/DeepXi/tree/master/set.

Categories:
371 Views

This database contains recordings of underwater sounds produced by dolphins of the species Tursiops truncatus. The dolphins, a family of five individuals, live in Dolphinarium Varna (Varna, Bulgaria). The recordings were made in autumn 2019 by the “SigNautic Lab” crew, using measurement equipment from the leading suppliers Brüel & Kjær and National Instruments. The database contains two sets of lossless, uncompressed .wav audio files: 13 “bursts” and 104 “clicks”, with durations of 250 ms to 9 s. All signals are sampled at 250 kHz and are amplitude-normalized.

Instructions: 

Please read the corresponding Matlab software\Read me.txt and Data Acquisition and Signal Analysis Parameters.pdf files.
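As a quick sanity check outside Matlab, the sampling rate and duration range stated in the description can be verified per file. This is a minimal sketch: it assumes the files are integer PCM WAV, which is what Python's standard `wave` module reads (other encodings would need a different reader), and the names `wav_info` and `check_recording` are hypothetical.

```python
import wave

EXPECTED_RATE_HZ = 250_000  # sampling rate stated in the dataset description


def wav_info(path):
    """Return (sampling rate in Hz, duration in seconds) for a PCM WAV file."""
    with wave.open(str(path), "rb") as wf:
        rate = wf.getframerate()
        frames = wf.getnframes()
    return rate, frames / rate


def check_recording(path, min_s=0.25, max_s=9.0):
    """Check a file against the stated 250 kHz rate and 250 ms to 9 s duration range."""
    rate, duration = wav_info(path)
    return rate == EXPECTED_RATE_HZ and min_s <= duration <= max_s
```

A file that fails the check is not necessarily corrupt; it may only mean that the stated duration limits are rounded.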

Categories:
708 Views