
DCASE2016: Sound event detection in real life audio

Abstract: 

This task evaluates the performance of sound event detection systems in multisource conditions similar to our everyday life, where sound sources are rarely heard in isolation. Contrary to task 2, there is no control over the number of overlapping sound events at any given time, neither in the training nor in the testing audio data.

Audio dataset

The TUT Sound events 2016 dataset will be used for task 3. Audio in the dataset is a subset of the TUT Acoustic scenes 2016 dataset (used for task 1). The TUT Sound events 2016 dataset consists of recordings from two acoustic scenes: 

  • Home (indoor) 
  • Residential area (outdoor). 

These acoustic scenes were selected to represent common environments of interest for applications in safety and surveillance (outside the home) and in human activity monitoring or home surveillance. 

The dataset was collected in Finland by Tampere University of Technology between 06/2015 and 01/2016. The data collection received funding from the European Research Council.

Recording and annotation procedure

The recordings were each captured in a different location: different streets, different homes. For each recording location, a 3-5 minute long audio recording was captured. The equipment used for recording consisted of a binaural Soundman OKM II Klassik/studio A3 electret in-ear microphone and a Roland Edirol R-09 wave recorder, using a 44.1 kHz sampling rate and 24-bit resolution. For audio material recorded in private places, written consent was obtained from all people involved. 
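
As a minimal sketch of how one such recording might be loaded for processing, the snippet below uses the soundfile library; the file path is hypothetical, and the exact layout of the released dataset package is not specified on this page.

    import soundfile as sf

    # Hypothetical path to one binaural recording; actual file names depend
    # on how the released dataset package is organized.
    RECORDING_PATH = "audio/residential_area/example_recording.wav"

    # soundfile returns the samples as a float NumPy array plus the sampling
    # rate. The binaural recordings are stereo, so the array has shape
    # (n_samples, 2).
    audio, sample_rate = sf.read(RECORDING_PATH)

    assert sample_rate == 44100  # dataset audio is recorded at 44.1 kHz
    print("Duration: %.1f s, channels: %d"
          % (audio.shape[0] / sample_rate, audio.shape[1]))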

Individual sound events in each recording were annotated by two research assistants using freely chosen labels for the sounds. Nouns were used to characterize each sound source, and verbs the sound production mechanism, whenever this was possible. Annotators were first trained on a few example recordings. They were instructed to annotate all audible sound events, to decide the start and end times of the sounds as they saw fit, and to choose event labels freely. This resulted in a large set of raw labels. There was no verification of the annotations and no evaluation of inter-annotator agreement, due to the high level of subjectivity inherent to the problem. 
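
Each annotated event can thus be represented as an (onset, offset, label) triple. The sketch below shows one way to read such annotations, assuming a tab-separated file with onset time, offset time and event label on each line; the file format and path are assumptions for illustration, not a specification from this page.

    import csv
    from typing import List, NamedTuple

    class SoundEvent(NamedTuple):
        onset: float   # event start time in seconds
        offset: float  # event end time in seconds
        label: str     # annotated event label

    def load_annotations(path: str) -> List[SoundEvent]:
        """Read a tab-separated annotation file with one sound event per line."""
        events = []
        with open(path, newline="") as f:
            for row in csv.reader(f, delimiter="\t"):
                if len(row) >= 3:
                    events.append(SoundEvent(float(row[0]), float(row[1]),
                                             row[2].strip()))
        return events

    # Example with a hypothetical annotation file:
    #   events = load_annotations("meta/home/example_recording.ann")
    # Overlapping events are simply entries whose [onset, offset] intervals intersect.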

Target sound event classes were selected based on the frequency of the obtained labels, to ensure that the selected sounds are common for an acoustic scene and that there are sufficient examples for learning acoustic models. The raw labels were mapped onto the target classes, merging, for example, "car engine running" into "engine running", and grouping various impact sounds described only by a verb, such as "banging" or "clacking", into "object impact". 
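
A minimal sketch of this kind of raw-label mapping is given below; the mapping table only reproduces the examples mentioned above, so it is illustrative rather than the complete mapping used to produce the dataset.

    # Illustrative raw-label -> merged-label mapping, based on the examples above.
    # The full mapping used for the dataset contains many more entries.
    LABEL_MAP = {
        "car engine running": "engine running",
        "banging": "object impact",
        "clacking": "object impact",
    }

    def map_label(raw_label: str) -> str:
        """Map a freely chosen raw annotation label to its merged class name."""
        return LABEL_MAP.get(raw_label.strip().lower(), raw_label)

    # Example: map_label("Banging") returns "object impact";
    # unknown labels are passed through unchanged.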

Selected sound event classes:

Home

  • (object) Rustling
  • (object) Snapping
  • Cupboard
  • Cutlery
  • Dishes
  • Drawer
  • Glass jingling
  • Object impact
  • People walking
  • Washing dishes
  • Water tap running

 

Residential area

  • (object) Banging
  • Bird singing
  • Car passing by
  • Children shouting
  • People speaking
  • People walking
  • Wind blowing

For the residential area, the sound event classes are mostly related to concrete physical sound sources, such as bird singing or a car passing by. Home scenes are dominated by abstract object impact sounds, alongside some better defined sound events (still impact-based) such as dishes and cutlery.



Dataset Details

Citation Author(s):
Annamaria Mesaros, Toni Heittola, and Tuomas Virtanen
Submitted by:
Alexander Outman
Last updated:
Tue, 01/10/2017 - 15:56
DOI:
10.21227/H2Z595
Cite

Annamaria Mesaros, Toni Heittola, and Tuomas Virtanen, "DCASE2016: Sound event detection in real life audio", IEEE Dataport, 2016. [Online]. Available: http://dx.doi.org/10.21227/H2Z595.