CLEAR: A Dataset for Compositional Language and Elementary Acoustic Reasoning

Citation Author(s):
Jerome Abdelnour (Universite de Sherbrooke)
Giampiero Salvi (KTH Royal Institute of Technology)
Jean Rouat (Universite de Sherbrooke)
Submitted by:
Jerome Abdelnour
Last updated:
Mon, 02/25/2019 - 16:40
DOI:
10.21227/7x26-a025
License:
Creative Commons Attribution

Abstract 

We introduced the task of acoustic question answering (AQA) in https://arxiv.org/abs/1811.10561.

This dataset aims to promote research in the area of acoustic reasoning.

It comprises acoustic scenes and multiple questions/answers for each of them.

Each question is accompanied by a functional program which describes the reasoning steps needed to answer it.

 

The dataset is separated into 3 sets:

    • Training
      • 35 000 acoustic scenes
      • 1 400 000 questions/answers
    • Validation
      • 7 500 acoustic scenes
      • 300 000 questions/answers
    • Test
      • 7 500 acoustic scenes
      • 300 000 questions/answers
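
The figures above imply a fixed ratio of questions to scenes in every set. The short Python sketch below only restates that arithmetic; the `SPLITS` name and the split keys are chosen for illustration and are not taken from the generation code.

```python
# Split sizes as listed above: (acoustic scenes, questions/answers).
# The dictionary name and keys are illustrative only.
SPLITS = {
    "train": (35_000, 1_400_000),
    "val": (7_500, 300_000),
    "test": (7_500, 300_000),
}


def questions_per_scene(splits):
    """Return the number of questions/answers per scene for each set."""
    return {name: questions // scenes for name, (scenes, questions) in splits.items()}


total_scenes = sum(scenes for scenes, _ in SPLITS.values())
total_questions = sum(questions for _, questions in SPLITS.values())

print(questions_per_scene(SPLITS))
print(total_scenes, total_questions)
```

Each set works out to 40 questions/answers per scene, for 50 000 scenes and 2 000 000 questions/answers overall.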

 

The generation code is available at https://github.com/IGLU-CHISTERA/CLEAR-dataset-generation

The dataset can be easily regenerated with a different number of scenes, questions and answers.

Instructions: 

File Structure

    • /audio : Audio recordings of the scenes
      • /test : Test set recordings
      • /train : Training set recordings
      • /val : Validation set recordings
    • /questions : Questions with their corresponding answers (3 JSON files, one for each set)
    • /scenes : Scene definitions (3 JSON files, one for each set)
    • /arguments : A copy of all the arguments used as input at generation time (for reproducibility)
    • /logs : Logs of the generation scripts
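
As a quick sanity check after downloading, the layout above could be verified with a few lines of Python. This is a minimal sketch: the helper names and the `CLEAR` root folder are hypothetical, and the JSON file names are deliberately left out because they are not listed here.

```python
from pathlib import Path

# Directory names taken from the file structure listed above.
TOP_LEVEL_DIRS = ("audio", "questions", "scenes", "arguments", "logs")
AUDIO_SETS = ("test", "train", "val")


def expected_dirs(root):
    """List every directory the layout above describes, under `root`."""
    root = Path(root)
    dirs = [root / name for name in TOP_LEVEL_DIRS]
    dirs += [root / "audio" / set_name for set_name in AUDIO_SETS]
    return dirs


def missing_dirs(root):
    """Return the expected directories that do not exist under `root`."""
    return [d for d in expected_dirs(root) if not d.is_dir()]


if __name__ == "__main__":
    # "CLEAR" is a placeholder for wherever the archive was extracted.
    for path in missing_dirs("CLEAR"):
        print(f"missing: {path}")
```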

 

Scenes

Each scene is an assembly of 10 elementary sounds. The scenes are persisted as JSON blobs. They contain the following attributes:

    • scene_index : Numerical identifier of the scene
    • objects : List of elementary sounds contained in the scene (See Elementary Sounds section)
    • relationships : Defines the relationships between all the objects of the scene
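
The attributes above can be checked on a parsed scene with a small helper. The sketch below assumes a scene is available as a Python dict after JSON parsing; the `check_scene` helper and the example scene are made up for illustration, and the internal layout of `relationships` is treated as opaque since it is not described here.

```python
import json

# Attribute names taken from the list above.
REQUIRED_SCENE_KEYS = {"scene_index", "objects", "relationships"}
SOUNDS_PER_SCENE = 10  # each scene is an assembly of 10 elementary sounds


def check_scene(scene):
    """Return a list of problems found in one scene dict (empty if none)."""
    problems = []
    missing = REQUIRED_SCENE_KEYS - scene.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    objects = scene.get("objects", [])
    if len(objects) != SOUNDS_PER_SCENE:
        problems.append(f"expected {SOUNDS_PER_SCENE} objects, found {len(objects)}")
    return problems


# Made-up scene for illustration; the empty dicts and the empty
# "relationships" value are placeholders, not the real format.
example_text = json.dumps(
    {"scene_index": 0, "objects": [{} for _ in range(10)], "relationships": {}}
)
example_scene = json.loads(example_text)
print(check_scene(example_scene))
```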

Elementary Sounds

Elementary sounds are recordings of instruments playing a single note. The elementary sound bank contains 56 unique recordings spread across 5 instrument families. Each of them has the following attributes:

    • Brightness : {Bright, Dark, Null}
    • Duration : length of the sound (in ms)
    • Filename : Filename of the audio recording
    • Note : Musical note on the chromatic scale { A, A#, B, C, C#, D, D#, E, F, F#, G, G# }
    • ID : Numerical identifier of the sound
    • Instrument : Name of the instrument playing the sound { Cello, Clarinet, Flute, Trumpet, Violin }
    • Loudness : {Loud, Quiet}
    • Octave : Octave of the sound
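
One way to work with these attributes in Python is a small record type that rejects values outside the sets listed above. This is a sketch under assumptions: the class name, the lowercase field names, and the field types (for example an integer octave) are illustrative, and the exact keys used in the JSON files may differ.

```python
from dataclasses import dataclass

# Allowed values copied from the attribute list above.
NOTES = ("A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#")
INSTRUMENTS = ("Cello", "Clarinet", "Flute", "Trumpet", "Violin")
BRIGHTNESS = ("Bright", "Dark", "Null")
LOUDNESS = ("Loud", "Quiet")


@dataclass(frozen=True)
class ElementarySound:
    # Field names and types are assumptions made for this sketch.
    id: int
    filename: str
    instrument: str
    note: str
    octave: int
    duration_ms: float
    brightness: str
    loudness: str

    def __post_init__(self):
        checks = (
            ("instrument", self.instrument, INSTRUMENTS),
            ("note", self.note, NOTES),
            ("brightness", self.brightness, BRIGHTNESS),
            ("loudness", self.loudness, LOUDNESS),
        )
        for name, value, allowed in checks:
            if value not in allowed:
                raise ValueError(f"{name}={value!r} is not one of {allowed}")


# Made-up values for illustration, not an entry from the sound bank.
sound = ElementarySound(
    id=0,
    filename="example.wav",
    instrument="Cello",
    note="C#",
    octave=3,
    duration_ms=1200.0,
    brightness="Dark",
    loudness="Quiet",
)
print(sound.instrument, sound.note, sound.octave)
```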