Open Access
CLEAR: A Dataset for Compositional Language and Elementary Acoustic Reasoning
- Submitted by: Jerome Abdelnour
- Last updated: Fri, 08/19/2022 - 19:51
- DOI: 10.21227/7x26-a025
- License: Creative Commons Attribution
Abstract
We introduced the task of acoustic question answering (AQA) in https://arxiv.org/abs/1811.10561.
A second version of the dataset was introduced in https://arxiv.org/abs/2106.06147.
This dataset aims to promote research in acoustic reasoning.
It comprises acoustic scenes and multiple questions/answers for each of them.
Each question is accompanied by a functional program that describes the reasoning steps needed to answer it.
The dataset is split into 3 sets:
- Training
  - 35 000 acoustic scenes
  - 1 400 000 questions/answers
- Validation
  - 7 500 acoustic scenes
  - 300 000 questions/answers
- Test
  - 7 500 acoustic scenes
  - 300 000 questions/answers
The generation code is available at https://github.com/NECOTIS/CLEAR-AQA-Dataset-Generator
The dataset can easily be regenerated with a different number of scenes, questions, and answers.
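As a quick sanity check on the split sizes listed above, the following sketch computes the number of questions per scene and the overall totals. The figures are copied from the list; the dictionary and function names are illustrative only and are not part of the dataset or its generator.

```python
# Split sizes as listed above: (acoustic scenes, questions/answers).
SPLITS = {
    "train": (35_000, 1_400_000),
    "val": (7_500, 300_000),
    "test": (7_500, 300_000),
}


def questions_per_scene(split: str) -> float:
    """Average number of questions/answers per acoustic scene in a split."""
    scenes, questions = SPLITS[split]
    return questions / scenes


total_scenes = sum(scenes for scenes, _ in SPLITS.values())
total_questions = sum(questions for _, questions in SPLITS.values())

for name in SPLITS:
    print(f"{name}: {questions_per_scene(name):.0f} questions per scene")
print(f"total: {total_scenes} scenes, {total_questions} questions/answers")
```

With these figures, every split works out to 40 questions per scene.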
File Structure
- /audio : Audio recordings of the scenes
  - /test : Test set recordings
  - /train : Training set recordings
  - /val : Validation set recordings
- /questions : Questions with their corresponding answers (3 JSON files, one for each set)
- /scenes : Scene definitions (3 JSON files, one for each set)
- /arguments : A copy of all the arguments used as input at generation time (for reproducibility)
- /logs : Logs of the generation scripts
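The layout above can be navigated with a small helper such as the sketch below. The exact names of the JSON files are not stated on this page, so the helper makes the assumption that each file name contains its split name (train, val or test) and simply globs for it. `split_layout` is a hypothetical helper, not part of the dataset tooling.

```python
from pathlib import Path

SPLITS = ("train", "val", "test")


def split_layout(root, split):
    """Collect the paths belonging to one split of an extracted archive.

    Assumption: JSON file names under /questions and /scenes contain the
    split name; adjust the glob patterns if the real names differ.
    """
    if split not in SPLITS:
        raise ValueError(f"unknown split: {split!r}")
    root = Path(root)
    return {
        "audio_dir": root / "audio" / split,
        "questions": sorted((root / "questions").glob(f"*{split}*.json")),
        "scenes": sorted((root / "scenes").glob(f"*{split}*.json")),
    }
```

Calling `split_layout("/path/to/extracted/archive", "val")` returns the audio directory for the validation set together with any matching question and scene files; an empty list means the naming assumption does not hold for your copy.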
Scenes
Each scene is an assembly of 10 elementary sounds.
The scenes are persisted as JSON blobs.
They contain the following attributes:
- scene_index : Numerical identifier of the scene
- objects : List of elementary sounds contained in the scene (See Elementary Sounds section)
- relationships : Define the relationships between all the objects of the scene
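A minimal sketch of reading these three attributes from one scene follows. The attribute names come from the list above; everything else is an assumption. In particular, the example scene is made up for illustration (it has 2 sounds rather than 10), and the shape of `relationships` shown here is hypothetical.

```python
import json


def summarize_scene(scene: dict) -> dict:
    """Return a compact summary of one scene definition."""
    return {
        "scene_index": scene["scene_index"],
        "num_sounds": len(scene["objects"]),
        # Only the number of entries is reported, because the internal
        # structure of "relationships" is not described on this page.
        "num_relationship_entries": len(scene["relationships"]),
    }


# Hypothetical scene blob, for illustration only.
example_scene = json.loads("""
{
  "scene_index": 0,
  "objects": [{"instrument": "cello"}, {"instrument": "flute"}],
  "relationships": {"before": [[], [0]], "after": [[1], []]}
}
""")

print(summarize_scene(example_scene))
```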
Elementary Sounds
Elementary sounds are recordings of instruments playing a single note.
The elementary sound bank contains 56 unique recordings spread across 5 instrument families.
Each of them has the following attributes:
- Brightness : {Bright, Dark, Null}
- Duration : length of the sound (in ms)
- Filename : Filename of the audio recording
- Note : Musical note on the chromatic scale { A, A#, B, C, C#, D, D#, E, F, F#, G, G# }
- ID : Numerical identifier of the sound
- Instrument : Name of the instrument playing the sound { Cello, Clarinet, Flute, Trumpet, Violin }
- Loudness : {Loud, Quiet}
- Octave : Octave of the sound
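The value sets listed above lend themselves to a simple consistency check. The sketch below validates one elementary sound against them. It assumes lowercase JSON keys (`brightness`, `duration`, and so on), compares string values case-insensitively, and treats the Null brightness as JSON `null` (Python `None`). None of these conventions is confirmed on this page, and `validate_sound` is a hypothetical helper.

```python
NOTES = {"A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"}
INSTRUMENTS = {"cello", "clarinet", "flute", "trumpet", "violin"}
BRIGHTNESS = {"bright", "dark", None}  # None stands in for Null (assumption)
LOUDNESS = {"loud", "quiet"}


def _norm(value):
    """Lowercase strings so that 'Bright' and 'bright' compare equal."""
    return value.lower() if isinstance(value, str) else value


def validate_sound(sound: dict) -> list:
    """Return the names of attributes whose values fall outside the listed sets."""
    problems = []
    if _norm(sound.get("brightness")) not in BRIGHTNESS:
        problems.append("brightness")
    if _norm(sound.get("loudness")) not in LOUDNESS:
        problems.append("loudness")
    if _norm(sound.get("instrument")) not in INSTRUMENTS:
        problems.append("instrument")
    note = sound.get("note")
    if not isinstance(note, str) or note.upper() not in NOTES:
        problems.append("note")
    duration = sound.get("duration")  # length of the sound in ms
    if not isinstance(duration, (int, float)) or duration <= 0:
        problems.append("duration")
    return problems
```

An empty list means the sound is consistent with the attribute definitions; otherwise the list names the offending attributes.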
Dataset Files
- CLEAR V1.0.0 CLEAR_v1.0.0.tar.gz (184.79 GB)
- CLEAR V2.0.0 CLEAR2.tar.gz (41.42 GB)
Open Access dataset files are accessible to all logged-in users with a free IEEE account. IEEE Membership is not required.