CLEAR: A Dataset for Compositional Language and Elementary Acoustic Reasoning

Citation Author(s):
Universite de Sherbrooke
KTH Royal Institute of Technology
Universite de Sherbrooke
Submitted by:
Jerome Abdelnour
Last updated:
Mon, 02/25/2019 - 16:40
Creative Commons Attribution


We introduced the task of acoustic question answering (AQA) in

This dataset aims to promote research in the area of acoustic reasoning.

It comprises acoustic scenes and multiple question/answer pairs for each of them.

Each question is accompanied by a functional program that describes the reasoning steps needed to answer it.
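To illustrate how a functional program can encode reasoning steps, here is a minimal sketch of a toy interpreter. It is not the CLEAR program format: the step layout (`{"function": ..., "value": ...}`), the function names (`scene`, `filter_instrument`, `filter_loudness`, `count`) and the lower-case attribute keys are all hypothetical, and programs are simplified to a linear chain where each step consumes the previous step's output.

```python
from typing import Any, Callable

# Hypothetical scene: a list of elementary sounds, each a dict of attributes.
Scene = list[dict[str, Any]]

# Hypothetical step vocabulary; the real CLEAR program functions may differ.
FUNCTIONS: dict[str, Callable[[Any, Any], Any]] = {
    "filter_instrument": lambda objs, arg: [o for o in objs if o["instrument"] == arg],
    "filter_loudness": lambda objs, arg: [o for o in objs if o["loudness"] == arg],
    "count": lambda objs, _arg: len(objs),
}


def run_program(program: list[dict[str, Any]], scene: Scene) -> Any:
    """Execute a linear chain of steps; each step consumes the previous output."""
    result: Any = None
    for step in program:
        name = step["function"]
        if name == "scene":
            result = scene  # start from every sound in the scene
        else:
            result = FUNCTIONS[name](result, step.get("value"))
    return result


example_scene = [
    {"instrument": "Flute", "loudness": "Loud"},
    {"instrument": "Flute", "loudness": "Quiet"},
    {"instrument": "Cello", "loudness": "Loud"},
]
# "How many flute sounds are in the scene?"
program = [
    {"function": "scene"},
    {"function": "filter_instrument", "value": "Flute"},
    {"function": "count"},
]
print(run_program(program, example_scene))  # 2
```

Real question programs may branch (e.g. comparing two sounds), which a chain cannot express; a tree of steps with explicit inputs would be needed in that case.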


The dataset is split into 3 sets:

    • Training
      • 35 000 acoustic scenes
      • 1 400 000 questions/answers
    • Validation
      • 7 500 acoustic scenes
      • 300 000 questions/answers
    • Test
      • 7 500 acoustic scenes
      • 300 000 questions/answers
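The split sizes listed above can be checked with a little arithmetic: every set averages 40 question/answer pairs per scene, and the 50 000 scenes are divided 70/15/15 between training, validation and test. The sketch below only uses the figures stated above.

```python
# Split sizes as listed above: (acoustic scenes, question/answer pairs).
SPLITS = {
    "train": (35_000, 1_400_000),
    "val": (7_500, 300_000),
    "test": (7_500, 300_000),
}


def questions_per_scene(scenes: int, questions: int) -> float:
    """Average number of question/answer pairs per acoustic scene."""
    return questions / scenes


total_scenes = sum(scenes for scenes, _ in SPLITS.values())
total_questions = sum(questions for _, questions in SPLITS.values())

for name, (scenes, questions) in SPLITS.items():
    share = 100 * scenes / total_scenes
    print(f"{name}: {share:.0f}% of scenes, "
          f"{questions_per_scene(scenes, questions):g} questions per scene")
```

This prints `train: 70% of scenes, 40 questions per scene`, followed by `15%` and `40` for both `val` and `test` (50 000 scenes and 2 000 000 question/answer pairs in total).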


The generation code is available at

The dataset can easily be regenerated with a different number of scenes and questions/answers.


File Structure

    • /audio : Audio recordings of the scenes
      • /test : Test set recordings
      • /train : Training set recordings
      • /val : Validation set recordings
    • /questions : Questions with their corresponding answers (3 JSON files, one for each set)
    • /scenes : Scene definitions (3 JSON files, one for each set)
    • /arguments : A copy of all the arguments used as input at generation time (for reproducibility)
    • /logs : Logs of the generation scripts
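A minimal sketch of how the folders above map to one set. The actual JSON file names are not given here, so the code assumes (hypothetically) that each file under /questions and /scenes has the set name (`train`, `val` or `test`) somewhere in its name; `find_split_json` and `split_files` are illustrative helpers, not part of the dataset tooling.

```python
from pathlib import Path
from typing import Union

SPLITS = ("train", "val", "test")


def find_split_json(folder: Path, split: str) -> Path:
    """Return the single JSON file in `folder` whose name mentions `split`.

    Assumption: file names contain the split name; adjust to the real names.
    """
    matches = sorted(p for p in folder.glob("*.json") if split in p.stem)
    if len(matches) != 1:
        raise FileNotFoundError(
            f"expected one '{split}' JSON file in {folder}, found {len(matches)}")
    return matches[0]


def split_files(root: Union[str, Path], split: str) -> dict[str, Path]:
    """Map one split to its audio folder, questions file and scenes file."""
    if split not in SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {SPLITS}")
    root = Path(root)
    return {
        "audio": root / "audio" / split,
        "questions": find_split_json(root / "questions", split),
        "scenes": find_split_json(root / "scenes", split),
    }
```

Failing loudly when zero or several files match keeps a wrong guess about the naming scheme from silently loading the wrong set.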



Each scene is an assembly of 10 elementary sounds. The scenes are persisted as JSON blobs containing the following attributes:

    • scene_index : Numerical identifier of the scene
    • objects : List of elementary sounds contained in the scene (See Elementary Sounds section)
    • relationships : Defines the relationships between all the objects of the scene
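The scene attributes above can be sketched as a small typed record. The layout of `relationships` is not described here, so it is kept exactly as parsed; the top-level `"scenes"` key used by `load_scenes` is a hypothetical assumption about the file layout, with a fallback for a bare list of scenes.

```python
import json
from dataclasses import dataclass
from typing import Any

SOUNDS_PER_SCENE = 10  # each scene assembles 10 elementary sounds


@dataclass
class Scene:
    scene_index: int
    objects: list[dict[str, Any]]
    relationships: Any  # structure not documented here; kept as parsed


def parse_scene(blob: dict[str, Any]) -> Scene:
    """Build a Scene from one JSON blob, checking the number of sounds."""
    objects = blob["objects"]
    if len(objects) != SOUNDS_PER_SCENE:
        raise ValueError(
            f"scene {blob.get('scene_index')} has {len(objects)} sounds, "
            f"expected {SOUNDS_PER_SCENE}")
    return Scene(int(blob["scene_index"]), list(objects), blob["relationships"])


def load_scenes(path: str) -> list[Scene]:
    """Load every scene of one set from a /scenes JSON file."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    # Assumption: scenes live under a top-level "scenes" key, else a bare list.
    blobs = data["scenes"] if isinstance(data, dict) else data
    return [parse_scene(b) for b in blobs]
```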

Elementary Sounds

Elementary sounds are recordings of instruments playing a single note. The elementary sound bank contains 56 unique recordings spread across 5 instrument families. Each of them has the following attributes:

    • Brightness : {Bright, Dark, Null}
    • Duration : length of the sound (in ms)
    • Filename : Filename of the audio recording
    • Note : Musical note on the chromatic scale { A, A#, B, C, C#, D, D#, E, F, F#, G, G# }
    • ID : Numerical identifier of the sound
    • Instrument : Name of the instrument playing the sound { Cello, Clarinet, Flute, Trumpet, Violin }
    • Loudness : {Loud, Quiet}
    • Octave : Octave of the sound
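A minimal sketch of an elementary-sound record that validates the enumerated attributes listed above. Two assumptions are made and should be checked against the real files: the JSON keys are taken to be lower-case versions of the attribute names (`id`, `filename`, `duration`, ...), and `frequency_hz` assumes the octave follows scientific pitch notation (A4 = 440 Hz, equal temperament).

```python
from dataclasses import dataclass
from typing import Any

BRIGHTNESS = {"Bright", "Dark", "Null"}
LOUDNESS = {"Loud", "Quiet"}
INSTRUMENTS = {"Cello", "Clarinet", "Flute", "Trumpet", "Violin"}
# Chromatic scale ordered from C, as in scientific pitch notation.
CHROMATIC = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def _canon(value: Any) -> str:
    """Normalise case ("flute" -> "Flute", "c#" -> "C#"); None -> "Null"."""
    return "Null" if value is None else str(value).capitalize()


@dataclass(frozen=True)
class ElementarySound:
    sound_id: int
    filename: str
    instrument: str
    note: str
    octave: int
    brightness: str
    loudness: str
    duration_ms: float

    @classmethod
    def from_dict(cls, d: dict[str, Any]) -> "ElementarySound":
        # Assumption: JSON keys are lower-case versions of the attribute names.
        sound = cls(
            sound_id=int(d["id"]),
            filename=str(d["filename"]),
            instrument=_canon(d["instrument"]),
            note=_canon(d["note"]),
            octave=int(d["octave"]),
            brightness=_canon(d.get("brightness")),
            loudness=_canon(d["loudness"]),
            duration_ms=float(d["duration"]),
        )
        sound.validate()
        return sound

    def validate(self) -> None:
        """Reject values outside the enumerations listed above."""
        checks = {
            "instrument": (self.instrument, INSTRUMENTS),
            "note": (self.note, set(CHROMATIC)),
            "brightness": (self.brightness, BRIGHTNESS),
            "loudness": (self.loudness, LOUDNESS),
        }
        for name, (value, allowed) in checks.items():
            if value not in allowed:
                raise ValueError(f"{name}={value!r} not in {sorted(allowed)}")

    def frequency_hz(self) -> float:
        """Equal-tempered pitch, assuming scientific pitch notation (A4 = 440 Hz)."""
        midi = 12 * (self.octave + 1) + CHROMATIC.index(self.note)
        return 440.0 * 2 ** ((midi - 69) / 12)
```

For example, a sound with note `A` and octave `4` maps to 440 Hz, and `C` in octave `4` to roughly 261.63 Hz.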