CLEAR: A Dataset for Compositional Language and Elementary Acoustic Reasoning

Citation Author(s):
Universite de Sherbrooke
KTH Royal Institute of Technology
Universite de Sherbrooke
Submitted by:
Jerome Abdelnour
Last updated:
Fri, 08/19/2022 - 19:51
Creative Commons Attribution


We introduced the task of acoustic question answering (AQA) in

A second version of the dataset was introduced in

This dataset aims to promote research in acoustic reasoning.

It comprises acoustic scenes, each with multiple question/answer pairs.

Each question is accompanied by a functional program that describes the reasoning steps needed to answer it.
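To make the idea of a functional program concrete, here is a minimal sketch of how a question could be answered by chaining reasoning steps. The operation names (filter_instrument, filter_loudness, count), the step format, and the three-sound scene are hypothetical and chosen for illustration only; they are not taken from the dataset files.

```python
# Hypothetical reasoning steps. Each one takes the running value and an argument.
OPERATIONS = {
    "filter_instrument": lambda sounds, name: [s for s in sounds if s["instrument"] == name],
    "filter_loudness": lambda sounds, level: [s for s in sounds if s["loudness"] == level],
    "count": lambda sounds, _unused: len(sounds),
}


def run_program(program, sounds):
    """Apply each (operation, argument) step in order, starting from the full scene."""
    value = list(sounds)
    for operation, argument in program:
        value = OPERATIONS[operation](value, argument)
    return value


# Made-up scene, reduced to three sounds to keep the example short.
scene = [
    {"instrument": "Flute", "loudness": "Loud"},
    {"instrument": "Flute", "loudness": "Quiet"},
    {"instrument": "Cello", "loudness": "Loud"},
]

# "How many loud flute sounds are there?" expressed as a chain of steps.
program = [
    ("filter_instrument", "Flute"),
    ("filter_loudness", "Loud"),
    ("count", None),
]

answer = run_program(program, scene)
```

Each step narrows or summarizes the output of the previous one, so the program doubles as a trace of the reasoning behind the answer.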


The dataset is separated into 3 sets:

    • Training
      • 35 000 acoustic scenes
      • 1 400 000 questions/answers
    • Validation
      • 7 500 acoustic scenes
      • 300 000 questions/answers
    • Test
      • 7 500 acoustic scenes
      • 300 000 questions/answers
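The sizes above imply the same question density in every set. A short sketch of that arithmetic, using only the numbers listed (the variable and function names are illustrative):

```python
# Set sizes as listed above: (acoustic scenes, question/answer pairs).
SPLITS = {
    "train": (35_000, 1_400_000),
    "val": (7_500, 300_000),
    "test": (7_500, 300_000),
}


def questions_per_scene(split):
    """Average number of question/answer pairs per acoustic scene in one set."""
    scenes, questions = SPLITS[split]
    return questions / scenes


total_scenes = sum(scenes for scenes, _ in SPLITS.values())
total_questions = sum(questions for _, questions in SPLITS.values())

for name in SPLITS:
    print(name, questions_per_scene(name))
print(total_scenes, total_questions)
```

Every set averages 40 questions per scene, for a total of 50 000 scenes and 2 000 000 question/answer pairs.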


The generation code is available at

The dataset can easily be regenerated with a different number of scenes, questions, and answers.


File Structure

    • /audio : Audio recordings of the scenes
      • /test : Test set recordings
      • /train : Training set recordings
      • /val : Validation set recordings
    • /questions : Questions with their corresponding answers (3 JSON files, one for each set)
    • /scenes : Scene definitions (3 JSON files, one for each set)
    • /arguments : A copy of all the arguments used as input at generation time (for reproducibility)
    • /logs : Logs of the generation scripts
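The layout above can be navigated with a few helpers. This is a minimal sketch, not an official loader: the helper names are illustrative, and because the exact JSON file names are not given here, the lookup assumes that each file name contains the name of its set (train, val or test).

```python
import json
from pathlib import Path

SETS = ("train", "val", "test")  # directory names listed under /audio


def audio_dir(root, set_name):
    """Directory holding the recordings of one set, e.g. <root>/audio/train."""
    if set_name not in SETS:
        raise ValueError(f"unknown set: {set_name!r}")
    return Path(root) / "audio" / set_name


def find_set_json(root, folder, set_name):
    """Locate the JSON file of one set inside a folder such as 'questions' or 'scenes'.

    ASSUMPTION: the file name contains the set name. The real file names
    are not stated in this description.
    """
    if set_name not in SETS:
        raise ValueError(f"unknown set: {set_name!r}")
    candidates = (Path(root) / folder).glob("*.json")
    matches = sorted(p for p in candidates if set_name in p.stem)
    if len(matches) != 1:
        raise FileNotFoundError(
            f"expected one {set_name} JSON file in {folder}, found {len(matches)}"
        )
    return matches[0]


def load_json(path):
    """Read one JSON file into Python objects."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

For example, `load_json(find_set_json(root, "scenes", "val"))` would return the validation scene definitions, provided the naming assumption holds for the downloaded files.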



Each scene is an assembly of 10 elementary sounds.
The scenes are persisted as JSON blobs.

They contain the following attributes:

    • scene_index : Numerical identifier of the scene
    • objects : List of elementary sounds contained in the scene (See Elementary Sounds section)
    • relationships : Defines the relationships between all the objects of the scene
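A quick sanity check of a loaded scene can be sketched from the attributes above. The three key names and the count of 10 sounds come from this description; the helper name, the placeholder object entries, and the empty relationships value are assumptions, since the inner format of those fields is not detailed here.

```python
REQUIRED_SCENE_KEYS = {"scene_index", "objects", "relationships"}
SOUNDS_PER_SCENE = 10  # each scene is an assembly of 10 elementary sounds


def check_scene(scene):
    """Return a list of problems; an empty list means the scene looks well formed."""
    problems = []
    missing = REQUIRED_SCENE_KEYS - set(scene)
    if missing:
        problems.append("missing keys: " + ", ".join(sorted(missing)))
    objects = scene.get("objects", [])
    if len(objects) != SOUNDS_PER_SCENE:
        problems.append(f"expected {SOUNDS_PER_SCENE} objects, found {len(objects)}")
    return problems


# Placeholder scene: the object entries and the relationships value are made up.
example_scene = {
    "scene_index": 0,
    "objects": [{"id": i} for i in range(10)],
    "relationships": {},
}

print(check_scene(example_scene))
```

Returning a list of problems rather than raising on the first one makes it easier to report everything wrong with a scene in a single pass.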

Elementary Sounds

Elementary sounds are recordings of instruments playing a single note.
The elementary sound bank contains 56 unique recordings spread across 5 instrument families.

Each of them has the following attributes:

    • Brightness : {Bright, Dark, Null}
    • Duration : length of the sound (in ms)
    • Filename : Filename of the audio recording
    • Note : Musical note on the chromatic scale { A, A#, B, C, C#, D, D#, E, F, F#, G, G# }
    • ID : Numerical identifier of the sound
    • Instrument : Name of the instrument playing the sound { Cello, Clarinet, Flute, Trumpet, Violin }
    • Loudness : {Loud, Quiet}
    • Octave : Octave of the sound
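The attribute list above maps naturally onto a small record type. The sketch below is one possible in-memory representation, not the dataset's own schema: the snake_case field names, the integer types for the identifier, octave and duration, and the use of None for the Null brightness value are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional

NOTES = ("A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#")
INSTRUMENTS = ("Cello", "Clarinet", "Flute", "Trumpet", "Violin")
LOUDNESS = ("Loud", "Quiet")
BRIGHTNESS = ("Bright", "Dark", None)  # ASSUMPTION: Null is represented as None


@dataclass(frozen=True)
class ElementarySound:
    id: int
    filename: str
    instrument: str
    note: str
    octave: int
    duration_ms: int  # length of the sound in milliseconds
    loudness: str
    brightness: Optional[str]

    def __post_init__(self):
        # Reject values outside the sets listed in the attribute description.
        if self.instrument not in INSTRUMENTS:
            raise ValueError(f"unknown instrument: {self.instrument!r}")
        if self.note not in NOTES:
            raise ValueError(f"unknown note: {self.note!r}")
        if self.loudness not in LOUDNESS:
            raise ValueError(f"unknown loudness: {self.loudness!r}")
        if self.brightness not in BRIGHTNESS:
            raise ValueError(f"unknown brightness: {self.brightness!r}")


# Made-up sound for illustration; the file name is hypothetical.
sound = ElementarySound(
    id=0, filename="example.wav", instrument="Flute", note="C#",
    octave=4, duration_ms=1200, loudness="Loud", brightness=None,
)
```

Validating on construction catches misspelled or unexpected attribute values as soon as a sound is loaded.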

