Abstract

We introduced the task of acoustic question answering (AQA) in https://arxiv.org/abs/1811.10561.

A second version of the dataset was introduced in https://arxiv.org/abs/2106.06147.

This dataset aims to promote research in acoustic reasoning.

It comprises acoustic scenes, each accompanied by multiple questions and their answers.

Each question is accompanied by a functional program that describes the reasoning steps needed to answer it.
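To make the idea of a functional program concrete, the following minimal Python sketch evaluates a program over a scene one step at a time. The step names (`scene`, `filter_instrument`, `filter_loudness`, `count`), the `function`/`inputs`/`value` keys and the toy scene are hypothetical and chosen only for illustration; the dataset's actual program vocabulary and serialization may differ.

```python
from typing import Any, Callable, Dict, List

Sound = Dict[str, Any]


def run_program(program: List[Dict[str, Any]], scene: List[Sound]) -> Any:
    """Evaluate a hypothetical functional program step by step.

    Each step names a function, the indices of earlier steps whose outputs
    it consumes ("inputs") and an optional literal argument ("value").
    """
    # Hypothetical step vocabulary, for illustration only.
    functions: Dict[str, Callable[..., Any]] = {
        "scene": lambda value: list(scene),
        "filter_instrument": lambda value, sounds: [s for s in sounds if s["instrument"] == value],
        "filter_loudness": lambda value, sounds: [s for s in sounds if s["loudness"] == value],
        "count": lambda value, sounds: len(sounds),
    }
    outputs: List[Any] = []
    for step in program:
        args = [outputs[i] for i in step.get("inputs", [])]
        outputs.append(functions[step["function"]](step.get("value"), *args))
    # The answer is the output of the last step.
    return outputs[-1]


# Hypothetical toy scene with only two attributes per sound.
toy_scene = [
    {"instrument": "Violin", "loudness": "Loud"},
    {"instrument": "Flute", "loudness": "Quiet"},
    {"instrument": "Violin", "loudness": "Quiet"},
]

# "How many quiet violin sounds are there?"
program = [
    {"function": "scene"},
    {"function": "filter_instrument", "inputs": [0], "value": "Violin"},
    {"function": "filter_loudness", "inputs": [1], "value": "Quiet"},
    {"function": "count", "inputs": [2]},
]

print(run_program(program, toy_scene))  # prints 1
```

Each step only sees the outputs of earlier steps, so the program reads as an explicit chain of reasoning from the full scene down to the answer.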

The dataset is split into 3 sets:

Training : 35 000 acoustic scenes, 1 400 000 questions/answers
Validation : 7 500 acoustic scenes, 300 000 questions/answers
Test : 7 500 acoustic scenes, 300 000 questions/answers
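The split sizes can be checked with a few lines of arithmetic: every split contains 40 questions per scene, and the 50 000 scenes are divided 70/15/15 between training, validation and test. A small Python sketch of that check, using only the figures listed above:

```python
# Split sizes as listed above: acoustic scenes and question/answer pairs.
SPLITS = {
    "train": {"scenes": 35_000, "questions": 1_400_000},
    "val": {"scenes": 7_500, "questions": 300_000},
    "test": {"scenes": 7_500, "questions": 300_000},
}

total_scenes = sum(s["scenes"] for s in SPLITS.values())
total_questions = sum(s["questions"] for s in SPLITS.values())

for name, split in SPLITS.items():
    per_scene = split["questions"] / split["scenes"]  # questions per scene
    share = split["scenes"] / total_scenes            # fraction of all scenes
    print(f"{name}: {per_scene:.0f} questions per scene, {share:.0%} of scenes")

print(f"total: {total_scenes} scenes, {total_questions} questions")
```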

The generation code is available at https://github.com/NECOTIS/CLEAR-AQA-Dataset-Generator

The dataset can easily be regenerated with a different number of scenes or questions/answers.

Instructions:

File Structure

/audio : Audio recordings of the scenes

/audio/test : Test set recordings
/audio/train : Training set recordings
/audio/val : Validation set recordings

/questions : Questions with their corresponding answers (3 JSON files, one for each set)
/scenes : Scene definitions (3 JSON files, one for each set)
/arguments : A copy of all the arguments used as input at generation time (for reproducibility)
/logs : Logs of the generation scripts
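A minimal Python sketch for indexing an extracted copy of the dataset, assuming the layout above with `train`, `val` and `test` subdirectories under `/audio`. The names of the JSON files inside `/questions` and `/scenes` are not given here, so the sketch keys each file by its name without extension; the helper names `load_json_dir` and `index_dataset` are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def load_json_dir(directory: Path) -> Dict[str, Any]:
    """Load every JSON file in `directory`, keyed by file name without extension."""
    return {
        path.stem: json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(directory.glob("*.json"))
    }


def index_dataset(root: Path) -> Dict[str, Any]:
    """Collect the audio file list of each split plus the question and scene files.

    Assumes audio/{train,val,test}, questions/ and scenes/ directly under `root`
    (hypothetical helper; JSON file names are not assumed, only the *.json suffix).
    """
    audio: Dict[str, List[Path]] = {
        split: sorted(p for p in (root / "audio" / split).iterdir() if p.is_file())
        for split in ("train", "val", "test")
        if (root / "audio" / split).is_dir()
    }
    return {
        "audio": audio,
        "questions": load_json_dir(root / "questions"),
        "scenes": load_json_dir(root / "scenes"),
    }
```

For example, `index_dataset(Path("CLEAR"))` would index a copy extracted into a hypothetical `CLEAR` directory. The training question file alone holds 1 400 000 entries, so loading every JSON file at once can require a noticeable amount of memory.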

Scenes

Each scene is an assembly of 10 elementary sounds.
Scenes are stored as JSON objects.

Each scene has the following attributes:

scene_index : Numerical identifier of the scene
objects : List of elementary sounds contained in the scene (see the Elementary Sounds section)
relationships : Defines the relationships between all the objects of the scene
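The sketch below shows one way to read a scene object into a typed structure while checking the three documented attributes and the number of sounds. It is written in Python and assumes each scene is a JSON object carrying exactly these keys; because the format of `relationships` is not described here, it is kept as-is, and the `Scene` and `parse_scene` names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Scene:
    scene_index: int
    objects: List[Dict[str, Any]]  # elementary sounds, in the scene's own format
    relationships: Any             # format not described here; kept as-is


def parse_scene(blob: Dict[str, Any], expected_sounds: int = 10) -> Scene:
    """Build a Scene from one JSON object (hypothetical helper).

    Checks that the documented attributes are present and that the scene
    holds `expected_sounds` sounds (10 according to the description above).
    """
    missing = {"scene_index", "objects", "relationships"} - set(blob)
    if missing:
        raise KeyError(f"scene is missing attributes: {sorted(missing)}")
    if len(blob["objects"]) != expected_sounds:
        raise ValueError(f"expected {expected_sounds} sounds, got {len(blob['objects'])}")
    return Scene(blob["scene_index"], blob["objects"], blob["relationships"])
```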

Elementary Sounds

Elementary sounds are recordings of instruments playing a single note.
The elementary sound bank contains 56 unique recordings spread across 5 instrument families.

Each sound has the following attributes:

Brightness : {Bright, Dark, Null}
Duration : Length of the sound (in ms)
Filename : Filename of the audio recording
Note : Musical note on the chromatic scale {A, A#, B, C, C#, D, D#, E, F, F#, G, G#}
ID : Numerical identifier of the sound
Instrument : Name of the instrument playing the sound {Cello, Clarinet, Flute, Trumpet, Violin}
Loudness : {Loud, Quiet}
Octave : Octave of the sound
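A minimal Python sketch of an elementary sound record that enforces the attribute domains listed above. The field names (lower-case, with `sound_id` and `duration_ms`) and the assumption that values are spelled exactly as in the list are illustrative choices, not the dataset's confirmed JSON keys.

```python
from dataclasses import dataclass

# Attribute domains as listed above (spelling assumed to match the JSON values).
BRIGHTNESS = {"Bright", "Dark", "Null"}
LOUDNESS = {"Loud", "Quiet"}
NOTES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]
INSTRUMENTS = {"Cello", "Clarinet", "Flute", "Trumpet", "Violin"}


@dataclass(frozen=True)
class ElementarySound:
    # Hypothetical field names; the real JSON keys may differ.
    sound_id: int
    filename: str
    instrument: str
    note: str
    octave: int
    brightness: str
    loudness: str
    duration_ms: float

    def __post_init__(self) -> None:
        # Reject values outside the documented domains.
        checks = {
            "instrument": (self.instrument, INSTRUMENTS),
            "note": (self.note, NOTES),
            "brightness": (self.brightness, BRIGHTNESS),
            "loudness": (self.loudness, LOUDNESS),
        }
        for name, (value, allowed) in checks.items():
            if value not in allowed:
                raise ValueError(f"invalid {name}: {value!r}")
        if self.duration_ms <= 0:
            raise ValueError("duration must be positive")
```

Validating at construction time means a malformed record fails when it is loaded rather than later, when a question filters on that attribute.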

Dataset Files

CLEAR V1.0.0 CLEAR_v1.0.0.tar.gz (184.79 GB)
CLEAR V2.0.0 CLEAR2.tar.gz (41.42 GB)
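Both archives are gzip-compressed tar files. The following minimal Python sketch unpacks one with the standard `tarfile` module. It reads in stream mode (`r|gz`), which avoids random seeks in archives of this size, and applies the `data` extraction filter only on Python versions that provide it. The `extract_archive` name and the destination path are hypothetical and up to the caller.

```python
import tarfile
from pathlib import Path


def extract_archive(archive: Path, destination: Path) -> None:
    """Extract a .tar.gz archive sequentially (hypothetical helper)."""
    destination.mkdir(parents=True, exist_ok=True)
    # "r|gz" reads the archive as a forward-only stream.
    with tarfile.open(archive, mode="r|gz") as tar:
        # Use the safer "data" extraction filter when this Python version has it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(destination, **kwargs)
```

For example, `extract_archive(Path("CLEAR2.tar.gz"), Path("CLEAR2"))` would unpack the second version into a local `CLEAR2` directory.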

CLEAR: A Dataset for Compositional Language and Elementary Acoustic Reasoning