Emotional Crowd Sound
- Citation Author(s):
- Submitted by: Valentina Franzoni
- Last updated: Thu, 02/25/2021 - 08:03
- Data Format:
Crowds express emotions as a collective individual, which is evident from the sounds that a crowd produces at particular events, e.g., collective booing, laughing, or cheering at sports matches, movies, theaters, concerts, political demonstrations, and riots. Crowd sounds can be characterized by frequency-amplitude features, using analysis techniques similar to those applied to individual voices, where deep learning classification is applied to spectrogram images derived from sound transformations.
We present the first dataset for applying a technique based on generating sound spectrograms from fixed-length fragments, extracted from original audio clips recorded at high-attendance events where the crowd acts as a collective individual. Transfer learning techniques can then be used on a neural network, either novel or pre-trained on low-level features using extensive datasets of visual knowledge.
The original sound clips are filtered and normalized in amplitude so that spectrograms are generated correctly; the domain-specific features are then fine-tuned on these spectrograms.
This dataset includes the complete data of the study, so that each step can be reproduced.
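As an illustration of the amplitude normalization, the sketch below computes the gain needed to bring a clip to the −23 Loudness Units target (EBU R128) stated in the file listing. It assumes the clip's integrated loudness has already been measured by an external R128 meter; the function names and the example value are hypothetical, and this is not necessarily how the authors performed the step. Because of the gating in the loudness measurement, a constant gain shifts the measured loudness by the same number of dB only to a good approximation.

```python
TARGET_LUFS = -23.0  # target loudness stated in the dataset description (EBU R128)


def gain_to_target(measured_lufs, target_lufs=TARGET_LUFS):
    """Return (gain_db, linear_factor) that moves a clip to the target loudness.

    measured_lufs is assumed to come from an external EBU R128 loudness meter.
    """
    gain_db = target_lufs - measured_lufs
    # Convert a dB gain to a linear amplitude factor.
    linear_factor = 10.0 ** (gain_db / 20.0)
    return gain_db, linear_factor


def apply_gain(samples, factor):
    """Scale every sample of a clip (a sequence of floats) by a constant factor."""
    return [s * factor for s in samples]


# Hypothetical example: a clip measured at -17 LUFS needs 6 dB of attenuation.
gain_db, factor = gain_to_target(-17.0)
```

A clip that already measures −23 LUFS gets a gain of 0 dB and a factor of 1.0, so it passes through unchanged.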
Files in the dataset:
step0 original files:
- Approval 39
- Disapproval 14
- Neutral 15

step1 normalization:
- Approval 39
- Disapproval 14
- Neutral 15
We normalized the loudness of the dataset to −23 Loudness Units, following the EBU R128 standard. We filtered the sound to the 20–20,000 Hz range.

step2 sound blocks:
- Approval 1787
- Disapproval 388
- Neutral 7340
We divided the sound files into blocks with the following characteristics:
- 1 s block length
- 0.25 s shifting window
- 0.75 s overlap
We removed 37 silence blocks.

step3 spectrogram images:
The blocks of the three emotional classes have been transformed into spectrogram images in four frequency scales:
- bark (0-3.5 kHz)
- erb (2-4 kHz)
- log (0.02-2 kHz)
- mel (4-6 kHz)
For each scale: Approval 1787, Disapproval 388, Neutral 7340.
Spectrograms have been generated using the spgrambw draw spectrogram function. We used the Jet colormap with 64 colors, generating PNG images using a 400-sample Hamming window and a frame increment of 4.5 milliseconds.

step4 train and test spectrograms:
- Training: Approval 1429, Disapproval 310, Neutral 5872
- Test: Approval 358, Disapproval 78, Neutral 1468

Extract the zip files locally and read the readme file.
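The block division in step2 (1 s blocks advanced by a 0.25 s shifting window, hence 0.75 s of overlap) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name is hypothetical, the 44,100 Hz sample rate in the example is an assumption (the description does not state one), and the silence-removal criterion is not modeled because it is not specified here.

```python
def block_bounds(n_samples, sample_rate, block_s=1.0, hop_s=0.25):
    """Return (start, end) sample indices of fixed-length overlapping blocks.

    Blocks are block_s seconds long and start every hop_s seconds, so
    consecutive blocks overlap by block_s - hop_s seconds. A trailing
    fragment shorter than one full block is dropped.
    """
    block_len = int(round(block_s * sample_rate))
    hop = int(round(hop_s * sample_rate))
    if block_len <= 0 or hop <= 0:
        raise ValueError("block and hop lengths must be positive")
    return [
        (start, start + block_len)
        for start in range(0, n_samples - block_len + 1, hop)
    ]


# Assumed example: a 2-second clip sampled at 44,100 Hz yields 5 blocks,
# starting at 0.00 s, 0.25 s, 0.50 s, 0.75 s and 1.00 s.
bounds = block_bounds(2 * 44100, 44100)
```

Each `(start, end)` pair can be used to slice the sample array of a clip; a clip shorter than one block produces an empty list.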
Instructions for dataset usage are included in the open access paper:
Franzoni, V., Biondi, G., Milani, A., Emotional sounds of crowds: spectrogram-based analysis using deep learning (2020) Multimedia Tools and Applications, 79 (47-48), pp. 36063-36075. https://doi.org/10.1007/s11042-020-09428-x
Files are released under the Creative Commons Attribution-ShareAlike 4.0 International License.
- IEEE dataport dataset crowd sound.zip (6.32 GB)