Emotional Crowd Sound
Crowds express emotions as a collective individual, which is evident from the sounds a crowd produces at particular events, e.g., collective booing, laughing, or cheering at sports matches, movies, theaters, concerts, political demonstrations, and riots. Crowd sounds can be characterized by frequency-amplitude features, using analysis techniques similar to those applied to individual voices, where deep learning classification is applied to spectrogram images derived from sound transformations. We present the first dataset supporting a technique based on the generation of sound spectrograms from fixed-length fragments extracted from original audio clips recorded at high-attendance events, where the crowd acts as a collective individual. Transfer learning techniques can be applied to a neural network, either novel or pre-trained on low-level features using extensive datasets of visual knowledge. The original sound clips are filtered and normalized in amplitude for correct spectrogram generation, on which the domain-specific features are then fine-tuned. This dataset includes the complete data of the study, so that each step can be reproduced.
Files in the dataset:
step0 original files:
We normalized the loudness of the dataset to −23 LUFS (Loudness Units relative to Full Scale), following the EBU R128 standard.
We filtered the sound to the 20–20,000 Hz range.
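The two preprocessing steps above can be sketched as follows; this is a simplified illustration, not the exact pipeline used. The function names are ours, and the loudness normalization is RMS-based only: the full EBU R128 measurement additionally applies K-weighting and gating (a library such as pyloudnorm implements the complete standard).

```python
import numpy as np
from scipy.signal import butter, sosfilt

def bandpass_20_20k(x, fs):
    """Band-pass filter restricting content to the audible 20-20,000 Hz range."""
    nyq = fs / 2.0
    high = min(20000.0, 0.99 * nyq)  # keep the upper edge below Nyquist
    sos = butter(4, [20.0 / nyq, high / nyq], btype="bandpass", output="sos")
    return sosfilt(sos, x)

def normalize_loudness(x, target_db=-23.0):
    """Crude loudness normalization: scale RMS energy to the target level.
    EBU R128 proper uses K-weighted, gated loudness rather than raw RMS."""
    rms_db = 10 * np.log10(np.mean(x ** 2) + 1e-12)
    gain_db = target_db - rms_db
    return x * 10 ** (gain_db / 20.0)
```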
step2 sound blocks:
We divided the sound files into blocks with the following characteristics:
1 s block length
0.25 s shifting window
We removed 37 silent blocks
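The blocking step above can be sketched as follows. The function name and the −60 dB silence threshold are our assumptions; the dataset description does not state the exact criterion used to identify the 37 silent blocks.

```python
import numpy as np

def split_into_blocks(x, fs, block_s=1.0, shift_s=0.25, silence_db=-60.0):
    """Split a signal into 1 s blocks with a 0.25 s shifting window,
    discarding blocks whose RMS level falls below a silence threshold."""
    block = int(block_s * fs)
    shift = int(shift_s * fs)
    blocks = []
    for start in range(0, len(x) - block + 1, shift):
        b = x[start:start + block]
        rms_db = 10 * np.log10(np.mean(b ** 2) + 1e-12)
        if rms_db > silence_db:  # keep only non-silent blocks
            blocks.append(b)
    return blocks
```

With a 0.25 s shift, consecutive blocks overlap by 75%, quadrupling the number of training fragments per clip relative to non-overlapping 1 s blocks.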
step3 spectrogram images:
The blocks of the three emotional classes have been transformed into spectrogram images on four frequency scales:
Bark (0–3.5 kHz)
ERB (2–4 kHz)
log (0.02–2 kHz)
Mel (4–6 kHz)
For each scale:
Spectrograms have been generated using the spgrambw spectrogram-drawing function from the VOICEBOX toolbox for MATLAB.
We used the Jet colormap with 64 colors, generating PNG images using a 400-sample Hamming window and a frame increment of 4.5 ms.
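An equivalent spectrogram computation can be sketched in Python with SciPy, using the window length and frame increment stated above; this is an approximation for illustration, since the dataset itself was generated with VOICEBOX's spgrambw in MATLAB, and the colormap/PNG rendering step is omitted here.

```python
import numpy as np
from scipy.signal import spectrogram

def make_spectrogram(x, fs):
    """Power spectrogram with a 400-sample Hamming window and a 4.5 ms
    frame increment, mirroring the generation parameters above."""
    nperseg = 400
    hop = int(round(4.5e-3 * fs))  # 4.5 ms frame increment in samples
    f, t, S = spectrogram(x, fs=fs, window="hamming",
                          nperseg=nperseg, noverlap=nperseg - hop)
    # Convert power to dB; these values would then be mapped to a
    # 64-color Jet colormap and saved as a PNG image.
    return f, t, 10 * np.log10(S + 1e-12)
```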
step4 train and test spectrograms:
Extract the zip files locally and read the readme file.
Instructions for dataset usage are included in the open access paper: Franzoni, V., Biondi, G., Milani, A., Emotional sounds of crowds: spectrogram-based analysis using deep learning (2020) Multimedia Tools and Applications, 79 (47-48), pp. 36063-36075. https://doi.org/10.1007/s11042-020-09428-x
Files are released under the Creative Commons Attribution-ShareAlike 4.0 International License.