HENLO: Human voice Natural Language from On-demand media
- Submitted by: AGUSTINUS GUMELAR
- Last updated: Wed, 11/20/2024 - 05:23
- DOI: 10.21227/m0w3-nz08
Abstract
The Human voice Natural Language from On-demand media (HENLO) dataset is a high-quality emotional speech dataset created to address the need for representative and realistic data in speech emotion recognition research. Unlike many existing datasets, which rely on simulated emotions performed by untrained speakers or directed participants, HENLO sources its data from professionally produced films and podcasts available on Media On-Demand (MOD). These audio samples feature trained actors employing the Stanislavski method, ensuring authentic emotional expressions that closely resemble real-life scenarios.
The dataset prioritizes realism and quality by drawing on films and podcasts produced by top-tier entertainment companies. Because these productions go through rigorous mastering and scoring, the clips contain minimal environmental noise, which makes the dataset well suited to machine learning models that require clean acoustic signals. Clean audio also lets researchers extract and analyze features such as pitch, intonation, and rhythm more accurately. In addition, MOD platforms offer broad access to a diverse catalog of media, further enriching the dataset with varied emotional contexts.
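To make the pitch analysis mentioned above concrete, here is a minimal sketch of frame-level pitch (F0) estimation by autocorrelation in pure Python. It is illustrative only. The function name, the 60 to 500 Hz search range, and the assumption that `frame` is a list of mono float samples are choices made for this example and are not part of HENLO. In practice a library estimator such as librosa's `pyin` would usually be preferred.

```python
def estimate_pitch_autocorr(frame, sample_rate, fmin=60.0, fmax=500.0):
    """Estimate the fundamental frequency (Hz) of one mono frame via autocorrelation.

    Hypothetical helper for illustration. Returns None for a silent frame or when
    no lag in the [fmin, fmax] range has a positive normalized correlation.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]  # remove DC offset
    energy = sum(s * s for s in x)
    if energy == 0:
        return None  # silent frame
    # Convert the allowed pitch range into a range of candidate lags (in samples).
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(n - 1, int(sample_rate / fmin))
    best_lag, best_r = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Normalized autocorrelation at this lag.
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    if best_lag is None:
        return None
    return sample_rate / best_lag  # the best lag is one pitch period
```

A frame of about 1,024 samples is enough to resolve voices down to the 60 Hz lower bound at 16 kHz. Applied to a HENLO clip, the function would be run over successive frames to obtain a pitch contour.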
Contents
The dataset consists of 1,176 audio clips, categorized into four core emotional classes based on the theories of Robert Plutchik and Paul Ekman:
- Angry: 337 clips
- Sad: 293 clips
- Happy: 279 clips
- Fear: 273 clips
All audio is in English and is available in both MP3 and WAV formats to accommodate diverse research and application needs.
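The four classes are not equally sized: Angry has 337 clips and Fear has 273. A classifier trained on HENLO may therefore benefit from class weighting. The following minimal sketch computes inverse-frequency weights w_c = N / (K * n_c) from the per-class counts listed above. This weighting scheme is a common convention assumed here and is not something the dataset prescribes. It matches the formula behind scikit-learn's `compute_class_weight("balanced", ...)`.

```python
# Per-class clip counts as listed in the HENLO "Contents" section.
CLASS_COUNTS = {"angry": 337, "sad": 293, "happy": 279, "fear": 273}


def inverse_frequency_weights(counts):
    """Return w_c = N / (K * n_c) for each class c.

    N is the total number of clips and K the number of classes, so every class
    contributes equally (N / K) to a weighted loss.
    """
    total = sum(counts.values())
    k = len(counts)
    return {label: total / (k * n) for label, n in counts.items()}


weights = inverse_frequency_weights(CLASS_COUNTS)
```

The larger classes (Angry, Sad) receive weights below 1 and the smaller ones (Happy, Fear) weights above 1. These values can be passed to a loss function's per-class weight argument.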
File Details
Total Dataset Size: approximately 272 MB
- Angry: 78 MB
- Sad: 67 MB
- Happy: 61 MB
- Fear: 64 MB
Clip Duration: 5–20 seconds
File Formats: MP3 and WAV
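The stated 5–20 second clip duration can be checked after download. Below is a minimal sketch that uses only Python's standard `wave` module. It assumes the extracted WAV files are uncompressed PCM, which the `wave` module requires. That encoding is an assumption not confirmed by the dataset page. The function names are hypothetical.

```python
import wave
from pathlib import Path


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds (stdlib `wave` only)."""
    with wave.open(str(path), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def clips_outside_range(root, min_s=5.0, max_s=20.0):
    """List (path, duration) for every *.wav under `root` whose length falls outside [min_s, max_s].

    Note: the glob is case-sensitive on most file systems, so files ending in
    ".WAV" would not be matched.
    """
    flagged = []
    for path in sorted(Path(root).rglob("*.wav")):
        duration = wav_duration_seconds(path)
        if not (min_s <= duration <= max_s):
            flagged.append((path, duration))
    return flagged
```

An empty result from `clips_outside_range("path/to/henlo")` (a placeholder path) would confirm that all WAV clips fall within the documented range.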
Dataset Usage
Research Applications
This dataset is well-suited for:
- Speech Emotion Recognition: Training and testing models to identify emotions from speech data.
- Deep Learning Applications: Leveraging high-quality audio for advanced machine learning architectures such as CNNs and RNNs.
- Human-Computer Interaction: Enhancing systems like virtual assistants and emotion-aware customer service bots with more responsive and realistic interactions.
- Ethical and Clean Data Analysis: Utilizing audio with minimal environmental noise that raises fewer ethical concerns, since the recordings come from publicly available on-demand media.
With its clean, high-quality audio and professional emotional expressions, HENLO stands out as an ideal resource for both academic research and practical application in modern speech emotion recognition.
Loading and Accessing the Data
Speech Data: Audio files can be opened in an audio editor such as Audacity, or loaded in Python with standard audio processing libraries such as librosa or pydub.
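With librosa, `librosa.load(path, sr=None)` returns the samples together with the file's native sampling rate. pydub's `AudioSegment.from_file(path)` can also decode the MP3 files through ffmpeg. For a dependency-free starting point, the sketch below loads a WAV clip with the standard library only. It assumes 16-bit PCM encoding, which is not stated on this page. It also downmixes any extra channels to mono. The function name is hypothetical.

```python
import array
import sys
import wave


def load_wav_mono(path):
    """Load a 16-bit PCM WAV file as (samples, sample_rate), samples being mono floats in [-1, 1)."""
    with wave.open(str(path), "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = wf.getnchannels()
        sample_rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    pcm = array.array("h")  # signed 16-bit integers
    pcm.frombytes(raw)
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    # Average interleaved channels into one mono signal and scale to [-1, 1).
    mono = [
        sum(pcm[i:i + channels]) / (channels * 32768.0)
        for i in range(0, len(pcm), channels)
    ]
    return mono, sample_rate
```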
Preprocessing Notes
- Speech: The preprocessing steps included noise reduction, sampling rate adjustment (to 48 kHz), and silence removal.
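The exact silence-removal method used for HENLO is not documented here. The following is a simplified, energy-based stand-in, similar in spirit to `librosa.effects.trim`, and not the authors' pipeline. It drops leading and trailing frames whose RMS level is more than `top_db` decibels below the loudest frame. The frame length and the 40 dB threshold are illustrative assumptions. If your own pipeline also needs the 48 kHz rate, `librosa.resample(y, orig_sr=sr, target_sr=48000)` provides it.

```python
import math


def trim_silence(samples, frame_len=1024, top_db=40.0):
    """Trim leading/trailing frames whose RMS is more than `top_db` dB below the loudest frame.

    Hypothetical helper; a simplified stand-in, not the dataset's actual preprocessing.
    """
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    if not frames:
        return []
    rms = [math.sqrt(sum(s * s for s in f) / len(f)) for f in frames]
    peak = max(rms)
    if peak == 0:
        return []  # the whole signal is silent
    # 20*log10(r / peak) >= -top_db  <=>  r >= peak * 10**(-top_db / 20)
    threshold = peak * 10 ** (-top_db / 20)
    loud = [i for i, r in enumerate(rms) if r >= threshold]
    start, end = loud[0], loud[-1] + 1
    return samples[start * frame_len:end * frame_len]
```

Only the edges are trimmed. Quiet frames between two loud frames are kept, so pauses inside an utterance survive.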
Dataset Files
- HENLO (Happy), MP3 + WAV: henlo-happy.zip (627.59 MB)
- HENLO (Sad), MP3 + WAV: henlo-sad.zip (662.82 MB)
- HENLO (Fear), MP3 + WAV: henlo-fear.zip (653.38 MB)
- HENLO (Angry), MP3 + WAV: henlo-angry.zip (801.46 MB)
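After extracting the four archives, it is convenient to build a (file, label) index. The internal layout of the archives is not described on this page. The sketch therefore assumes, hypothetically, that each archive (for example henlo-happy.zip) is extracted into a directory whose name contains the class label. It assigns the label from the first such directory below the root. Adjust the matching if the real layout differs.

```python
from pathlib import Path

LABELS = ("angry", "sad", "happy", "fear")


def index_henlo(root, extension=".wav"):
    """Return sorted (path, label) pairs for audio files under `root`.

    ASSUMPTION: each archive was extracted into a directory whose name contains
    the class label (e.g. "henlo-happy"); the real internal layout may differ.
    """
    pairs = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() != extension:
            continue
        # Take the label from the first directory below `root` that names a class.
        for part in path.relative_to(root).parts[:-1]:
            label = next((name for name in LABELS if name in part.lower()), None)
            if label:
                pairs.append((path, label))
                break
    return pairs
```

Passing `extension=".mp3"` indexes the MP3 copies instead. `collections.Counter(label for _, label in pairs)` can then be compared against the per-class counts listed under Contents.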
Documentation
Attachment | Size |
---|---|
HENLO_Dataset_Overview-2024.docx | 17.62 KB |