The Mandarin Chinese Auditory Emotions Stimulus Database: A Validated Set of Subject Personal Pronoun Sentences (MCAE-SPPS)


Abstract 

When expressing emotions in daily life, sentences with different personal pronouns as the subject may give listeners different emotional experiences. This study established and validated an auditory emotional speech dataset of Mandarin Chinese sentences whose subjects cover the three grammatical persons (first-person singular "I" and plural "we", second-person singular and plural "you", third-person singular "he/she", and third-person plural "they"), an unprecedented resource. Six professional Mandarin actors (three men and three women) expressed two hundred meaningful Chinese sentences in a neutral tone and six basic emotional tones: happiness, sadness, anger, fear, disgust, and surprise. Seven hundred and twenty Chinese college students evaluated the 8400 recordings for emotion category (a seven-alternative forced-choice task) and intensity (a scale of 1-9). The final dataset consists of 7579 valid Chinese subject-pronoun sentences (neutrality: 1608; sadness: 1324; anger: 1167; surprise: 1076; disgust: 797; happiness: 855; fear: 752; first-/second-/third-person pronoun sentences: 3085, 2975, and 1519). Recognition rates for the emotions were as follows: neutrality (82%), sadness (80%), anger (78%), surprise (72%), happiness (65%), disgust (62%), and fear (60%). We provide each recording's raw audio, validation results, perceptual intensity ratings, and acoustic information. This personal pronoun emotional sentence database is valuable for various research areas, including linguistics, psychology, neuroscience, clinical rehabilitation, and computer science. The Chinese subject personal pronoun auditory emotional speech database is available on the Open Science Framework.

Instructions: 

-----------------------------Audio files coding rule-------------------
The audio file name consists of five digits (a Python parsing sketch follows the examples below):
[1st digit] Actor code (1~6): 1~3 are male actors, 4~6 are female actors;

[2nd digit] Emotion type code (the emotion type originally recorded): 1: neutral, 2: happiness, 3: anger, 4: fear, 5: sadness, 6: disgust, 7: surprise;
Note: for the actual emotion type of each audio after validation, refer to the [Emotioncode] column in [sentence corpus information.xlsx].

[3rd~5th digits] Sentence code: 001~200; for the speech content, refer to the [sentencecode] and [sentence content] columns in [sentence corpus information.xlsx].

Examples:
13171.wav — actor 1's (male) anger voice; the speech content is “他有个计划” (He has a plan).

45021.wav — actor 4's (female) sadness voice; the speech content is “我拿着杯子” (I hold the cup).

36200.wav — actor 3's (male) voice, originally recorded as disgust; after validation, its actual emotion type is anger (the validated emotion type of each audio is given in "sentence corpus information.xlsx"). The speech content is “他们没说话” (They didn't speak).
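Based on the coding rule above, a minimal Python sketch for decoding a filename could look like the following; the function name and the emotion map are illustrative, not part of the dataset.

    # Decode a five-digit audio file name into actor, original emotion, and sentence code.
    EMOTIONS = {1: "neutral", 2: "happiness", 3: "anger", 4: "fear",
                5: "sadness", 6: "disgust", 7: "surprise"}

    def decode_filename(name):
        stem = name.split(".")[0]             # e.g. "13171"
        actor = int(stem[0])                  # 1~6; 1~3 male, 4~6 female
        actor_sex = "male" if actor <= 3 else "female"
        emotion = EMOTIONS[int(stem[1])]      # emotion as originally recorded (not validated)
        sentence_code = int(stem[2:5])        # 001~200
        return actor, actor_sex, emotion, sentence_code

    print(decode_filename("13171.wav"))       # (1, 'male', 'anger', 171)

The validated emotion type is not encoded in the filename; it must be looked up in "sentence corpus information.xlsx".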

--------------sentence corpus information.xlsx column name illustration-------------------

"Filename" + ".wav" are audio file names, filename coding rule see "Audio files coding rule".

"Emotioncode" is emotion type of each audio after validation.

"Intensity" is mean value of 40 participant's emotional intensity assessment (range 1~9),1 means the selected emotion is very slight, 9 means the selected emotion is very strong.

"Actress" is speaker code, 1~3 are males, 4~6 are females.

"Recognition rate" is the mean accuracy assessed by 40 participates, range 0~1.

"Neutral" ~"Happiness" are the number of participants response to the target emotion type ("Emotion code"). Sum of the "Neutral" ~"Happiness" is 40.

"Original recording emotion type" is the original recording emotion type by speakers. Including "Neutral first tone" to "Neutral forth tone", and six other emotion type. Please refer the neutral tone information from this column.

"sentencecode" is the sententence content code of each audio, which is abstracted from filename, range from 001~200.

"sentence content" is the content of Chinese sentences of each audio, each “sentencecode” corresponds to one "sentence content". Sentence code 1~40 are sentences with subject "I",Sentence code 1~40 are sentences with subject "I",Sentence code 41~80 are sentences with subject "we",Sentence code 81~120 are sentences with subject "you",Sentence code 121~160 are sentences with subject "you"(plural),Sentence code 161~180 are sentences with subject "He/she", sentence code 181~200 are sentences with subject "They".

"subjectname" is the Subject of each audio. "I" means the first person singular, "we" means the first person plural, "You" means the second person singular, "Yous" means the second person plural,“He” means the third person singular (In Chinese pronunciation, both she and he are pronounced as "ta"), "They" means the third person plural.

"Groupnumber" is the group information, range 1~18. As described in our article, each speaker's audios were randomly divided into three parts according to the "sentencecode", each portion has nearly equal numbers of audios with different types of emotion and subject name. There are a total of 18 groups of audios, and the participants were randomly divided into 18 groups, each group assessed about 465 audios.

Acoustic information:
"Duration" ~ "spectralSpread" are the acoustic features of each audio:

"Duration" is the duration time of audio, the unit is second(s)
"meanF0", "stdevF0", "maxf0","minF0" are the mean, standard deviation, maximum, and minimum of the Pitch F0, respectively. The units are (Hz).
"hnr"is harmonics-to-noise ratio, unit is (dB).
"localJitter" and "localShimmer" are local jitter and local Shimmer, units are (%).
"meanIntensity" is the mean intensity of voice, unit is (dB).
"rmsEnergy" is the root mean square of the energy, unit is (amplitude)
"spectralCOG" and "spectralSpread" are the center of gravity and the spread of the audio spectral, respectively. Units are (Hz).

--------------participant's Hu-score.xlsx column name illustration-----------
This file contains the 720 participants' emotion discrimination results.

"Participatecode" is the user name in the emotion recognition website, range 20230001~20230720. A total of 720 users participated in the audio assessment.

"NeuNeu"~ "HappHapp" (7*7=49 columns) are the number of each participant's choice upon different target emotion audios. [Neu: Neutral, ang: anger; dis: disgust; happ, happiness; sad, sadness; surp, surprise; Fear:fear].The column names are target emotion + participants' response. For example "AngSurp" column is the audio number of participants choose anger voices as surprise. The target emotion here is the emotion type after validation, not the original speaker's intended recording emotion type.

"Hu_neu"~"Hu_sad" are the Hu-score of seven emotions.

"actress" is speaker code, 1~3 are males, 4~6 are females.

"actressex" is the gender of speakers.

"listenersex" is the gender of participants.

"groupnumber" is the group information, range 1~18, which has described before (sentence corpus information.xlsx column name illustration)

Funding Agency: 
This work was supported by the General Program of the National Natural Science Foundation of China, the General Program of the National Social Science Foundation, and the Beijing Natural Science Foundation.
Grant Number: 
32271138, 21BGL229, 7202086
