Gender Recognition from Voice

Citation Author(s):
Avishek
Garain
Jadavpur University
Submitted by:
Avishek Garain
Last updated:
Sat, 04/25/2020 - 08:58
DOI:
10.21227/v62v-g267
Data Format:
License:
0
0 ratings - Please login to submit your rating.

Abstract 

Abstract: 

Each voice sample is stored as a .WAV file, which is then pre-processed for acoustic analysis using the specan function from the WarbleR R package. Specan measures 22 acoustic parameters on acoustic signals for which the start and end times are provided.

The output from the pre-processed WAV files were saved into a CSV file, containing 3168 rows and 21 columns (20 columns for each feature and one label column for the classification of male or female).

ACOUSTIC PROPERTIES MEASURED

The following acoustic properties of each voice are measured:

  • duration: length of signal
  • meanfreq: mean frequency (in kHz)
  • sd: standard deviation of frequency
  • median: median frequency (in kHz)
  • Q25: first quantile (in kHz)
  • Q75: third quantile (in kHz)
  • IQR: interquantile range (in kHz)
  • skew: skewness (see note in specprop description)
  • kurt: kurtosis (see note in specprop description)
  • sp.ent: spectral entropy
  • sfm: spectral flatness
  • mode: mode frequency
  • centroid: frequency centroid (see specprop)
  • peakf: peak frequency (frequency with highest energy)
  • meanfun: average of fundamental frequency measured across acoustic signal
  • minfun: minimum fundamental frequency measured across acoustic signal
  • maxfun: maximum fundamental frequency measured across acoustic signal
  • meandom: average of dominant frequency measured across acoustic signal
  • mindom: minimum of dominant frequency measured across acoustic signal
  • maxdom: maximum of dominant frequency measured across acoustic signal
  • dfrange: range of dominant frequency measured across acoustic signal
  • modindx: modulation index. Calculated as the accumulated absolute difference between adjacent measurements of fundamental frequencies divided by the frequency range

Note, the features for duration and peak frequency (peakf) were removed from training. Duration refers to the length of the recording, which for training, is cut off at 20 seconds. Peakf was omitted from calculation due to time and CPU constraints in calculating the value. In this case, all records will have the same value for duration (20) and peak frequency (0).

Instructions: 

It contains the description of the data.

Dataset Files

ACCESS ON AWS

Documentation

AttachmentSize
Plain text icon Read me2.04 KB
s