*.wav; *.csv

This dataset contains audio recordings and transcriptions of toxic speech taken from Indonesian-language conversations in YouTube videos in which scammers are confronted. It captures two separate interactions that escalate into toxic exchanges. Each interaction was verified by native Indonesian speakers and labeled as one of two classes: toxic or non-toxic. The dataset includes both the original and the preprocessed versions of the speech and text data. The original speech files total 136 MB, and the preprocessed speech files total 111.7 MB.

A curated dataset of underwater acoustic signals, grouped into five classes by vessel type: Cargo, Tanker, Tug, Passengership, and Background. Subsets were generated from the original data according to the distance between the vessel and the hydrophone recording its sound.
