Agustinus Bimo Gumelar

This dataset contains audio recordings and transcriptions of toxic speech drawn from Indonesian conversations in YouTube videos in which scammers are confronted. It captures two separate interactions that escalate into toxic exchanges. Each interaction has been verified by native Indonesian speakers and labeled as one of two classes: toxic or non-toxic. The dataset includes both the original and the preprocessed versions of the speech and text data. The original speech files total 136 MB, and the preprocessed speech files total 111.7 MB.

Dataset Files

Access to these files requires an IEEE Dataport subscription.

Documentation:
Attachment: Dataset Overview - IndoToxSpeech.docx (16.54 KB)
[1] Agustinus Bimo Gumelar, Eko Mulyanto Yuniarno, Arif Nugroho, Derry Pramono Adi, Indar Sugiarto, Mauridhi Hery Purnomo, "Indonesian Toxic Speech Dataset (IndoToxSpeech)", IEEE Dataport, 2024. [Online]. Available: http://dx.doi.org/10.21227/dbgb-j630. Accessed: Dec. 26, 2024.
@data{dbgb-j630-24,
doi = {10.21227/dbgb-j630},
url = {http://dx.doi.org/10.21227/dbgb-j630},
author = {Agustinus Bimo Gumelar and Eko Mulyanto Yuniarno and Arif Nugroho and Derry Pramono Adi and Indar Sugiarto and Mauridhi Hery Purnomo},
publisher = {IEEE Dataport},
title = {Indonesian Toxic Speech Dataset (IndoToxSpeech)},
year = {2024}
}