Multilingual Cyberbullying Detection

Dataset for Cyberbullying Detection in Mixed Urdu, Roman Urdu, and English Social Media Conversations

The dataset created for this study captures instances of cyberbullying in three languages: Urdu, Roman Urdu, and English. This selection mirrors the linguistic variation common in social media conversations among Urdu-speaking communities worldwide. The dataset is carefully annotated to reflect the linguistic nuances of each language, and the annotation incorporates critical aspects of cyberbullying, such as aggression, repetition, and intent to harm.

Categories: Artificial Intelligence
