Tamil Cyberbullying Dataset


Abstract 

We started from the 47,692 tweets in the Cyberbullying Classification dataset. This dataset is sourced worldwide and covers a broad range of cyberbullying examples. Although it is not drawn solely from South Asia, our translation and adaptation procedure ensures that the data is contextually and culturally relevant for the Tamil-speaking population. The tweets were divided into six classes covering different facets of cyberbullying as well as cases that are not considered cyberbullying. Because the sample is evenly distributed across all categories, it gives a balanced picture of the varied forms cyberbullying takes in online communication.

 

Instructions: 

Dataset


The dataset file tamilCB_dataset.csv contains Tamil tweets labeled for cyberbullying. It includes columns for the raw text, the cyberbullying labels, and additional embeddings. The data is not preprocessed.
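
As a first check after downloading, it may help to confirm the class balance described in the abstract. The following is a minimal sketch using only the Python standard library. The column names are not documented here, so the label column is passed in as a parameter, and the function name label_distribution is hypothetical. The sketch also assumes the file is UTF-8 encoded with a header row.

```python
import csv
from collections import Counter


def label_distribution(csv_path, label_column):
    """Count how many rows carry each value in the given label column.

    Assumptions (not confirmed by the dataset description): the file is
    UTF-8 encoded, has a header row, and label_column names the column
    that holds the cyberbullying class.
    """
    # "utf-8-sig" reads plain UTF-8 and also skips a byte-order mark if present.
    with open(csv_path, newline="", encoding="utf-8-sig") as handle:
        reader = csv.DictReader(handle)
        columns = reader.fieldnames or []
        if label_column not in columns:
            # List the real header so the caller can pick the right column.
            raise KeyError(
                f"{label_column!r} not found; available columns: {columns}"
            )
        # Counter consumes the rows while the file is still open.
        return Counter(row[label_column] for row in reader)
```

For example, label_distribution("tamilCB_dataset.csv", "label") returns a Counter mapping each class to its number of tweets, where "label" is a placeholder to be replaced with the actual column name from the CSV header. Roughly equal counts across classes would match the even distribution stated in the abstract.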