Tamil Cyberbullying Dataset

Citation Author(s): Jothi Prakash V, Arul Antran Vijay S
Submitted by: Arul Antran Vijay S
Last updated: Tue, 01/07/2025 - 05:50
DOI: 10.21227/20s2-jh36
Data Format: *.csv

Keywords: data mining; sentiment analysis; hate speech detection; anxiety or stress analysis; machine learning; natural language processing

Abstract

This dataset is built from the 47,692 tweets in the Cyberbullying Classification dataset. The source tweets were collected worldwide rather than solely from South Asia, so they cover a broad range of cyberbullying examples. We translated and adapted them so that the content is contextually and culturally relevant to the Tamil-speaking population. The tweets are divided into six classes that cover different facets of cyberbullying as well as tweets not considered cyberbullying. The sample is evenly distributed across the classes, which supports a balanced view of the varied forms cyberbullying takes in online communication.

Instructions:

Dataset

The dataset tamilCB_dataset.csv contains Tamil tweets labeled for cyberbullying. The data includes columns for raw text, labels for cyberbullying, and additional embeddings. The dataset is not preprocessed.

Total Rows: 47,694
Columns:
- tweet_text
- cyberbullying_type
- Tamil Tweet
- Emotion Label (English)
- Emotion Label (Tamil)
- Bullying Type (Tamil)
Data Types:
- All columns are strings (Object).
Missing Values per Column:
- tweet_text: 0
- cyberbullying_type: 0
- Tamil Tweet: 42
- Emotion Label (English): 2
- Emotion Label (Tamil): 4
- Bullying Type (Tamil): 4
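
The row and missing-value counts above can be checked with a short script. The following is a minimal sketch, not the authors' procedure. It assumes the file is a comma-delimited, UTF-8 CSV with the column names listed above as its header row, and it treats an empty or whitespace-only cell as missing. The function name summarize is hypothetical. Counts may differ from those listed here if a different definition of "missing" was used to produce them.

```python
import csv
from collections import Counter

# Column names as listed in the dataset description.
EXPECTED_COLUMNS = [
    "tweet_text",
    "cyberbullying_type",
    "Tamil Tweet",
    "Emotion Label (English)",
    "Emotion Label (Tamil)",
    "Bullying Type (Tamil)",
]


def summarize(handle):
    """Count rows, missing cells per column, and label frequencies.

    `handle` is any open text file object containing the CSV.
    (Hypothetical helper, not part of the dataset.)
    """
    reader = csv.DictReader(handle)
    missing = {name: 0 for name in EXPECTED_COLUMNS}
    labels = Counter()
    total = 0
    for row in reader:
        total += 1
        for name in EXPECTED_COLUMNS:
            # Assumption: an absent or whitespace-only cell counts as missing.
            if not (row.get(name) or "").strip():
                missing[name] += 1
        labels[row.get("cyberbullying_type") or ""] += 1
    return total, missing, dict(labels)
```

To run it on the real file, open tamilCB_dataset.csv with newline="" and encoding="utf-8" (as the csv module recommends) and pass the file object to summarize. The returned label frequencies also give a quick check of how evenly the classes are distributed.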

tweet_text	cyberbullying_type	Tamil Tweet	Emotion Label (English)	Emotion Label (Tamil)	Bullying Type (Tamil)
In other words #katandandre, your food was crapilicious! #mkr	not_cyberbullying	வேறு வார்த்தைகளில் கூறுவதானால், #கடந்தந்த்ரே, உங்கள் உணவு மோசமானது! #mkr	Others	மற்றவை	இணைய மிரட்டல் அல்ல
@XochitlSuckkks a classy whore? Or more red velvet cupcakes?	sexual_harassment	@XochitlSuckkks ஒரு கம்பீரமான வேசியா? அல்லது அதிக சிவப்பு வெல்வெட் கப்கேக்குகளா?	Others	மற்றவை	பாலியல் தொல்லை
@RudhoeEnglish This is an ISIS account pretending to be a Kurdish account. Like Islam, it is all lies.	religious_attack	@RudhoeEnglish இது குர்திஷ் கணக்கு போல பாசாங்கு செய்யும் ISIS கணக்கு. இஸ்லாம் போல் இதுவும் பொய்.	Anger	கோபம்	மத அடிப்படையில் தாக்குதல்
@Jason_Gio meh. :P thanks for the heads up, but not too concerned about another angry dude on twitter.	not_cyberbullying	@ஜேசன்_ஜியோ மெஹ். :P தலையை உயர்த்தியதற்கு நன்றி, ஆனால் ட்விட்டரில் மற்றொரு கோபமான நண்பரைப் பற்றி அதிகம் கவலைப்படவில்லை.	Anger	கோபம்	இணைய மிரட்டல் அல்ல