Bangla Social Media Cyberbullying Dataset

Citation Author(s):
Methela Farjana
Habiba Tushi
Farjana Ferdosy
Submitted by:
Methela Farjana
Last updated:
Sat, 03/29/2025 - 09:35
DOI:
10.21227/c9sp-7q60

Abstract 

Cyberbullying is a growing problem on social media. This dataset supports cyberbullying detection in Bangla. It contains comments collected from YouTube, Facebook, Instagram, and TikTok, each labeled as either bullying or non-bullying. The bullying class covers a range of abusive and harmful text, and the non-bullying class covers ordinary conversation. Researchers and developers can use the dataset to train AI models that automatically identify cyberbullying in Bangla text, with the goal of building better tools to keep online spaces safe for Bangla-speaking users.

Instructions: 

This dataset is intended to help researchers and developers detect cyberbullying in Bangla text. It contains comments collected from YouTube, Facebook, Instagram, and TikTok. Each comment carries one of two labels, Bullying or Non-Bullying, which makes the dataset suitable for training binary classification models.

The dataset is provided in XLSX format. It can be opened with Microsoft Excel or Google Sheets, or loaded in Python with a library such as pandas. Each row contains a text sample and its corresponding label. Typical preprocessing steps include stripping stray symbols, removing stopwords, and normalizing the text, which may improve model performance.
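
The loading and preprocessing steps described above could be sketched in Python as follows. This is a minimal sketch under several assumptions: the column names `text` and `label` are hypothetical (the actual XLSX headers are not documented here), the stopword set is a tiny illustrative sample rather than a real Bangla stopword list, and dropping every non-Bangla character is only one possible cleaning policy. Reading `.xlsx` files with pandas also requires the `openpyxl` package.

```python
import re
import unicodedata

# Hypothetical column names; check the actual headers in the XLSX file.
TEXT_COL = "text"
LABEL_COL = "label"

# Tiny illustrative stopword sample, NOT a complete Bangla stopword list.
STOPWORDS = {"এবং", "ও", "যে", "এই"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Keep the Bengali Unicode block (U+0980 to U+09FF), zero-width joiners
# (used in some Bangla conjuncts), and whitespace; replace everything else.
NON_BANGLA_RE = re.compile(r"[^\u0980-\u09FF\u200c\u200d\s]")


def clean_text(text):
    """Normalize, strip URLs and non-Bangla symbols, and drop stopwords."""
    text = unicodedata.normalize("NFC", text)   # canonical Unicode form
    text = URL_RE.sub(" ", text)                # remove links
    text = NON_BANGLA_RE.sub(" ", text)         # remove emoji, Latin, punctuation
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)


def load_dataset(path):
    """Read the XLSX file and add a cleaned-text column."""
    import pandas as pd  # imported lazily so clean_text works without pandas

    df = pd.read_excel(path)
    df = df.dropna(subset=[TEXT_COL, LABEL_COL])
    return df.assign(clean_text=df[TEXT_COL].astype(str).map(clean_text))
```

Whether to discard Latin-script or mixed-script comments is a design choice; romanized Bangla is common on social media, so a real pipeline may prefer to keep or transliterate it instead.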

Once preprocessed, the dataset can be used to train a range of machine learning and deep learning models. Classical models such as Naïve Bayes or SVM are a reasonable baseline, while more advanced models such as LSTMs or transformer-based architectures like BERT may improve detection accuracy.
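
To make the baseline option concrete, here is a from-scratch sketch of a multinomial Naïve Bayes classifier with add-one smoothing, written with only the standard library. It is illustrative, not the authors' method: the class name `NaiveBayes` is hypothetical, the four toy comments are invented for this example and are not taken from the dataset, and in practice a library implementation would normally be used instead.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over whitespace tokens, add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)        # documents per class
        self.word_counts = defaultdict(Counter)    # token counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = text.split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(labels)
        return self

    def _log_score(self, tokens, label):
        # log P(class) + sum of log P(token | class)
        score = math.log(self.class_counts[label] / self.total_docs)
        denom = sum(self.word_counts[label].values()) + len(self.vocab)
        for tok in tokens:
            if tok not in self.vocab:
                continue  # ignore tokens never seen during training
            score += math.log((self.word_counts[label][tok] + 1) / denom)
        return score

    def predict(self, text):
        tokens = text.split()
        return max(self.class_counts, key=lambda c: self._log_score(tokens, c))


# Invented toy examples (NOT from the dataset), using the two label names.
TOY_TEXTS = ["তুমি খুব খারাপ", "তুমি বোকা", "ভিডিও খুব সুন্দর", "ধন্যবাদ ভাই"]
TOY_LABELS = ["Bullying", "Bullying", "Non-Bullying", "Non-Bullying"]

model = NaiveBayes().fit(TOY_TEXTS, TOY_LABELS)
```

Calling `model.predict(...)` on a cleaned comment returns one of the two label strings. With real data, the model should be fitted on a training split only and assessed on held-out rows.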

After training, users should evaluate model performance using standard NLP metrics like accuracy, precision, recall, and F1-score. This dataset can also be used for linguistic analysis, Bangla text classification, and the development of AI-driven content moderation tools for safer online communication.
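
The four metrics named above can be computed directly from true and predicted labels. The sketch below assumes Bullying is treated as the positive class, which is a choice made for this example; the function name `binary_metrics` and the sample label lists are hypothetical.

```python
def binary_metrics(y_true, y_pred, positive="Bullying"):
    """Return accuracy, precision, recall, and F1 for one positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1      # correctly flagged
        elif pred == positive:
            fp += 1      # flagged, but actually negative
        elif truth == positive:
            fn += 1      # missed positive
        else:
            tn += 1      # correctly left alone

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels for illustration: 3 true positives, 1 miss, 1 false alarm.
y_true = ["Bullying"] * 4 + ["Non-Bullying"] * 4
y_pred = ["Bullying"] * 3 + ["Non-Bullying", "Bullying"] + ["Non-Bullying"] * 3
scores = binary_metrics(y_true, y_pred)
```

For this made-up example all four values come out to 0.75. On real data the metrics often diverge, and precision and recall on the Bullying class are usually more informative than accuracy alone when the two classes are unevenly represented.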