Multilabel Extremism Classification Tweets Dataset

Citation Author(s):
Mahamodul Hasan Mahadi
Md. Nasif Safwan
Submitted by:
Mahamodul Hasan...
Last updated:
Fri, 08/30/2024 - 09:22
DOI:
10.21227/rxj1-hm02

Abstract 

The "Multilabel Extremism Classification Tweets Dataset" contains user comments annotated with six labels: toxic, severe toxic, obscene, threat, insult, and identity hate. It is designed for multi-label classification, so a single comment can carry several labels at once. The dataset is intended for researchers working on detecting online extremism and toxicity across multiple languages, and it supports the development of NLP models for content moderation, hate speech detection, and extremism identification. Its varied examples of harmful online behavior help in building robust models that recognize and categorize different forms of extremism in various contexts.

Instructions: 

The dataset is structured in a tabular format with the following columns:

  • id: Unique identifier for each comment.
  • comment: The text of the user-generated comment.
  • toxic: Binary label indicating if the comment is toxic (1) or not (0).
  • severe_toxic: Binary label indicating if the comment is severely toxic (1) or not (0).
  • obscene: Binary label indicating if the comment is obscene (1) or not (0).
  • threat: Binary label indicating if the comment contains a threat (1) or not (0).
  • insult: Binary label indicating if the comment contains an insult (1) or not (0).
  • identity_hate: Binary label indicating if the comment contains identity-based hate (1) or not (0).
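
The column layout above can be read into multi-label targets roughly as follows. This is a minimal sketch rather than an official loader: it assumes the data ships as a CSV file with a header row matching the column names listed above (the page does not state the file format), and the helper names read_examples and label_counts, as well as the two sample rows, are made up for illustration.

```python
import csv
import io

# Label columns in the order listed in the dataset description.
LABEL_COLUMNS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def read_examples(handle):
    """Parse rows into (id, comment, labels), where labels is a list of 0/1 ints."""
    examples = []
    for row in csv.DictReader(handle):
        labels = [int(row[name]) for name in LABEL_COLUMNS]
        # Each label is described as binary, so reject anything else early.
        if any(value not in (0, 1) for value in labels):
            raise ValueError(f"non-binary label in row {row['id']}")
        examples.append((row["id"], row["comment"], labels))
    return examples


def label_counts(examples):
    """Count how many comments are positive for each label."""
    counts = dict.fromkeys(LABEL_COLUMNS, 0)
    for _, _, labels in examples:
        for name, value in zip(LABEL_COLUMNS, labels):
            counts[name] += value
    return counts


# Placeholder rows (assumption): these are NOT taken from the dataset.
SAMPLE_CSV = """id,comment,toxic,severe_toxic,obscene,threat,insult,identity_hate
a1,placeholder neutral text,0,0,0,0,0,0
a2,placeholder hostile text,1,0,1,0,1,0
"""

examples = read_examples(io.StringIO(SAMPLE_CSV))
counts = label_counts(examples)
print(counts)
```

To use it on the real data, replace the in-memory sample with an opened file, for example open(path, newline="", encoding="utf-8"), where path points to wherever the downloaded file was saved. Checking the per-label counts first is useful because labels in toxicity data are often unevenly distributed, which affects how a classifier should be trained and evaluated.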