Multi-Label Extremism and Jihadism Classification Tweets Dataset
- Submitted by: Mahamodul Hasan...
- Last updated: Fri, 08/30/2024 - 08:47
- DOI: 10.21227/6gmh-1b80
Abstract
The "Multi-Label Extremism and Jihadism Classification Tweets Dataset" is a multilingual resource for multi-label classification of online extremism and toxic behavior, including jihadism. Each comment is annotated with binary labels indicating the presence of the following traits: toxic, severe toxic, obscene, threat, insult, identity hate, and jihadi content. The dataset supports research in automated content moderation: it enables the detection of harmful and extremist content across multiple languages and contributes to safer online environments by providing a diverse set of real-world examples.
Files
- Terrorism and Multi Toxic labels Classification.csv: The primary dataset file containing the comments and their corresponding labels.
Columns
- id: A unique identifier for each comment.
- comment_text: The raw text of the comment.
- toxic: Binary label (0 or 1) indicating the presence of general toxicity.
- severe_toxic: Binary label (0 or 1) indicating the presence of severe toxicity.
- obscene: Binary label (0 or 1) indicating the presence of obscenity.
- threat: Binary label (0 or 1) indicating the presence of threats.
- insult: Binary label (0 or 1) indicating the presence of insults.
- identity_hate: Binary label (0 or 1) indicating the presence of identity hate.
- jihadi: Binary label (0 or 1) indicating the presence of jihadist content.
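The column layout above can be checked at load time. The following is a minimal sketch using only the Python standard library. It assumes the CSV header uses exactly the column names listed above and that labels are stored as the strings "0" and "1". The `read_rows` helper and the sample rows are hypothetical illustrations, not part of the dataset.

```python
import csv
import io

# Column names as listed in the dataset description.
LABEL_COLUMNS = [
    "toxic", "severe_toxic", "obscene", "threat",
    "insult", "identity_hate", "jihadi",
]
EXPECTED_COLUMNS = ["id", "comment_text"] + LABEL_COLUMNS


def read_rows(handle):
    """Parse CSV text into dicts and convert each label to the int 0 or 1.

    Hypothetical helper: raises ValueError if a column is missing or a
    label holds anything other than "0" or "1".
    """
    reader = csv.DictReader(handle)
    missing = set(EXPECTED_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    rows = []
    for row in reader:
        for col in LABEL_COLUMNS:
            if row[col] not in ("0", "1"):
                raise ValueError(
                    f"line {reader.line_num}: {col}={row[col]!r} is not 0 or 1"
                )
            row[col] = int(row[col])
        rows.append(row)
    return rows


# Made-up sample standing in for the real file (placeholder text only).
SAMPLE = io.StringIO(
    "id,comment_text,toxic,severe_toxic,obscene,threat,insult,identity_hate,jihadi\n"
    "a1,placeholder text one,1,0,1,0,1,0,0\n"
    "a2,placeholder text two,0,0,0,0,0,0,0\n"
)
rows = read_rows(SAMPLE)
print(len(rows), rows[0]["toxic"], rows[1]["jihadi"])
```

To read the actual file, the same function could be passed a handle from `open("Terrorism and Multi Toxic labels Classification.csv", newline="", encoding="utf-8")`. The UTF-8 encoding is an assumption and may need adjusting.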
Labels
Each comment is annotated with multiple binary labels that indicate the presence (1) or absence (0) of the following traits:
- Toxic: General harmful language.
- Severe Toxic: Extremely harmful or aggressive language.
- Obscene: Language that is offensive or vulgar.
- Threat: Language that expresses intent to harm.
- Insult: Language intended to offend or demean.
- Identity Hate: Language that targets a person or group based on their identity.
- Jihadi: Content associated with jihadism or extremist ideologies.
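Because the labels are independent binary flags, each comment can be viewed as a seven-element 0/1 vector. The sketch below shows one way to build that vector and to count how often each label occurs. The helper names and the example rows are hypothetical, and the label order simply follows the column list in this description.

```python
from collections import Counter

# Label order follows the column list in the dataset description.
LABELS = [
    "toxic", "severe_toxic", "obscene", "threat",
    "insult", "identity_hate", "jihadi",
]


def label_vector(row):
    """Return the labels of one row as a list of 0/1 ints in LABELS order."""
    return [int(row[name]) for name in LABELS]


def active_labels(row):
    """Return the names of all labels set to 1 for this row."""
    return [name for name in LABELS if int(row[name]) == 1]


def label_counts(rows):
    """Count how many rows carry each label (labels never seen stay at 0)."""
    counts = Counter({name: 0 for name in LABELS})
    for row in rows:
        counts.update(active_labels(row))
    return counts


# Made-up rows for illustration only; not taken from the dataset.
example_rows = [
    {"toxic": 1, "severe_toxic": 0, "obscene": 1, "threat": 0,
     "insult": 1, "identity_hate": 0, "jihadi": 0},
    {"toxic": 0, "severe_toxic": 0, "obscene": 0, "threat": 0,
     "insult": 0, "identity_hate": 0, "jihadi": 0},
    {"toxic": 1, "severe_toxic": 0, "obscene": 0, "threat": 1,
     "insult": 0, "identity_hate": 0, "jihadi": 1},
]

print(label_vector(example_rows[0]))
print(active_labels(example_rows[2]))
print(label_counts(example_rows)["toxic"])
```

A row with all labels at 0, like the second example row, represents a comment with none of the listed traits.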
Applications
This dataset can be used for various tasks, including but not limited to:
- Multi-label classification: Identifying multiple forms of extremism and toxicity in a single comment.
- Extremism detection: Developing models that can detect online extremism, including jihadism.
- Content moderation: Training models to assist in automated content moderation systems.
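For the multi-label classification task, predictions are usually scored per label slot rather than per comment. The sketch below computes two common multi-label metrics, Hamming loss and exact-match ratio, in plain Python. It is an illustration only: the dataset description does not prescribe an evaluation protocol, and the label vectors shown are made up.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of individual label slots that were predicted incorrectly."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    total = sum(len(t) for t in y_true)
    wrong = sum(
        a != b
        for t, p in zip(y_true, y_pred)
        for a, b in zip(t, p)
    )
    return wrong / total


def exact_match_ratio(y_true, y_pred):
    """Fraction of comments whose full label vector was predicted exactly."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    matches = sum(list(t) == list(p) for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


# Made-up 7-label vectors in the order: toxic, severe_toxic, obscene,
# threat, insult, identity_hate, jihadi.
y_true = [
    [1, 0, 1, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 1, 0, 0, 1],
]
y_pred = [
    [1, 0, 1, 0, 0, 0, 0],  # misses "insult"
    [0, 0, 0, 0, 0, 0, 0],  # fully correct
    [1, 0, 0, 1, 0, 0, 0],  # misses "jihadi"
]

print(hamming_loss(y_true, y_pred))       # 2 wrong slots out of 21
print(exact_match_ratio(y_true, y_pred))  # 1 of 3 comments fully correct
```

Exact match is strict, since one wrong label fails the whole comment, while Hamming loss gives partial credit. Reporting both gives a fuller picture when comments carry several labels at once.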
Comments
The "Terrorism and Multi-Toxic Labels Classification" dataset is a multilingual dataset curated to assist in the development and evaluation of models aimed at detecting online extremism and toxic behaviors. This dataset is particularly suited for tasks involving multi-label classification, where each comment may exhibit multiple forms of extremism and toxicity.