Annotated Arabic Extremism Tweets
- Submitted by: Saja Aldera
- Last updated: Mon, 10/18/2021 - 17:51
- DOI: 10.21227/g9c0-1t21
Abstract
We present an Arabic Twitter dataset for online extremism detection consisting of 89,816 tweets (about 89K) with associated metadata, published by 52,929 unique users. Three experts annotated the dataset manually and achieved a Gwet’s AC1 score of 0.6, indicating substantial inter-annotator agreement. We further analyzed the tweet metadata to identify important features. In total, 50,279 tweets (56%) from 22,858 unique users were labeled as extremist, whereas 39,537 tweets (44%) from 30,911 unique users were labeled as non-extremist. We applied Shannon’s entropy measure to check the dataset’s balance and obtained a value of 0.98, which indicates that the dataset is well balanced.
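The balance check described in the abstract can be illustrated with a short sketch. The abstract does not state the log base or the exact formulation the authors used, so this assumes the standard base-2 Shannon entropy over the two class proportions; the function name `shannon_entropy` is illustrative only. Under that assumption the value comes out at roughly 0.99, close to the reported 0.98; the small difference may reflect rounding or a slightly different formulation.

```python
import math

def shannon_entropy(counts):
    """Shannon entropy (in bits) of a label distribution, given class counts.

    Assumption: base-2 logarithm; the abstract does not specify the base.
    """
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]  # skip empty classes
    return -sum(p * math.log2(p) for p in probs)

# Class counts taken from the abstract.
extremist, non_extremist = 50_279, 39_537
h = shannon_entropy([extremist, non_extremist])

# With two classes the maximum entropy is log2(2) = 1 bit,
# so h already lies in [0, 1]; values near 1 mean the classes are balanced.
print(f"entropy = {h:.4f} bits")
```

A perfectly balanced two-class dataset would score exactly 1, and a dataset with a single class would score 0.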
Important Notes:
> Twitter's content redistribution policy restricts the sharing of tweet information other than tweet IDs and/or user IDs. Twitter expects researchers to always pull fresh data, because a user might delete a tweet or make their profile protected.
> Only the tweet IDs and annotations are available.
> If you need the full dataset, please contact me at: saaldera@ksu.edu.sa
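Because only tweet IDs and annotations are distributed, anyone working from the public files has to look the tweets up again. As a minimal sketch of the preparation step, the code below reads IDs from a CSV file and groups them into batches. The column names `tweet_id` and `label`, the sample rows, and the helper names are hypothetical, since the file layout is not described here. The batch size of 100 reflects the per-request ID limit of Twitter's v2 tweet lookup endpoint at the time of writing and should be checked against the current API documentation. No network call is made.

```python
import csv
import io

def read_tweet_ids(csv_file, id_column="tweet_id"):
    """Read tweet IDs from an open CSV file.

    IDs are kept as strings: they exceed 53 bits, so tools that parse
    them as floats can silently corrupt them.
    The column name is an assumption about the file layout.
    """
    reader = csv.DictReader(csv_file)
    return [row[id_column].strip() for row in reader if row[id_column].strip()]

def batch(items, size=100):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Hypothetical sample standing in for the real ID/annotation file.
sample = io.StringIO("tweet_id,label\n111,1\n222,0\n333,1\n")
ids = read_tweet_ids(sample)
batches = list(batch(ids, size=2))  # small size only to show the grouping
print(batches)
```

Each batch can then be passed to whichever lookup client is in use. Tweets that have since been deleted or protected will not be returned, so the rehydrated set may be smaller than the ID list.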
Comments
Thank you
You're welcome
Thank you for sharing. I am looking for the complete dataset to use in my research.
Could you please share the dataset with me so I can test my model? I've already sent a request to your email.