Annotated Arabic Extremism Tweets

Citation Author(s):
Saja Aldera
Ahmed Emam
Muhammad Al-Qurishi
Majed Alrubaian
Abdulrahman Alothaim
Submitted by:
Saja Aldera
Last updated:
Mon, 10/18/2021 - 17:51
DOI:
10.21227/g9c0-1t21

Abstract 

We present an Arabic Twitter dataset for online extremism detection consisting of 89,816 tweets with associated metadata, published by 52,929 unique users. Three experts manually annotated the dataset, achieving a Gwet’s AC1 score of 0.6, which indicates substantial inter-annotator agreement. We further analyzed the tweet metadata to identify important features. Of the tweets, 50,279 (56%) from 22,858 unique users were labeled as extremist, and 39,537 (44%) from 30,911 unique users were labeled as non-extremist. To check the dataset’s balance, we computed Shannon’s entropy over the two classes and obtained 0.98, which indicates that the dataset is well balanced.
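
The balance check can be reproduced from the class counts alone. The sketch below assumes base-2 logarithms over the two class proportions, which the abstract does not state explicitly; the function name `shannon_entropy` is illustrative. Under that assumption the value lands just below 1, the maximum for two classes, and its last digit depends on whether it is rounded or truncated.

```python
import math

def shannon_entropy(counts):
    """Shannon entropy in bits of a class distribution given raw counts."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)

# Class counts reported in the abstract.
extremist = 50_279
non_extremist = 39_537

h = shannon_entropy([extremist, non_extremist])
print(f"extremist share: {extremist / (extremist + non_extremist):.1%}")
print(f"entropy: {h:.4f} bits (maximum for two classes is 1.0)")
```

A perfectly balanced two-class dataset gives exactly 1.0, so a value this close to the maximum supports the claim that neither class dominates.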

Instructions: 

Important Notes:

> Twitter's content redistribution policy restricts the sharing of tweet information other than tweet IDs and/or user IDs. Twitter wants researchers to always pull fresh data, because a user might delete a tweet or make their profile protected.

> Only the tweet IDs and annotations are available.

> If you need the full dataset, please contact me at: saaldera@ksu.edu.sa
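
Since only tweet IDs and annotations are distributed, a typical first step is to load them and check the label counts before retrieving the tweets through Twitter's API. The sketch below is a minimal illustration under assumptions: the file layout (a CSV with `tweet_id` and `label` columns), the label strings, and the sample IDs are hypothetical placeholders, since the actual data format is not specified on this page. IDs are kept as strings so that large 64-bit values are not altered by tools that treat them as floating-point numbers.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for the downloaded file; the real column
# names and label values may differ. The IDs are placeholders, not real tweets.
SAMPLE = """tweet_id,label
1000000000000000001,extremist
1000000000000000002,non-extremist
1000000000000000003,extremist
"""

def load_annotations(handle):
    """Return a list of (tweet_id, label) pairs, keeping IDs as strings."""
    reader = csv.DictReader(handle)
    return [(row["tweet_id"].strip(), row["label"].strip()) for row in reader]

rows = load_annotations(io.StringIO(SAMPLE))
label_counts = Counter(label for _, label in rows)
print(label_counts)
```

For a real file, `io.StringIO(SAMPLE)` would be replaced with `open(path, newline="", encoding="utf-8")`, where `path` points to the downloaded annotations.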

 

Comments

Thank you

Submitted by Yaser Altalhi on Tue, 10/05/2021 - 05:36

You're welcome

Submitted by Saja Aldera on Mon, 10/18/2021 - 17:52

Thank you for sharing. I am looking for the complete dataset to use in my research.

Submitted by Yanis AZIB on Sun, 11/13/2022 - 07:33

Could you please share the dataset with me so I can test my model? I've already sent a request to your email.

Submitted by Zerrouki Khadidja on Fri, 11/25/2022 - 15:23