Saudi Dialect Twitter Corpus (SDTwittC)

Citation Author(s):
Saad Alanazi, Jouf University
Submitted by:
Saad Alanazi
Last updated:
Tue, 05/17/2022 - 22:17
DOI:
10.21227/cjw6-rm59

Abstract 

SDTwittC consists of replies from 200 authors, evenly balanced by gender (100 male, 100 female). We identified each author's gender from the account name and profile picture. Because original tweets and retweets may contain copied-and-pasted text, both were discarded at the outset, and only replies were collected. The number of replies per author ranges from hundreds to thousands. Male authors produced 233,926 replies, whereas female authors produced 219,740.

Instructions: 

SDTwittC consists of 200 authors evenly balanced by gender (100 for each), so the dataset contains two folders (Final_male and Final_femal). Each folder contains 100 .txt files. Each file holds the replies (hundreds to thousands) of a single, unidentified Twitter user.

The files are plain text and can be opened directly in Notepad or any other text editor.
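
For programmatic access, the folder layout described above can be read with a short Python script. This is only a minimal sketch under assumptions the page does not confirm: that each file stores one reply per line, that the files are UTF-8 encoded, and that the folder names are spelled exactly as written above. The function names `load_corpus` and `summarize` are hypothetical helpers introduced for illustration.

```python
from pathlib import Path


def load_corpus(root, folders=("Final_male", "Final_femal"), encoding="utf-8"):
    """Return {folder_name: {file_stem: [reply, ...]}} for every .txt file found.

    Assumptions (not confirmed by the dataset page): one reply per line,
    UTF-8 encoding, and folder names spelled as in the instructions.
    """
    corpus = {}
    for folder in folders:
        folder_path = Path(root) / folder
        authors = {}
        # sorted() keeps the author order stable across runs
        for txt_file in sorted(folder_path.glob("*.txt")):
            # errors="replace" avoids a crash if the encoding guess is wrong
            text = txt_file.read_text(encoding=encoding, errors="replace")
            # Assumption: one reply per line; blank lines are skipped
            authors[txt_file.stem] = [
                line.strip() for line in text.splitlines() if line.strip()
            ]
        corpus[folder] = authors
    return corpus


def summarize(corpus):
    """Return {folder_name: (author_count, reply_count)}."""
    return {
        folder: (len(authors), sum(len(replies) for replies in authors.values()))
        for folder, authors in corpus.items()
    }
```

Calling `summarize(load_corpus("path/to/SDTwittC"))` on the unpacked dataset gives per-folder author and reply counts. If the one-reply-per-line assumption holds, these should be comparable with the figures stated in the abstract (100 authors per folder; 233,926 and 219,740 replies); a mismatch would suggest the files use a different reply delimiter.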

Comments

I need this dataset for learning.

Submitted by ashraf fuad on Sat, 03/05/2022 - 13:17

May I access this dataset for learning? I'm a Bachelor of Artificial Intelligence student at the University of Jeddah.

Submitted by Alaa Mohammed on Mon, 04/04/2022 - 14:58