Saudi Dialect Twitter Corpus (SDTwittC)

Citation Author(s):
Saad Alanazi (Jouf University)
Submitted by:
Saad Alanazi
DOI:
10.21227/cjw6-rm59

Abstract

SDTwittC consists of 200 authors, evenly balanced by gender (100 male and 100 female). We identified the gender of each author from their name and profile picture. Because original tweets and retweets may contain copied-and-pasted text, both were discarded; only replies were collected. The number of replies per author ranges from hundreds to thousands. Male authors produced 233,926 replies, whereas female authors produced 219,740 replies.

Instructions:

SDTwittC consists of 200 authors, evenly balanced by gender (100 male and 100 female). Accordingly, there are two folders (Final_male and Final_femal). Each folder contains 100 .txt files. Each file holds the replies, from hundreds to thousands, of a single unknown Twitter user.

You can open these files directly in Notepad. 
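For programmatic access, the folder layout described above can be read with a short script. The following is only a minimal Python sketch under stated assumptions: the function name `load_corpus` and the `male`/`female` labels are hypothetical, the files are assumed to be UTF-8 encoded, and each non-empty line is assumed to hold one reply. None of these details is confirmed by the dataset description, so adjust them after inspecting a file.

```python
from pathlib import Path

# Folder names are taken from the instructions above;
# the gender labels attached to them are our own choice.
GENDER_FOLDERS = {"Final_male": "male", "Final_femal": "female"}


def load_corpus(root, encoding="utf-8"):
    """Return a list of (gender, file_name, replies) tuples, one per author file.

    Assumptions (not confirmed by the dataset description):
    - files are UTF-8 encoded;
    - each non-empty line of a file is one reply.
    """
    records = []
    for folder, gender in GENDER_FOLDERS.items():
        # Sort the files so the order is reproducible across runs.
        for path in sorted(Path(root, folder).glob("*.txt")):
            text = path.read_text(encoding=encoding, errors="replace")
            replies = [line.strip() for line in text.splitlines() if line.strip()]
            records.append((gender, path.name, replies))
    return records
```

Calling `load_corpus("path/to/SDTwittC")`, where the path is a placeholder for wherever the archive was extracted, should yield 200 records if the layout matches the description. If the per-gender reply totals differ noticeably from the figures in the abstract, the one-reply-per-line assumption probably does not hold for these files.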

May I access this dataset for learning? I'm a Bachelor of Artificial Intelligence student at the University of Jeddah.
Alaa Mohammed, Mon, 04/04/2022 - 18:58