Standard Dataset
Saudi Dialect Twitter Corpus (SDTwittC)
- Submitted by: Saad Alanazi
- Last updated: Tue, 05/17/2022 - 22:17
- DOI: 10.21227/cjw6-rm59
Abstract
SDTwittC consists of 200 authors evenly balanced by gender (100 male, 100 female). We identified the gender of each author from their name and profile picture. Because tweets and retweets may contain copied-and-pasted text, both were discarded at the outset; only replies were collected. The number of replies per author ranges from hundreds to thousands. Male authors produced 233,926 replies, and female authors produced 219,740.
Instructions:
SDTwittC consists of 200 authors evenly balanced by gender (100 male, 100 female). Accordingly, there are two folders (Final_male and Final_femal), each containing 100 txt files. Each file holds the replies (hundreds to thousands) of a single, unidentified Twitter user.
You can open these files directly in Notepad.
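For programmatic access, the folder layout described above could be read along the following lines. This is only a minimal sketch: the folder names Final_male and Final_femal come from the instructions, but the one-reply-per-line layout, the UTF-8 encoding, the `*.txt` file pattern, and the function names `load_corpus` and `reply_counts` are assumptions made for illustration and are not confirmed by the dataset description.

```python
from pathlib import Path

# Folder names as given in the instructions; the gender labels are our own.
GENDER_FOLDERS = {"Final_male": "male", "Final_femal": "female"}


def load_corpus(root):
    """Return a list of (file_name, gender, replies) tuples, one per author file.

    ASSUMPTION: each non-empty line is one reply and the files are UTF-8.
    Neither is stated in the dataset description.
    """
    corpus = []
    for folder, gender in GENDER_FOLDERS.items():
        for path in sorted(Path(root, folder).glob("*.txt")):
            # Undecodable bytes are replaced rather than raising an error.
            text = path.read_text(encoding="utf-8", errors="replace")
            replies = [line.strip() for line in text.splitlines() if line.strip()]
            corpus.append((path.name, gender, replies))
    return corpus


def reply_counts(corpus):
    """Total number of replies per gender label."""
    totals = {}
    for _name, gender, replies in corpus:
        totals[gender] = totals.get(gender, 0) + len(replies)
    return totals
```

Calling `reply_counts(load_corpus("path/to/SDTwittC"))` (a hypothetical path) gives per-gender totals. If these differ widely from the 233,926 and 219,740 replies reported in the abstract, the one-reply-per-line assumption probably does not hold and the files should be inspected by hand.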
Comments
I need this dataset for learning.
May I access this dataset for learning? I'm a Bachelor of Artificial Intelligence student at the University of Jeddah.