Saudi Dialect Twitter Corpus (SDTwittC)

Citation Author(s): Saad Alanazi, Jouf University
Submitted by: Saad Alanazi
Last updated: Sun, 2 June 2019, 22:52
DOI: 10.21227/cjw6-rm59

Abstract: 

SDTwittC consists of replies posted by 200 Twitter authors, evenly balanced by gender (100 male and 100 female). The gender of each author was identified from their name and profile picture. Because tweets and retweets may be copied-and-pasted text, both were discarded at the outset, and only replies were compiled. The number of replies per author ranges from hundreds to thousands. In total, the male authors produced 233,926 replies and the female authors produced 219,740.
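
The abstract describes a reply-only filter: original tweets and retweets are discarded and only replies are kept. The author's collection pipeline is not described, so the following is only a minimal sketch of such a filter, assuming each tweet is a Python dict shaped like a Twitter API v1.1 Tweet object, where a retweet carries a `retweeted_status` field and a reply has a non-null `in_reply_to_status_id`. The function name `replies_only` is hypothetical.

```python
from __future__ import annotations

from typing import Any, Iterable, Iterator


def replies_only(tweets: Iterable[dict[str, Any]]) -> Iterator[dict[str, Any]]:
    """Yield only replies, discarding original tweets and retweets.

    Assumes Twitter API v1.1-style dicts: retweets contain a
    "retweeted_status" key, and replies have a non-null
    "in_reply_to_status_id".
    """
    for tweet in tweets:
        if "retweeted_status" in tweet:
            continue  # retweet: possibly copied text, discard
        if tweet.get("in_reply_to_status_id") is None:
            continue  # original tweet rather than a reply: discard
        yield tweet
```

Checking `retweeted_status` first means a retweet is dropped even if it also carries reply metadata. For a list containing one original tweet, one retweet, and one reply, only the reply is yielded.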

Instructions: 

SDTwittC consists of 200 authors, evenly balanced by gender (100 each), so the data is split into two folders, Final_male and Final_femal. Each folder contains 100 .txt files, and each file holds the replies (hundreds to thousands) of a single anonymous Twitter user.
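
Given this folder layout, a corpus loader might look like the following minimal sketch. Only the folder names Final_male and Final_femal come from the dataset description; everything else is an assumption: files are read as UTF-8, each non-empty line is treated as one reply, and the file name (without extension) is used as an author ID. The helper names `load_corpus` and `summarize` are hypothetical.

```python
from __future__ import annotations

from pathlib import Path

# Folder names as given in the dataset instructions, mapped to gender labels.
GROUPS = {"Final_male": "male", "Final_femal": "female"}


def load_corpus(root: str | Path) -> list[tuple[str, str, list[str]]]:
    """Return one (gender, author_id, replies) tuple per .txt file.

    Assumptions (not confirmed by the dataset page): UTF-8 files,
    one reply per non-empty line, file name stem used as author ID.
    """
    records = []
    for folder, gender in GROUPS.items():
        for path in sorted(Path(root, folder).glob("*.txt")):
            lines = path.read_text(encoding="utf-8").splitlines()
            replies = [line.strip() for line in lines if line.strip()]
            records.append((gender, path.stem, replies))
    return records


def summarize(records: list[tuple[str, str, list[str]]]) -> dict[str, tuple[int, int]]:
    """Count (authors, replies) per gender group."""
    totals: dict[str, tuple[int, int]] = {}
    for gender, _author_id, replies in records:
        n_authors, n_replies = totals.get(gender, (0, 0))
        totals[gender] = (n_authors + 1, n_replies + len(replies))
    return totals
```

For example, `summarize(load_corpus("SDTwittC"))`, where SDTwittC is a hypothetical path to the extracted dataset, returns author and reply counts per group. These can be compared with the totals in the abstract (233,926 male and 219,740 female replies) to check the one-reply-per-line assumption; replies containing embedded line breaks would inflate a line-based count. If the files are not UTF-8, `read_text` raises `UnicodeDecodeError` and the encoding argument has to be changed.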

The files are plain text and can be opened directly in a text editor such as Notepad.
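
The page does not state the character encoding of the files, which matters for Arabic text. If a file does not display correctly, the following minimal sketch may help. It assumes, without confirmation from the dataset page, that each file is UTF-16 with a byte-order mark, UTF-8 (with or without a BOM), or Windows-1256 (the legacy Windows Arabic code page), and tries them in that order. The function name `read_reply_file` is hypothetical.

```python
from __future__ import annotations

import codecs
from pathlib import Path


def read_reply_file(path: str | Path) -> tuple[str, str]:
    """Decode one corpus file and return (text, encoding_used).

    The candidate encodings are assumptions; the dataset page does not
    document how the files are encoded.
    """
    data = Path(path).read_bytes()
    # Try UTF-16 only when a byte-order mark is present: decoding arbitrary
    # bytes as UTF-16 often "succeeds" and produces garbage.
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return data.decode("utf-16"), "utf-16"
    # utf-8-sig accepts UTF-8 with or without a leading BOM and strips it.
    for encoding in ("utf-8-sig", "cp1256"):
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError(f"could not decode {path} with the candidate encodings")
```

UTF-8 is tried before Windows-1256 because strict UTF-8 decoding rejects most non-UTF-8 Arabic byte sequences, whereas a single-byte code page accepts almost any input.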

Citation:

[1] Saad Alanazi, "Saudi Dialect Twitter Corpus (SDTwittC)", IEEE Dataport, 2019. [Online]. Available: http://dx.doi.org/10.21227/cjw6-rm59. Accessed: Aug. 18, 2019.
@data{cjw6-rm59-19,
  doi = {10.21227/cjw6-rm59},
  url = {http://dx.doi.org/10.21227/cjw6-rm59},
  author = {Saad Alanazi},
  publisher = {IEEE Dataport},
  title = {Saudi Dialect Twitter Corpus (SDTwittC)},
  year = {2019}
}