SDTwittC consists of 200 authors balanced evenly by gender (100 male and 100 female). We identified each author's gender from their name and profile picture. Because original tweets and retweets may contain copied-and-pasted text, both were discarded at the outset, and only replies were compiled. The number of replies per author ranges from hundreds to thousands. Male authors produced 233,926 replies, whereas female authors produced 219,740.
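As a quick check on the balance described above, the following minimal sketch derives the corpus total, the mean number of replies per author, and each group's share of the corpus from the stated counts. Only the author and reply counts come from the description; the function name `summarize` and the dictionary layout are illustrative assumptions, and the derived averages are not figures reported by the dataset author.

```python
# Counts taken from the corpus description above.
AUTHORS_PER_GENDER = 100
REPLIES = {"male": 233_926, "female": 219_740}


def summarize(replies, authors_per_group):
    """Return the total reply count plus per-group mean and share.

    `summarize` is a hypothetical helper for illustration; it is not
    part of the dataset or of any tooling shipped with it.
    """
    total = sum(replies.values())
    per_group = {
        group: {
            # Average replies per author, assuming equal group sizes.
            "mean_per_author": count / authors_per_group,
            # Fraction of all replies contributed by this group.
            "share": count / total,
        }
        for group, count in replies.items()
    }
    return total, per_group


total, per_group = summarize(REPLIES, AUTHORS_PER_GENDER)
print(f"total replies: {total}")
for group, stats in per_group.items():
    print(
        f"{group}: {stats['mean_per_author']:.2f} replies/author, "
        f"{stats['share']:.1%} of corpus"
    )
```

Although the authors are split exactly evenly, the reply counts are not, so a gender classifier trained on individual replies would see slightly more male-authored examples than female-authored ones.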


[1] Saad Alanazi, "Saudi Dialect Twitter Corpus (SDTwittC)", IEEE Dataport, 2019. [Online]. Available: http://dx.doi.org/10.21227/cjw6-rm59. Accessed: Apr. 25, 2025.