lexicon
The language used on social media platforms is primarily dialectal, which poses unique challenges for Natural Language Processing. To address this, a large, manually annotated corpus of approximately 30,500 Saudi dialect tweets in the food delivery app domain was introduced. The corpus was annotated with positive, negative, and neutral sentiment categories. Additionally, the existing SauDiSenti lexicon was expanded by 30%, providing an improved resource for sentiment analysis in the Saudi dialect. The corpus and expanded lexicon were evaluated using machine learning classifiers.
- Categories:
Companion data for the paper "Using social media and personality traits to assess software developers’ emotions", submitted to the IEEE Access journal, 2022. It contains the anonymized data used in the study, including the answers to the demographic survey, the answers to the Big Five Inventory, the experiment protocol, the manual analysis from psychologists and participants, and all generated charts and data analyses.
- Categories: