Standard Dataset
Expanded-SauDiSenti Lexicon + Corpus for Food Delivery Domain
- Submitted by: Nora Almezeini
- Last updated: Tue, 06/25/2024 - 09:17
- DOI: 10.21227/095d-z653
Abstract
Most of the language used on social media platforms is dialectal, which poses unique challenges for Natural Language Processing. To address this, a large, manually annotated corpus of approximately 30,500 Saudi dialect tweets in the food delivery app domain was introduced. The corpus was annotated with positive, negative, and neutral sentiment categories. Additionally, the existing SauDiSenti lexicon was expanded by 30%, providing an improved resource for sentiment analysis in the Saudi dialect. The corpus and the expanded lexicon have been evaluated using machine learning classifiers. This domain-specific dataset and the expanded sentiment lexicon are expected to advance Arabic sentiment analysis, particularly for the Saudi dialect and the food delivery industry.
This repository contains two key resources for advancing Arabic sentiment analysis, particularly in the context of the Saudi dialect and the food delivery industry:
- Expanded SauDiSenti Lexicon: An expanded version of the existing SauDiSenti lexicon, which provides sentiment annotations for words and phrases in the Saudi dialect.
- Saudi Dialect Food Delivery Corpus: A large, manually annotated corpus of approximately 30,500 tweets in the Saudi dialect, focused on the food delivery app domain.
The expanded SauDiSenti lexicon is provided as two files:
- Positive_Expanded.xlsx: an Excel spreadsheet containing the positive words and phrases.
- Negative_Expanded.xlsx: an Excel spreadsheet containing the negative words and phrases.
The Saudi Dialect Food Delivery Corpus is provided as a single file:
- Data.xlsx: an Excel spreadsheet containing the annotated tweets.
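As a rough illustration of how the lexicon files might be used, the sketch below loads terms from a spreadsheet and assigns a label by counting matches. It rests on several assumptions that are not documented on this page: that the terms sit in the first column of each sheet with no header row, and that whitespace tokenization is adequate. The helper names `load_terms` and `lexicon_label` are hypothetical, and multi-word phrases in the lexicon would not be matched by this simple token lookup.

```python
from typing import Set


def load_terms(path: str) -> Set[str]:
    """Read the first column of an Excel sheet into a set of terms.

    Assumption: one term per row in the first column, no header row.
    Adjust `header` and the column index to the actual sheet layout.
    """
    import pandas as pd  # needs pandas plus openpyxl for .xlsx files

    frame = pd.read_excel(path, header=None)
    return {str(term).strip() for term in frame.iloc[:, 0].dropna()}


def lexicon_label(text: str, positive: Set[str], negative: Set[str]) -> str:
    """Label text by counting single-token matches against the two lexicons."""
    tokens = text.split()
    score = sum(tok in positive for tok in tokens) - sum(
        tok in negative for tok in tokens
    )
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    # Toy placeholder terms, not taken from the actual lexicon files.
    toy_positive = {"good", "fast"}
    toy_negative = {"late", "cold"}
    print(lexicon_label("the order was fast and good", toy_positive, toy_negative))
    print(lexicon_label("the order arrived late", toy_positive, toy_negative))
```

With the real files, the two sets would come from calls such as `load_terms("Positive_Expanded.xlsx")` and `load_terms("Negative_Expanded.xlsx")`, assuming the downloaded files carry those names. A count-based rule like this is only a baseline; it is not the evaluation method described in the abstract, which used machine learning classifiers.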
Dataset Files
- Expanded Negative Lexicon: Negative-Expanded.xlsx (70.98 kB)
- Expanded Positive Lexicon: Positive - Expanded.xlsx (32.76 kB)
- Corpus for food delivery domain: Data.xlsx (1.27 MB)