Expanded-SauDiSenti Lexicon + Corpus for Food Delivery Domain

Citation Author(s):
Nora Almezeini, Nora Alkhamees, Monira Aloud, Dina Binjabi
Submitted by:
Nora Almezeini
Last updated:
Tue, 06/25/2024 - 09:17
DOI:
10.21227/095d-z653
License:

Abstract 

The language used on social media platforms is primarily dialectal, which poses unique challenges for Natural Language Processing. To address this, a large, manually annotated corpus of approximately 30,500 Saudi dialect tweets in the food delivery app domain was introduced. Each tweet in the corpus was annotated as positive, negative, or neutral. Additionally, the existing SauDiSenti lexicon was expanded by 30%, providing an improved resource for sentiment analysis in the Saudi dialect. The corpus and the expanded lexicon have been evaluated using machine learning classifiers. This high-quality, domain-specific dataset and the expanded sentiment lexicon are expected to significantly advance Arabic sentiment analysis, particularly in the Saudi dialect and the food delivery industry.

Instructions: 

This repository contains two key resources for advancing Arabic sentiment analysis, particularly in the context of the Saudi dialect and the food delivery industry:

  1. Expanded SauDiSenti Lexicon: An expanded version of the existing SauDiSenti lexicon, which provides sentiment annotations for words in the Saudi dialect.
  2. Saudi Dialect Food Delivery Corpus: A large, manually annotated corpus of approximately 30,500 tweets in the Saudi dialect, focused on the food delivery app domain.

The expanded SauDiSenti lexicon is provided as two files:

  • Positive_Expanded.xlsx: an Excel spreadsheet containing the positive words and phrases.
  • Negative_Expanded.xlsx: an Excel spreadsheet containing the negative words and phrases.
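
The two lexicon files can be read into Python sets for quick lookup. The following is a minimal sketch, not part of the dataset itself. It assumes that pandas and openpyxl are installed, and that each spreadsheet keeps its entries in a single column with no header row; the internal layout of the files is not described here, so inspect them first and adjust the hypothetical `column` argument and the `header` setting as needed. The helper names `clean_terms`, `load_terms`, and `summarize_lexicon` are illustrative only.

```python
from pathlib import Path


def clean_terms(values):
    """Keep non-empty string entries, stripped of surrounding whitespace."""
    return {v.strip() for v in values if isinstance(v, str) and v.strip()}


def load_terms(path, column=0):
    """Read one lexicon spreadsheet and return its entries as a set.

    Assumption: entries sit in one column (index 0 by default) and the
    sheet has no header row. Check the file and adjust if this is wrong.
    """
    import pandas as pd  # imported here so clean_terms works without pandas

    frame = pd.read_excel(path, header=None)  # .xlsx reading needs openpyxl
    return clean_terms(frame.iloc[:, column].tolist())


def summarize_lexicon(directory="."):
    """Print the size of each list and how many entries appear in both."""
    base = Path(directory)
    positive = load_terms(base / "Positive_Expanded.xlsx")
    negative = load_terms(base / "Negative_Expanded.xlsx")
    print(len(positive), "positive entries")
    print(len(negative), "negative entries")
    print(len(positive & negative), "entries present in both lists")
```

Calling `summarize_lexicon()` from the directory that holds the two files prints the size of each list. If the first row turns out to be a header, it will be counted as an entry under these assumptions.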

The Saudi Dialect Food Delivery Corpus is provided as a single file:

  • Data.xlsx: an Excel spreadsheet containing the annotated tweets.
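
A first check after downloading the corpus is to count how many tweets carry each sentiment label. The sketch below is one way to do that under stated assumptions: pandas and openpyxl are installed, the first row of Data.xlsx is a header, and the sentiment labels sit in a column whose name is passed as `label_column`. The default value `"label"` is a hypothetical placeholder, since the actual column names are not listed here. The function names are illustrative as well.

```python
from collections import Counter


def label_distribution(labels):
    """Count each label value, skipping missing cells (None or NaN)."""
    counts = Counter()
    for label in labels:
        if label is None or label != label:  # NaN is the only value != itself
            continue
        counts[str(label).strip()] += 1
    return counts


def corpus_label_counts(path="Data.xlsx", label_column="label"):
    """Load the corpus spreadsheet and return its label counts.

    Assumption: `label_column` names the sentiment column; "label" is a
    placeholder, so replace it with the real header found in the file.
    """
    import pandas as pd  # imported here so label_distribution works alone

    frame = pd.read_excel(path)  # .xlsx reading needs openpyxl
    if label_column not in frame.columns:
        raise KeyError(
            f"Column {label_column!r} not found; available: {list(frame.columns)}"
        )
    return label_distribution(frame[label_column].tolist())
```

If the guessed column name is wrong, `corpus_label_counts` raises a `KeyError` that lists the columns actually present, which is a convenient way to discover the real header names.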