Arabic Sentiment Embeddings

- Citation Author(s):
- Nora Al-Twairesh (King Saud University), Hadeel Al-Negheimish (King Saud University)
- Submitted by:
- Nora Al-Twairesh
- DOI:
- 10.21227/aavk-g896
Abstract
This dataset includes sentiment-specific distributed word representations trained on 10M Arabic tweets that were distantly supervised using positive and negative keywords. As described in the paper [1], we follow Tang et al.'s [2] three neural architectures, which encode the sentiment of a word in addition to its semantic and syntactic representation.
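Concretely, distant supervision here means assigning a noisy polarity label to a tweet based on the sentiment keywords it contains. The sketch below illustrates the idea; the keyword lists and the rule for discarding ambiguous tweets are hypothetical assumptions for illustration, not the exact procedure of [1]:

```python
from typing import Optional

# Hypothetical keyword lists for illustration only; the actual lists
# used in [1] are not part of this dataset description.
POSITIVE_KEYWORDS = {"سعيد", "رائع"}   # e.g. "happy", "wonderful"
NEGATIVE_KEYWORDS = {"حزين", "سيء"}    # e.g. "sad", "bad"

def distant_label(tweet: str) -> Optional[str]:
    """Assign a noisy sentiment label from keyword occurrence."""
    tokens = set(tweet.split())
    has_pos = bool(tokens & POSITIVE_KEYWORDS)
    has_neg = bool(tokens & NEGATIVE_KEYWORDS)
    if has_pos and not has_neg:
        return "positive"
    if has_neg and not has_pos:
        return "negative"
    return None  # no keyword, or both polarities: tweet is discarded
```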
Specifications Table
| Subject area | Natural Language Processing |
| More specific subject area | Arabic Sentiment Embeddings |
| Type of data | Text files |
| How data was acquired | Training Tang et al.'s [2] models on an independently collected Arabic tweets dataset. |
| Data format | Raw |
| Data source location | Not applicable |
| Data accessibility | |
Value of the data
- May replace hand-engineered features for sentiment classification.
- Can be used for benchmarking other Arabic sentiment embeddings.
- The Arabic sentiment embeddings can be used for other NLP tasks where sentiment is important.
References
- N. Al-Twairesh, H. Al-Negheimish, Surface and Deep Features Ensemble for Sentiment Analysis of Arabic Tweets, in submission.
- D. Tang, F. Wei, N. Yang, M. Zhou, T. Liu, B. Qin, Learning Sentiment-Specific Word Embedding for Twitter Sentiment Classification, in: Proc. 52nd Annu. Meet. Assoc. Comput. Linguist. Vol. 1 Long Pap., Association for Computational Linguistics, Baltimore, Maryland, 2014: pp. 1555–1565. http://www.aclweb.org/anthology/P14-1146 (accessed May 18, 2018).
Instructions:
Data
We include three files, each corresponding to one of the models described in detail in [1]:
1. embeddings_ASEP.txt: the Arabic Sentiment Embeddings built using the Prediction model.
2. embeddings_ASER.txt: the Arabic Sentiment Embeddings built using the Ranking model.
3. embeddings_ASEH.txt: the Arabic Sentiment Embeddings built using the Hybrid model.
Each file contains 212,976 lines. Each line starts with a word in the vocabulary, followed by a space and then 50 decimal numbers separated by spaces, which together represent the word vector.
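Given this format, the files can be parsed as plain whitespace-separated text. The following is a minimal loading sketch; the function name and the use of NumPy are illustrative choices, not part of the released data:

```python
import numpy as np

def load_embeddings(path):
    """Load embeddings from a file where each line is:
    <word> <v1> <v2> ... <v50> (space-separated)."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split(" ")
            word, values = parts[0], parts[1:]
            vectors[word] = np.asarray(values, dtype=np.float32)
    return vectors

# Example: load the Prediction-model embeddings and inspect one entry.
embeddings = load_embeddings("embeddings_ASEP.txt")
print(len(embeddings))                 # expected vocabulary size: 212,976
word = next(iter(embeddings))
print(word, embeddings[word].shape)    # each vector has shape (50,)
```

Since this layout matches the word2vec text format without a header line, readers that support that format (for example, recent gensim versions via KeyedVectors.load_word2vec_format with no_header=True) should, we expect, also be able to load these files.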