Arabic Sentiment Embeddings

Citation Author(s):
Nora Al-Twairesh, King Saud University
Hadeel Al-Negheimish, King Saud University
Submitted by:
Nora Al-Twairesh
Last updated:
Tue, 05/17/2022 - 22:17
DOI:
10.21227/aavk-g896

Abstract 

This dataset contains sentiment-specific distributed word representations trained on 10M Arabic tweets that were labeled by distant supervision using positive and negative keywords. As described in the paper [1], we follow the three neural architectures of Tang et al. [2], which encode the sentiment of a word in addition to its semantic and syntactic representation.

 

Specifications Table

Subject area: Natural Language Processing
More specific subject area: Arabic Sentiment Embeddings
Type of data: Text files
How data was acquired: Training Tang’s [2] models on an Arabic tweets dataset that was independently collected.
Data format: Raw
Data source location: Not applicable
Data accessibility:

Value of the data

- May replace hand-engineered features for sentiment classification.
- Can be used for benchmarking other Arabic sentiment embeddings.
- The Arabic sentiment embeddings can be used for other NLP tasks where sentiment is important.
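One common way to use word embeddings in place of hand-engineered features is to average the vectors of the words in a tweet. The sketch below illustrates that idea only; it is not necessarily the feature construction used in [1]. The function name `tweet_features`, the whitespace tokenization, and the zero-vector fallback for tweets with no known words are assumptions made for the example.

```python
from typing import Dict, List, Sequence


def tweet_features(
    tokens: Sequence[str],
    vectors: Dict[str, List[float]],
    dim: int = 50,
) -> List[float]:
    """Average the embeddings of the tokens that appear in the vocabulary.

    Assumption: out-of-vocabulary tokens are skipped, and a tweet with no
    known tokens maps to a zero vector of length `dim`.
    """
    known = [vectors[token] for token in tokens if token in vectors]
    if not known:
        return [0.0] * dim
    # zip(*known) walks the vectors column by column.
    return [sum(column) / len(known) for column in zip(*known)]


# Toy two-dimensional vectors, for illustration only (not from the dataset).
toy_vectors = {"a": [1.0, 3.0], "b": [3.0, 5.0]}
features = tweet_features("a b unknown".split(), toy_vectors, dim=2)
```

The resulting fixed-length vector can be passed to any standard classifier. With the released files, `dim` would be 50.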

References

  1. N. Al-Twairesh, H. Al-Negheimish, Surface and Deep Features Ensemble for Sentiment Analysis of Arabic Tweets, in submission.
  2. D. Tang, F. Wei, N. Yang, M. Zhou, T. Liu, B. Qin, Learning Sentiment-Specific Word Embedding for Twitter Sentiment Classification, in: Proc. 52nd Annu. Meet. Assoc. Comput. Linguist. Vol. 1 Long Pap., Association for Computational Linguistics, Baltimore, Maryland, 2014: pp. 1555–1565. http://www.aclweb.org/anthology/P14-1146 (accessed May 18, 2018).

Instructions: 

Data

We include three files, each corresponding to one of the models which are described in detail in [1]:

1. embeddings_ASEP.txt: the Arabic Sentiment Embeddings built using the Prediction model.
2. embeddings_ASER.txt: the Arabic Sentiment Embeddings built using the Ranking model.
3. embeddings_ASEH.txt: the Arabic Sentiment Embeddings built using the Hybrid model.

Each file contains 212,976 lines, one per vocabulary word. Each line starts with the word, followed by a space and then 50 decimal numbers separated by spaces, which form the word vector.
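The line format described above can be read with a few lines of Python. This is a minimal sketch, not an official loader: the function names `parse_embedding_lines` and `load_embeddings` are hypothetical, UTF-8 encoding is an assumption, and so is the absence of a header line.

```python
from typing import Dict, Iterable, List


def parse_embedding_lines(
    lines: Iterable[str], dim: int = 50
) -> Dict[str, List[float]]:
    """Parse lines of the form 'word v1 v2 ... v_dim' into a dict."""
    vectors: Dict[str, List[float]] = {}
    for line_no, line in enumerate(lines, start=1):
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) < dim + 1:
            raise ValueError(f"line {line_no}: expected a word and {dim} numbers")
        # The last `dim` fields are the vector; everything before is the word.
        word = " ".join(parts[:-dim])
        vectors[word] = [float(value) for value in parts[-dim:]]
    return vectors


def load_embeddings(path: str, dim: int = 50) -> Dict[str, List[float]]:
    """Load one embeddings file; UTF-8 encoding is assumed."""
    with open(path, encoding="utf-8") as handle:
        return parse_embedding_lines(handle, dim=dim)


# Tiny in-memory example with 3-dimensional toy vectors (not real data).
sample = ["جميل 0.1 0.2 0.3", "سيء -0.1 -0.2 -0.3"]
toy = parse_embedding_lines(sample, dim=3)
```

For the released files, a call such as `load_embeddings("embeddings_ASEH.txt")` should return a dictionary with one 50-dimensional vector per line, provided the assumptions above hold.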

Dataset Files

    Files have not been uploaded for this dataset

    Documentation

    Attachment: Arabic Sentiment Embeddings.pdf (350.35 KB)