Arabic Sentiment Embeddings
- Submitted by: Nora Al-Twairesh
- Last updated: Tue, 05/17/2022 - 22:17
- DOI: 10.21227/aavk-g896
Abstract
This dataset contains sentiment-specific distributed word representations trained on 10M Arabic tweets that were distantly supervised using positive and negative keywords. As described in the paper [1], we follow the three neural architectures of Tang et al. [2], which encode the sentiment of a word in addition to its semantic and syntactic representation.
Specifications Table
Field | Value |
---|---|
Subject area | Natural Language Processing |
More specific subject area | Arabic Sentiment Embeddings |
Type of data | Text files |
How data was acquired | Training Tang’s [2] models on an Arabic tweets dataset that was independently collected. |
Data format | Raw |
Data source location | Not applicable |
Data accessibility | |
Value of the data
· May replace hand-engineered features for sentiment classification.
· Can be used for benchmarking other Arabic sentiment embeddings.
· The Arabic sentiment embeddings can be used for other NLP tasks where sentiment is important.
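As an illustration of the first point, one common baseline (not necessarily the approach taken in [1]) is to average the embeddings of a tweet's tokens into a fixed-length feature vector that a classifier can consume. The sketch below assumes the embeddings are already held in a dictionary mapping each word to its vector; the function name `average_embedding` and the toy two-dimensional vectors are hypothetical and serve only as an example.

```python
from typing import Dict, List, Sequence


def average_embedding(tokens: Sequence[str],
                      embeddings: Dict[str, List[float]],
                      dim: int) -> List[float]:
    """Return the mean vector of the tokens found in `embeddings`.

    Tokens missing from the vocabulary are ignored. If no token is
    known, a zero vector of length `dim` is returned.
    """
    known = [embeddings[token] for token in tokens if token in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


# Toy 2-dimensional vectors for illustration only (assumption);
# the published files use 50 dimensions.
toy = {"good": [1.0, 0.0], "day": [0.0, 1.0]}
features = average_embedding(["good", "day", "unknown"], toy, dim=2)
```

Averaging discards word order, so it is best read as a simple starting point rather than a recommended feature design.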
References
- [1] N. Al-Twairesh, H. Al-Negheimish, Surface and Deep Features Ensemble for Sentiment Analysis of Arabic Tweets, in submission.
- [2] D. Tang, F. Wei, N. Yang, M. Zhou, T. Liu, B. Qin, Learning Sentiment-Specific Word Embedding for Twitter Sentiment Classification, in: Proc. 52nd Annu. Meet. Assoc. Comput. Linguist. Vol. 1 Long Pap., Association for Computational Linguistics, Baltimore, Maryland, 2014: pp. 1555–1565. http://www.aclweb.org/anthology/P14-1146 (accessed May 18, 2018).
Data
We include three files, one for each of the models described in detail in [1]:
1. embeddings_ASEP.txt: the Arabic Sentiment Embeddings built using the Prediction model.
2. embeddings_ASER.txt: the Arabic Sentiment Embeddings built using the Ranking model.
3. embeddings_ASEH.txt: the Arabic Sentiment Embeddings built using the Hybrid model.
Each file contains 212,976 lines. Each line starts with a word from the vocabulary, followed by a space and then 50 space-separated decimal numbers that form the word vector.
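The line format described above can be read with a few lines of Python. The following is a minimal sketch, not an official loader: the function name `load_embeddings` is hypothetical, UTF-8 encoding is an assumption, and the last 50 whitespace-separated fields of each line are taken as the vector so that any remaining leading text is treated as the word.

```python
from typing import Dict, List

EXPECTED_DIM = 50  # vector length stated in the file description


def load_embeddings(path: str, dim: int = EXPECTED_DIM) -> Dict[str, List[float]]:
    """Read 'word v1 v2 ... v_dim' lines into a word -> vector dictionary."""
    embeddings: Dict[str, List[float]] = {}
    # UTF-8 is an assumption; the dataset page does not state the encoding.
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            parts = line.split()
            if not parts:
                continue  # skip blank lines
            if len(parts) < dim + 1:
                raise ValueError(
                    f"line {line_number}: expected a word and {dim} numbers, "
                    f"found {len(parts)} fields"
                )
            # The last `dim` fields are the vector; the rest is the word.
            word = " ".join(parts[:-dim])
            embeddings[word] = [float(value) for value in parts[-dim:]]
    return embeddings
```

For example, `load_embeddings("embeddings_ASEH.txt")` would be expected to return a dictionary with up to 212,976 entries, each holding a list of 50 floats, provided the file matches the description.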
Documentation
Attachment | Size |
---|---|
Arabic Sentiment Embeddings.pdf | 350.35 KB |