Explainable Sentiment Analysis Dataset

Citation Author(s): Donghao HUANG (Singapore Management University); Zhaoxia WANG (Singapore Management University)
Submitted by: Donghao Huang
Last updated: Sat, 02/01/2025 - 11:32
DOI: 10.21227/hx7g-vv29
Data Format: *.csv

Keywords: artificial intelligence; machine learning; sentiment analysis

Abstract

The Explainable Sentiment Analysis Dataset provides annotated sentiment classification data for Amazon Reviews and IMDB Movie Reviews, supporting the evaluation of sentiment analysis models with a focus on explainability. It includes ground-truth sentiment labels, model-generated predictions, and fine-grained classification results from several large language models (LLMs), both proprietary (GPT-4o and GPT-4o-mini) and open-source (the full DeepSeek-R1 model and its distilled variants).

The dataset is organized into two parts: ground truths (human-annotated sentiment labels) and results (LLM-generated predictions), so that model predictions can be compared directly against human labels. It supports multi-level sentiment classification, ranging from binary (positive/negative) to five-class categorization (e.g., strongly positive to strongly negative).
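As a rough illustration of how ground truths and results might be compared, the sketch below joins two small CSV tables on a shared identifier and computes accuracy. The column names `id`, `label`, and `prediction`, as well as the toy rows, are assumptions made up for this example and are not the dataset's documented schema; adjust them to match the actual files.

```python
import csv
import io

# Hypothetical toy rows for illustration only. The column names "id",
# "label", and "prediction" are assumptions, not the dataset's real schema.
GROUND_TRUTH_CSV = """id,label
1,positive
2,negative
3,positive
4,negative
"""

RESULTS_CSV = """id,prediction
1,positive
2,positive
3,positive
4,negative
"""


def load_column(csv_text, key, value):
    """Read CSV text and map each row's key column to its value column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key]: row[value] for row in reader}


def accuracy(truth, predicted):
    """Fraction of ids present in both tables whose prediction equals the label."""
    shared = [i for i in truth if i in predicted]
    if not shared:
        raise ValueError("no overlapping ids between ground truth and results")
    correct = sum(truth[i] == predicted[i] for i in shared)
    return correct / len(shared)


truth = load_column(GROUND_TRUTH_CSV, "id", "label")
predicted = load_column(RESULTS_CSV, "id", "prediction")
print(accuracy(truth, predicted))  # 3 of the 4 toy rows match
```

With real files, the inline strings would be replaced by the contents of the ground-truth and result CSVs, and the same function could be run once per model to compare them.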

Each model’s output includes structured sentiment predictions along with textual explanations, enabling deeper insights into the reasoning process behind sentiment classification. Additionally, the dataset captures explanation content from DeepSeek-R1 models, enhancing transparency and interpretability in sentiment analysis.
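For readers who want to separate an explanation from the final label, the sketch below shows one possible approach. DeepSeek-R1 models commonly wrap their reasoning in `<think>...</think>` tags, but whether this dataset stores raw outputs in that form is an assumption; the function name `split_reasoning` and the sample string are hypothetical.

```python
import re

# Assumption: the raw model output wraps its reasoning in <think> tags,
# followed by the final answer. The dataset may store these fields differently.
THINK_PATTERN = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(raw_output):
    """Return (reasoning, answer); reasoning is empty if no <think> block exists."""
    match = THINK_PATTERN.search(raw_output)
    if match is None:
        return "", raw_output.strip()
    reasoning = match.group(1).strip()
    # Everything after the closing tag is treated as the final answer.
    answer = raw_output[match.end():].strip()
    return reasoning, answer


# Hypothetical sample output, written for this example.
sample = "<think>The reviewer praises the plot.</think>\npositive"
reasoning, answer = split_reasoning(sample)
print(reasoning)
print(answer)
```

Keeping the reasoning and the label in separate fields makes it easier to score the label against the ground truth while reviewing the explanation on its own.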

This dataset serves as a benchmark for evaluating the explainability, accuracy, and efficiency of sentiment classification models. It is particularly useful for researchers, NLP practitioners, and developers working toward more trustworthy AI applications in sentiment analysis.