Standard Dataset

Explainable Sentiment Analysis Dataset

Citation Author(s):
Donghao HUANG (Singapore Management University)
Zhaoxia WANG (Singapore Management University)
Submitted by:
Donghao Huang
DOI:
10.21227/hx7g-vv29

Abstract

The Explainable Sentiment Analysis Dataset provides annotated sentiment classification data for Amazon Reviews and IMDB Movie Reviews, for evaluating sentiment analysis models with a focus on explainability. It includes ground-truth sentiment labels, model-generated predictions, and fine-grained classification results obtained from several large language models (LLMs), both proprietary (GPT-4o and GPT-4o-mini) and open-source (DeepSeek-R1 full and distilled models).

The dataset is structured into ground-truths (human-annotated sentiment labels) and results (LLM-generated predictions), allowing direct comparison of model predictions against human labels. It supports multi-level sentiment classification, ranging from binary (positive/negative) to five-class categorization (e.g., strongly negative through strongly positive).
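
The comparison between ground-truths and results can be sketched as follows. This is a minimal illustration only: the review IDs, label strings, and in-memory dictionaries are hypothetical stand-ins, and the actual file layout and field names should be taken from README.md.

```python
from collections import Counter

# Hypothetical stand-ins for the dataset's two parts. The real files,
# IDs, and label names are described in README.md, not here.
ground_truths = {"r1": "positive", "r2": "negative", "r3": "positive", "r4": "negative"}
results = {"r1": "positive", "r2": "positive", "r3": "positive", "r4": "negative"}


def accuracy(truth, pred):
    """Fraction of shared review IDs where the prediction matches the label."""
    shared = truth.keys() & pred.keys()
    if not shared:
        raise ValueError("no overlapping review IDs")
    return sum(truth[k] == pred[k] for k in shared) / len(shared)


def confusion(truth, pred):
    """Count (true label, predicted label) pairs over shared review IDs."""
    return Counter((truth[k], pred[k]) for k in truth.keys() & pred.keys())


print(accuracy(ground_truths, results))
print(confusion(ground_truths, results))
```

The same two functions apply unchanged to five-class labels, since they only compare label strings for equality.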

Each model’s output includes structured sentiment predictions along with textual explanations, enabling deeper insights into the reasoning process behind sentiment classification. Additionally, the dataset captures explanation content from DeepSeek-R1 models, enhancing transparency and interpretability in sentiment analysis.

This dataset serves as a benchmark for evaluating the explainability, accuracy, and efficiency of sentiment classification models. It is particularly useful for researchers, NLP practitioners, and developers working on trustworthy AI applications in sentiment analysis.

Instructions:

Refer to README.md.