Explainable Sentiment Analysis Dataset

Citation Author(s):
Donghao HUANG, Singapore Management University
Zhaoxia WANG, Singapore Management University
Submitted by:
Donghao Huang
Last updated:
Sat, 02/01/2025 - 06:32
DOI:
10.21227/hx7g-vv29

Abstract 

The Explainable Sentiment Analysis Dataset provides annotated sentiment classification data for Amazon Reviews and IMDB Movie Reviews, supporting the evaluation of sentiment analysis models with a focus on explainability. It includes ground-truth sentiment labels, model-generated predictions, and fine-grained classification results obtained from several large language models (LLMs): proprietary models (GPT-4o and GPT-4o-mini) and open-source models (the full and distilled DeepSeek-R1 models).

The dataset is structured into ground-truths (human-annotated sentiment labels) and results (LLM-generated predictions), allowing direct comparisons between human and model performance. It supports multi-level sentiment classification, ranging from binary (positive/negative) to five-class sentiment categorization (e.g., strongly positive to strongly negative).
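The comparison between ground-truth labels and model predictions can be sketched as follows. This is a minimal illustration only: the label strings and the in-memory lists are hypothetical, and the actual file layout and field names should be taken from README.md.

```python
from collections import Counter


def accuracy(truths, preds):
    """Fraction of positions where the predicted label equals the ground truth."""
    if len(truths) != len(preds):
        raise ValueError("truths and preds must have the same length")
    if not truths:
        raise ValueError("no labels to compare")
    return sum(t == p for t, p in zip(truths, preds)) / len(truths)


def confusion_counts(truths, preds):
    """Count how often each (ground_truth, prediction) pair occurs."""
    return Counter(zip(truths, preds))


# Hypothetical labels for illustration; not taken from the dataset.
ground_truth = ["positive", "negative", "positive", "negative"]
model_output = ["positive", "negative", "negative", "negative"]

print(accuracy(ground_truth, model_output))  # 0.75
print(confusion_counts(ground_truth, model_output)[("positive", "negative")])  # 1
```

The same two functions apply unchanged to the five-class setting, since they only compare label values for equality.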

Each model’s output includes structured sentiment predictions along with textual explanations, enabling deeper insights into the reasoning process behind sentiment classification. Additionally, the dataset captures explanation content from DeepSeek-R1 models, enhancing transparency and interpretability in sentiment analysis.
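One way to separate reasoning text from the final label is sketched below. It assumes the raw DeepSeek-R1 output is stored with its reasoning wrapped in `<think>...</think>` tags, which is how these models emit reasoning by default; whether the dataset keeps the raw output or already stores the explanation in a separate field is not stated here, so check README.md first. The function name `split_reasoning` and the sample string are hypothetical.

```python
import re

# Matches the first <think>...</think> span, including newlines inside it.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(raw):
    """Return (reasoning, answer) from a raw model output string.

    If no <think> block is present, reasoning is empty and the whole
    stripped string is treated as the answer.
    """
    match = THINK_RE.search(raw)
    if match is None:
        return "", raw.strip()
    reasoning = match.group(1).strip()
    answer = (raw[:match.start()] + raw[match.end():]).strip()
    return reasoning, answer


# Hypothetical raw output for illustration; not taken from the dataset.
sample = "<think>The reviewer praises the plot.</think>\npositive"
reasoning, answer = split_reasoning(sample)
print(reasoning)  # The reviewer praises the plot.
print(answer)     # positive
```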

This dataset serves as a benchmark for evaluating the explainability, accuracy, and efficiency of sentiment classification models. It is particularly useful for researchers, NLP practitioners, and developers working on trustworthy AI applications in sentiment analysis.

Instructions: 

Refer to README.md.

Documentation

Attachment: README.md (3.29 KB)