

Standard Dataset

Datasets: MR and SST-Binary

Citation Author(s):
Jiajing Zhang (Anhui Jianzhu University)
Submitted by:
Jinlan Chen
DOI:
10.21227/bp27-xy39

Abstract

MR is a text dataset of single-sentence movie reviews for binary sentiment classification. The corpus contains 5,331 positive and 5,331 negative reviews, with an average length of 20.39 tokens. SST-2 is the binary subset of the Stanford Sentiment Treebank, in which each sentence is labeled positive or negative; it contains 9,613 sentences with an average length of 20.32 tokens.
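As a rough illustration of how statistics like the class counts and average token lengths above can be computed, the sketch below assumes the data has already been loaded into (text, label) pairs and that tokens are whitespace-separated. The tokenizer used to obtain 20.39 and 20.32 is not stated, so results on the real files may differ slightly. The function name `corpus_stats` and the toy sentences are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def corpus_stats(examples: Iterable[Tuple[str, str]]) -> dict:
    """Count labels and compute the average token length of (text, label) pairs.

    Assumption: tokens are whitespace-separated; the original tokenizer is unknown.
    """
    label_counts: Counter = Counter()
    total_tokens = 0
    n = 0
    for text, label in examples:
        label_counts[label] += 1
        total_tokens += len(text.split())  # whitespace tokenization (assumed)
        n += 1
    avg_tokens = total_tokens / n if n else 0.0
    return {"size": n, "labels": dict(label_counts), "avg_tokens": avg_tokens}


# Hypothetical toy examples, not taken from MR or SST-2.
toy = [
    ("a gripping , well-acted film .", "positive"),
    ("dull and far too long .", "negative"),
]
stats = corpus_stats(toy)
print(stats)  # {'size': 2, 'labels': {'positive': 1, 'negative': 1}, 'avg_tokens': 6.0}
```

Applied to MR, the label counts should come out balanced (5,331 per class), which is a quick sanity check that the files were loaded completely.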

Instructions:


Data obtained from https://github.com/FKarl/short-text-classification/tree/main/data
Jinlan Chen, Sat, 12/30/2023 - 15:23