Standard Dataset
Data Set MR and sst-Binary
- Submitted by: Jinlan Chen
- Last updated: Sat, 12/30/2023 - 10:04
- DOI: 10.21227/bp27-xy39
Abstract
MR is a textual dataset of movie reviews for binary sentiment classification, in which each review consists of a single sentence. The corpus contains 10,662 reviews (5,331 positive and 5,331 negative) with an average length of 20.39 tokens. SST-2 is a subset of the Stanford Sentiment Treebank in which each example is labeled positive or negative; it contains 9,613 utterances with an average length of 20.32 tokens.
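The statistics above (per-class counts and average length in tokens) can be recomputed with a short script. The sketch below is illustrative only: it assumes each example is available as a (label, sentence) pair, for instance from a hypothetical tab-separated file with one `label<TAB>sentence` per line. The actual file layout in the source repository may differ. Tokens are counted by plain whitespace splitting, which is also an assumption, so the averages it produces may not match the reported 20.39 and 20.32 exactly.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple


def parse_lines(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Parse a hypothetical 'label<TAB>sentence' format, skipping blank lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        label, sentence = line.split("\t", 1)
        yield label, sentence


def corpus_stats(examples: Iterable[Tuple[str, str]]) -> Dict[str, object]:
    """Return per-label counts, total size, and mean whitespace-token length."""
    counts: Counter = Counter()
    total_tokens = 0
    n = 0
    for label, sentence in examples:
        counts[label] += 1
        # Whitespace tokenization is an assumption; the dataset's own
        # tokenizer may split punctuation differently.
        total_tokens += len(sentence.split())
        n += 1
    avg = total_tokens / n if n else 0.0
    return {"counts": dict(counts), "total": n, "avg_tokens": avg}


if __name__ == "__main__":
    toy = ["1\ta gorgeous , witty film\n", "0\ttedious and overlong\n"]
    print(corpus_stats(parse_lines(toy)))
```

Applied to the full MR corpus in such a format, `counts` should show the two balanced classes of 5,331 reviews each.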
Instructions:
MR is a textual dataset of movie reviews for binary sentiment classification, where each review contains only one sentence.
Comments
The data come from https://github.com/FKarl/short-text-classification/tree/main/data