Datasets
Standard Dataset
weibo_senti_100k and THUCNews
- Submitted by: YUANYUAN Zhang
- Last updated: Wed, 07/27/2022 - 09:26
- DOI: 10.21227/abj8-y636
Abstract
weibo_senti_100k is a two-class sentiment classification dataset containing 100,000 Sina Weibo posts, with an average length of 42.9 words. The two categories are positive emotion and negative emotion, each with about 50,000 posts.
THUCNews is a news classification dataset containing 50,000 samples, with an average length of 534.53 words. It covers 10 categories: finance, real estate, home furnishing, education, technology, fashion, political events, sports, games, and entertainment. Each category has 5,000 samples.
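The class counts and average lengths stated above can be checked with a short script once the files are downloaded. The following is a minimal sketch, not the submitter's code. It assumes a CSV file with the hypothetical column names `label` and `review`, and it measures length in characters, since this page does not say how "words" are counted. The in-memory sample is made up for illustration and is not taken from the dataset.

```python
import csv
import io
from collections import Counter


def summarize(rows):
    """Return (per-label counts, mean text length) for (label, text) pairs."""
    counts = Counter()
    total_len = 0
    n = 0
    for label, text in rows:
        counts[label] += 1
        total_len += len(text)  # assumption: length is measured in characters
        n += 1
    mean_len = total_len / n if n else 0.0
    return counts, mean_len


def read_labeled_csv(handle):
    """Yield (label, text) pairs from a CSV file.

    The column names 'label' and 'review' are hypothetical; adjust them
    to match the actual file layout.
    """
    for record in csv.DictReader(handle):
        yield record["label"], record["review"]


# Tiny made-up stand-in for the real file, for illustration only.
sample = io.StringIO("label,review\n1,今天天气真好\n0,太失望了\n1,开心\n")
counts, mean_len = summarize(read_labeled_csv(sample))
print(dict(counts), mean_len)
```

To run this against a real file, replace `sample` with a handle such as `open(path, encoding="utf-8", newline="")`, where `path` points to the downloaded CSV. For a balanced two-class file, the two counts should be roughly equal.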