Standard Dataset

weibo_senti_100k and THUCNews

Citation Author(s):
Maosong Sun
Submitted by:
YUANYUAN Zhang
DOI:
10.21227/abj8-y636

Abstract

The weibo_senti_100k dataset is a two-class sentiment classification dataset containing 100,000 Sina Weibo posts, with an average length of 42.9 words. The THUCNews news classification dataset contains 50,000 documents in 10 categories, with an average length of 534.53 words.

Instructions:

The weibo_senti_100k sentiment classification dataset contains 100,000 Sina Weibo posts with an average length of 42.9 words. The posts fall into two categories, positive emotion and negative emotion, with about 50,000 posts in each.

The THUCNews news classification dataset contains 50,000 documents with an average length of 534.53 words. The documents fall into 10 categories: finance, real estate, home furnishing, education, technology, fashion, political events, sports, games, and entertainment. Each category has 5,000 documents.
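
After downloading, the class counts and average lengths stated above can be checked with a short script. The following is a minimal sketch, not part of the dataset distribution: the `summarize` helper is hypothetical, and it assumes the data has already been read into `(label, text)` pairs, since this page does not specify the file layout. It also assumes length is measured in characters by default; pass a different `tokenize` function if the reported figures were computed with a word segmenter.

```python
from collections import Counter


def summarize(records, tokenize=list):
    """Return (per-class counts, mean length) for (label, text) pairs.

    `tokenize` turns a text into a sequence of units. The default, `list`,
    splits into characters, which is an assumption about how length is
    measured, not something stated by the dataset page.
    """
    counts = Counter()
    total_units = 0
    n = 0
    for label, text in records:
        counts[label] += 1
        total_units += len(tokenize(text))
        n += 1
    mean_len = total_units / n if n else 0.0
    return dict(counts), mean_len


# Toy records standing in for the real data (made up for illustration).
toy = [
    ("positive", "今天天气真好"),
    ("negative", "太糟糕了"),
    ("positive", "开心"),
]
class_counts, mean_length = summarize(toy)
print(class_counts, mean_length)
```

On the full weibo_senti_100k data, the counts would be expected to come out at roughly 50,000 per class, and on THUCNews at 5,000 per category, if the loading step matches the description above.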