Standard Dataset

AG News and IMDB

Citation Author(s):
Xinxin Li
Submitted by:
Xinxin Li
Last updated:
DOI:
10.21227/f9vv-5898
Data Format:

Abstract

In this paper, two text classification datasets were used in the experiments: AG News and IMDB. AG News is a widely used four-class news dataset whose categories are World News, Sports News, Business News, and Technology News. It contains 120,000 samples in total, of which 114,000 are in the training set and the remaining 6,000 are in the test set. IMDB is a movie review dataset for sentiment analysis, used primarily for binary classification of reviews as positive or negative. It contains 50,000 samples, split evenly into 25,000 for training and 25,000 for testing.
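The split sizes quoted in the abstract can be written down as a small table and checked for internal consistency. The sketch below is illustrative only: the names `SPLITS` and `summarize` are hypothetical, and the numbers are copied from the abstract rather than counted from the files.

```python
# Split sizes as stated in the abstract (not re-derived from the data files).
SPLITS = {
    "AG News": {"classes": 4, "train": 114_000, "test": 6_000},
    "IMDB": {"classes": 2, "train": 25_000, "test": 25_000},
}


def summarize(splits):
    """Return the total size and the test fraction for each dataset."""
    summary = {}
    for name, s in splits.items():
        total = s["train"] + s["test"]
        summary[name] = {"total": total, "test_fraction": s["test"] / total}
    return summary


if __name__ == "__main__":
    for name, info in summarize(SPLITS).items():
        print(f"{name}: {info['total']} samples, "
              f"{info['test_fraction']:.0%} held out for testing")
```

Under these figures, AG News holds out 5% of its samples for testing, while IMDB uses an even 50/50 split.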

Instructions:

The uploaded data covers two datasets, AG News and IMDB. Each dataset is provided as a train file and a test file.
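Once the train and test files are available, a quick row count is a reasonable first sanity check. The sketch below assumes the files are CSV with one sample per record. The page does not state the file format, so this is an assumption, as are the helper names `count_rows` and `check_split`. The expected counts are passed in by the caller.

```python
import csv


def count_rows(path, has_header=False):
    """Count CSV records in a file (assumed format: one sample per record)."""
    with open(path, newline="", encoding="utf-8") as f:
        # csv.reader handles quoted fields that span several physical lines,
        # which matters for long free-text reviews.
        n = sum(1 for _ in csv.reader(f))
    return n - 1 if has_header and n > 0 else n


def check_split(path, expected, has_header=False):
    """Return (matches, actual_count) for one train or test file."""
    actual = count_rows(path, has_header)
    return actual == expected, actual
```

For example, `check_split(path_to_imdb_train, 25_000)` would return `(True, 25000)` if the file matches the size given in the abstract. The path here is a placeholder for wherever the file is stored locally.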

Dataset Files

Files have not been uploaded for this dataset