Standard Dataset
AG News and IMDB
- Submitted by: Xinxin Li
- Last updated: Tue, 08/27/2024 - 06:39
- DOI: 10.21227/f9vv-5898
Abstract
The experiments in this paper primarily use two text classification datasets: AG News and IMDB. AG News is a widely used news dataset with four classes: World News, Sports News, Business News, and Technology News. It contains 120,000 samples in total, with 114,000 in the training set and the remaining 6,000 in the test set. IMDB is a movie review dataset for sentiment analysis, used here as a binary classification task that labels each review as positive or negative. It contains 50,000 samples, split evenly into 25,000 for training and 25,000 for testing.
The upload contains two dataset files, one for AG News and one for IMDB. Each dataset file consists of a train file and a test file.
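As a quick sanity check after downloading, the split sizes stated in the abstract can be written down and compared against the row counts of whatever files are loaded. The sketch below is illustrative only: the names `SplitSpec`, `STATED_SPLITS`, and `check_counts` are hypothetical, and since the page does not state the file format, loading and counting the rows is left to the reader.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SplitSpec:
    """Split sizes as stated in the abstract (hypothetical helper type)."""
    name: str
    num_classes: int
    train: int
    test: int

    @property
    def total(self) -> int:
        return self.train + self.test

    @property
    def test_fraction(self) -> float:
        return self.test / self.total


# Numbers copied from the abstract; the dictionary keys are assumptions.
STATED_SPLITS = {
    "ag_news": SplitSpec("AG News", num_classes=4, train=114_000, test=6_000),
    "imdb": SplitSpec("IMDB", num_classes=2, train=25_000, test=25_000),
}


def check_counts(key: str, train_rows: int, test_rows: int) -> bool:
    """Return True if the observed row counts match the stated split."""
    spec = STATED_SPLITS[key]
    return train_rows == spec.train and test_rows == spec.test


if __name__ == "__main__":
    for spec in STATED_SPLITS.values():
        print(f"{spec.name}: total={spec.total}, "
              f"test fraction={spec.test_fraction:.2%}")
```

Passing the observed row counts of the train and test files to `check_counts` gives an early signal if a download is truncated or a different split of the data was obtained.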