Standard Dataset
- Submitted by: Feihu Che
- Last updated: Wed, 11/13/2024 - 08:44
- DOI: 10.21227/7p97-f663
Abstract
<p>This dataset is a user interaction network built from Reddit. Each node represents a user, and an edge connects two users who have interacted through replies. Node features are derived from each user's subreddit posting history. The classification task is to identify users in the top 50% by popularity, measured by their average subreddit post score.</p>
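The reply-based edge construction can be sketched as follows. This is a minimal illustration only: the input format (an iterable of `(replier, parent_author)` username pairs) and the helper name `build_reply_graph` are hypothetical. The dataset description also does not confirm the following choices, which are assumptions: edges are undirected, repeated replies between the same two users collapse into one edge, and self-replies are ignored.

```python
from typing import Dict, Iterable, List, Set, Tuple


def build_reply_graph(
    replies: Iterable[Tuple[str, str]],
) -> Tuple[Dict[str, int], List[Tuple[int, int]]]:
    """Map usernames to node ids and turn reply pairs into an edge list.

    Hypothetical helper. Assumptions (not stated by the dataset source):
    edges are undirected, duplicate interactions collapse to one edge,
    and a user replying to themselves produces no edge.
    """
    node_id: Dict[str, int] = {}
    edges: Set[Tuple[int, int]] = set()
    for replier, parent_author in replies:
        if replier == parent_author:
            continue  # assumption: self-replies are not interactions
        # Assign the next free integer id the first time a user is seen.
        u = node_id.setdefault(replier, len(node_id))
        v = node_id.setdefault(parent_author, len(node_id))
        # Store each undirected edge once, smaller id first.
        edges.add((min(u, v), max(u, v)))
    return node_id, sorted(edges)
```

For example, the pairs `("alice", "bob")`, `("bob", "alice")`, `("carol", "bob")`, and `("dave", "dave")` yield node ids `{"alice": 0, "bob": 1, "carol": 2}` and edges `[(0, 1), (1, 2)]`: the reverse reply is merged into the existing edge and the self-reply is dropped.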
The dataset was constructed from publicly available Reddit data, including user replies and post scores. Node features include each user's recent posting history, capped at the three most recent posts. Popularity labels were assigned by computing the median score across all users' historical posts: users above the median are labeled popular, and those below it are labeled normal. The dataset contains 33,434 nodes and 198,448 edges, and the train, validation, and test ratios are 0.1, 0.1, and 0.8, respectively.
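The median-based labeling and the 0.1/0.1/0.8 split can be sketched in Python under stated assumptions. The helper names `label_popular` and `split_nodes` are hypothetical. The sketch reads the threshold as the median of per-user average scores, matching the "top 50%" wording in the abstract. It treats users exactly at the median as normal and uses a seeded uniform random split. The source specifies none of these details.

```python
import random
from statistics import mean, median
from typing import Dict, List, Sequence, Tuple


def label_popular(post_scores: Dict[str, List[float]]) -> Dict[str, int]:
    """Return 1 (popular) or 0 (normal) for each user.

    Hypothetical helper. Assumptions: a user's popularity is the mean of
    their post scores, the threshold is the median of those means, and
    users exactly at the median are labeled normal.
    """
    averages = {user: mean(scores) for user, scores in post_scores.items()}
    threshold = median(averages.values())
    return {user: int(avg > threshold) for user, avg in averages.items()}


def split_nodes(
    num_nodes: int,
    ratios: Sequence[float] = (0.1, 0.1, 0.8),
    seed: int = 0,
) -> Tuple[List[int], List[int], List[int]]:
    """Randomly partition node ids into train/valid/test sets.

    Hypothetical helper. Assumption: a uniform random split with a fixed
    seed; the source gives only the ratios. Any remainder from integer
    truncation goes to the test set.
    """
    ids = list(range(num_nodes))
    random.Random(seed).shuffle(ids)
    n_train = int(ratios[0] * num_nodes)
    n_valid = int(ratios[1] * num_nodes)
    return (
        ids[:n_train],
        ids[n_train:n_train + n_valid],
        ids[n_train + n_valid:],
    )
```

With the stated node count, `split_nodes(33434)` produces 3,343 training, 3,343 validation, and 26,748 test nodes. Fixing the seed keeps the split reproducible across runs.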