Reddit

Citation Author(s):
Feihu Che
Submitted by:
Feihu Che
Last updated:
Wed, 11/13/2024 - 08:44
DOI:
10.21227/7p97-f663

Abstract 

This dataset represents a user interaction network from Reddit in which each node is a user and an edge connects two users when one replies to the other. Node features are derived from each user's subreddit posting history. The task is binary node classification: predict whether a user falls in the top 50% by popularity, as measured by their average post score.

Instructions: 

The dataset was constructed from publicly available Reddit data, using user replies and post scores. Each node's features are drawn from the user's recent posting history, capped at the three most recent posts. Popularity labels were assigned by computing the median score across all users' historical posts: users above the median are labeled popular, and users below it are labeled normal. The dataset contains 33,434 nodes and 198,448 edges, with train, validation, and test ratios of 0.1, 0.1, and 0.8, respectively.
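
The labeling and splitting steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the procedure used to build the dataset: the function names `label_by_median` and `split_indices` are hypothetical, the split is assumed to be a uniform random shuffle with an arbitrary seed, and users whose score equals the median are assumed to fall in the normal class, since the description does not say how ties are handled.

```python
import random
import statistics

NUM_NODES = 33434  # node count stated in the dataset description


def label_by_median(avg_scores):
    """Label each user 1 (popular) if their score is above the median, else 0.

    Assumption: a score exactly equal to the median is labeled 0 (normal).
    """
    median = statistics.median(avg_scores)
    return [1 if score > median else 0 for score in avg_scores]


def split_indices(num_nodes, train_ratio=0.1, valid_ratio=0.1, seed=0):
    """Shuffle node indices and cut them into train/valid/test parts.

    Assumption: a uniform random split; the test part receives whatever
    remains after the train and validation parts (about 0.8 here).
    """
    indices = list(range(num_nodes))
    random.Random(seed).shuffle(indices)
    n_train = int(num_nodes * train_ratio)
    n_valid = int(num_nodes * valid_ratio)
    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]
    return train, valid, test


# Toy scores standing in for per-user average post scores (not real data).
toy_scores = [1.0, 2.0, 3.0, 4.0]
toy_labels = label_by_median(toy_scores)

train_idx, valid_idx, test_idx = split_indices(NUM_NODES)
print(len(train_idx), len(valid_idx), len(test_idx))  # prints: 3343 3343 26748
```

With 33,434 nodes, ratios of 0.1, 0.1, and 0.8 give roughly 3,343 training nodes, 3,343 validation nodes, and 26,748 test nodes when the fractional parts are rounded down and the remainder goes to the test set.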

Dataset Files

    Files have not been uploaded for this dataset