Reddit Submissions of Opioid Related Content in Philadelphia Oriented Subreddits

Citation Author(s):
Glenn Sterner, The Pennsylvania State University
Sean Parsons, The Pennsylvania State University
Submitted by:
Glenn Sterner
Last updated:
Thu, 05/21/2020 - 12:10
DOI:
10.21227/tvn3-jy53
Data Format:
License:

Abstract 

Reddit is one of the largest social media websites in the world. It contains valuable data about its users and their perspectives, organized into virtual communities, or subreddits, based on common areas of interest. Substance use issues are particularly salient within this online community because of the growing opioid crisis in the United States and other countries. The Philadelphia, Pennsylvania, USA region is a particularly important location for understanding user perceptions of opioids because of the prevalence of overdose deaths there. To collect user sentiment on opioid use within this region, the researchers targeted subreddits related to Philadelphia. Using a predefined list of opioid-related keywords (included in the dataset), they iterated through each subreddit and found all instances of the keywords. The dataset comprises the submissions and comments that include those keywords. The data were collected directly from the Reddit API via the praw library in the Python programming language.
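The collection procedure described above can be illustrated with a minimal sketch. This is not the researchers' actual script. It assumes an authenticated praw.Reddit instance is passed in as `reddit`, that the subreddit search endpoint is used to find candidate submissions, and that keywords are matched as whole words regardless of case. The function names and the output fields are hypothetical and do not reflect the data dictionary shipped with the dataset.

```python
import re


def compile_keyword_pattern(keywords):
    """Build one case-insensitive regex matching any keyword as a whole word or phrase."""
    escaped = [re.escape(k.strip()) for k in keywords if k.strip()]
    if not escaped:
        raise ValueError("keyword list is empty")
    # Longest first, so multi-word phrases are preferred over their prefixes.
    escaped.sort(key=len, reverse=True)
    return re.compile(r"(?<!\w)(?:" + "|".join(escaped) + r")(?!\w)", re.IGNORECASE)


def matched_keywords(text, pattern):
    """Return the sorted, lower-cased keywords found in text (empty list if none)."""
    return sorted({m.group(0).lower() for m in pattern.finditer(text or "")})


def collect_entries(reddit, subreddit_names, keywords):
    """Yield one dict per submission or comment that contains a keyword.

    `reddit` is assumed to be an authenticated praw.Reddit instance.
    The dict keys are illustrative, not the dataset's real column names.
    """
    pattern = compile_keyword_pattern(keywords)
    seen = set()  # a submission can be returned by several keyword searches
    for name in subreddit_names:
        for keyword in keywords:
            for submission in reddit.subreddit(name).search(keyword, limit=None):
                if submission.id in seen:
                    continue
                seen.add(submission.id)
                hits = matched_keywords(f"{submission.title}\n{submission.selftext}", pattern)
                if hits:
                    yield {"type": "submission", "id": submission.id,
                           "subreddit": name, "keywords": hits}
                # Drop "load more comments" placeholders, then walk the flattened tree.
                submission.comments.replace_more(limit=0)
                for comment in submission.comments.list():
                    hits = matched_keywords(comment.body, pattern)
                    if hits:
                        yield {"type": "comment", "id": comment.id,
                               "subreddit": name, "keywords": hits}
```

Whole-word matching is a design choice made for this sketch: it keeps a short keyword from matching inside an unrelated longer word. The source does not say whether the original collection applied the same rule.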

Instructions: 

Included are the dataset in a CSV file, a data dictionary for all variables (column key) in a text file, the keyword list used to query the Reddit API in a text file, and the targeted subreddit list in a text file. The dataset comprises entries (submissions and comments) that matched a keyword query within the targeted subreddits. The data dictionary includes the designations for submissions and comments: a submission is a first-order entry within a subreddit, and a comment is an entry posted in response to a submission or another comment. Rows cover all matching entries in the targeted subreddits from January 1, 2005 through May 14, 2020.

 

There are 56,979 rows of data in the CSV file.
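
As a quick sanity check after download, the row count and the split between submissions and comments can be verified with a short script. The sketch below uses only the Python standard library. The column name `type` and the file name in the usage comment are assumptions made for illustration; the real names should be taken from the included data dictionary.

```python
import csv
from collections import Counter


def summarize_entries(csv_file, type_column="type"):
    """Return (total_rows, counts_by_entry_type) for an open CSV file object.

    `type_column` is a hypothetical column name; check the data dictionary
    for the column that actually distinguishes submissions from comments.
    """
    reader = csv.DictReader(csv_file)
    counts = Counter()
    total = 0
    for row in reader:
        total += 1
        # Normalize case and whitespace so "Comment" and "comment" are counted together.
        counts[(row.get(type_column) or "").strip().lower()] += 1
    return total, dict(counts)


# Usage (file name is hypothetical):
# with open("reddit_opioid_philadelphia.csv", newline="", encoding="utf-8") as f:
#     total, by_type = summarize_entries(f)
#     print(total, by_type)
```

Opening the file with `newline=""` matters here because Reddit text fields can contain line breaks inside quoted cells. Counting physical lines in the file would therefore not necessarily give the 56,979 rows reported above.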