Data collection of user-generated content of social network of communities Reddit in 2023

Name: Data collection of user-generated content of social network of communities Reddit in 2023
Creator: Yuriy Syerov
License: https://creativecommons.org/licenses/by/4.0/

Citation Author(s): Yuriy Syerov, Solomiia Fedushko
Submitted by: Yuriy Syerov
Last updated: Wed, 01/17/2024 - 22:45
DOI: 10.21227/vyxb-p690
Data Format: CSV


Abstract

This data collection captures user-generated content from the social network Reddit during 2023. The dataset comprises 29 CSV files containing textual data associated with various emotions and related concepts. It covers the following emotions: anger, contempt, disgust, revulsion, envy, jealousy, exasperation, frustration, aggravation, agitation, annoyance, grouchiness, grumpiness, irritation, bitterness, dislike, ferocity, fury, hate, hostility, loathing, outrage, rage, resentment, scorn, spite, vengefulness, wrath, and torment.

The Reddit platform serves as a rich source of diverse user-generated content, encompassing discussions, opinions, and experiences on a wide range of topics. The dataset's compilation involved carefully selecting posts and comments that express the aforementioned emotional states or contain keywords associated with them.

In today's world, there is a growing trend of utilizing cognitive computing in various fields to aid human experts in making informed decisions that benefit their businesses. The primary objective of modern technology is to enhance human life and improve work processes. With the exponential growth of data and rapidly changing business environments, cognitive systems can effectively facilitate intelligent, seamless, and improved interactions between humans and technology.

Instructions:

Each CSV file within the dataset represents a separate category or topic and contains a collection of text-based entries that exhibit one or more of the target emotions or related sentiments. The entries may include posts, comments, or other textual content contributed by Reddit users. Each entry is recorded in the following predefined columns: Date, Time, Post text, Post type, Flair, # upvotes, # comments, # awards, Post tone, Post URL, Community name, Community members, Author nick, Years of membership, # Post Karma, # Comment Karma, # Awardee Karma, Author profile URL.
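
As a rough illustration of reading one of these files, the sketch below uses only the Python standard library. It assumes the header row spells the column names exactly as listed above and that the count columns hold plain integers; neither is confirmed by this description. The helper names `to_int` and `load_posts` and the sample row are hypothetical.

```python
import csv
import io

# Column names as listed in the description above; the exact header spelling
# in the real files is an assumption.
COUNT_COLUMNS = ["# upvotes", "# comments", "# awards"]


def to_int(value):
    """Convert a count field to int, returning None when it is not a plain integer."""
    try:
        return int(value.strip().replace(",", ""))
    except (AttributeError, ValueError):
        return None


def load_posts(stream):
    """Read rows from an open CSV text stream and convert the count columns."""
    rows = []
    for row in csv.DictReader(stream):
        for column in COUNT_COLUMNS:
            if column in row:
                row[column] = to_int(row[column])
        rows.append(row)
    return rows


# Made-up sample row for illustration only; not taken from the dataset.
sample = io.StringIO(
    "Date,Post text,# upvotes,# comments,Community name\n"
    "2023-05-01,Example text,12,3,r/example\n"
)
posts = load_posts(sample)
print(posts[0]["# upvotes"], posts[0]["Community name"])
```

For a downloaded file, the same function could be called on a handle opened with `open(path, newline="", encoding="utf-8")`, where `path` points to one of the 29 CSV files. Values that do not parse as integers are kept as `None` rather than guessed.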

Researchers, social scientists, and natural language processing (NLP) practitioners can use this dataset for a variety of purposes. Potential applications include sentiment analysis, propaganda and fake news detection, hate speech and emotion detection, web mining, artificial intelligence (AI), opinion mining, and understanding user behavior and attitudes in online communities. The CSV format allows straightforward integration into analytical pipelines and machine-learning frameworks.
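
A first step in such a pipeline might be a simple frequency count over one column. The sketch below is a minimal example that assumes rows have already been loaded as dictionaries keyed by the column names listed in the instructions. The helper `count_by`, the sample rows, and the tone labels are hypothetical; the actual values stored in the Post tone column are not documented here.

```python
from collections import Counter


def count_by(rows, column):
    """Count how often each non-empty value appears in the given column."""
    counts = Counter()
    for row in rows:
        value = (row.get(column) or "").strip()
        if value:
            counts[value] += 1
    return counts


# Made-up rows for illustration only; the tone labels are assumptions.
rows = [
    {"Post tone": "negative", "Community name": "r/example"},
    {"Post tone": "negative", "Community name": "r/other"},
    {"Post tone": "neutral", "Community name": "r/example"},
    {"Post tone": "", "Community name": "r/example"},
]

tone_counts = count_by(rows, "Post tone")
community_counts = count_by(rows, "Community name")
print(tone_counts.most_common())
print(community_counts.most_common(1))
```

Rows with an empty or missing value in the chosen column are skipped, so the totals may be lower than the number of rows.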

This data collection offers a valuable resource for investigating emotions and related concepts in online discussions, with potential applications in sentiment analysis and natural language processing research. By utilizing this dataset, researchers can gain insights into the expression of various emotions within the Reddit community and contribute to the advancement of emotion-related studies in the digital domain.

Reddit Emotion Dataset 2023 is a goldmine for researchers, offering well-organized CSV files filled with diverse user-generated content that spans a wide spectrum of emotions. This dataset's user-friendly format and potential applications in sentiment analysis and natural language processing make it an invaluable resource for unraveling the intricacies of online emotions. Thank you for this dataset. I have used this dataset for my research.

Tania K, Tue, 08/29/2023 - 15:55