Standard Dataset
Stack Overflow Dataset for User Engagement, Technology and Emotion Analysis

- Submitted by: Linda Okpanachi
- Last updated: Mon, 03/17/2025 - 15:26
- DOI: 10.21227/zece-e657
Abstract
This dataset comprises user-generated content from Stack Overflow, including post bodies, post tags, and user engagement metrics such as upvotes and downvotes. The data was collected from the Stack Exchange Data Explorer based on user-defined categories and other criteria, such as reputation and badges, as explained in our work. It was collected to support research in technology and emotion analysis, with a focus on understanding user interactions and sentiment within online communities. The data is provided in CSV format and includes key variables such as post content, tags, user interactions, and engagement indicators. The dataset is suitable for applications in natural language processing (NLP), emotion detection, and data analysis. All data has been anonymized to protect privacy and comply with ethical research standards.
The datasets are provided in CSV format and can be analyzed with Python, for example in a Jupyter Notebook using libraries such as Pandas and Matplotlib. Users can load the data, explore its structure, and perform analyses such as filtering, grouping, and plotting trends over time.
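A minimal sketch of the filtering, grouping, and trend steps, assuming a table that has a post timestamp and a score. The column names CreationDate and Score, the helper monthly_trend, and the sample rows are hypothetical illustrations, not the dataset's documented schema; check data.columns and rename as needed.

```python
import pandas as pd

# Hypothetical column names; check data.columns against the actual CSV files.
DATE_COL = "CreationDate"
SCORE_COL = "Score"


def monthly_trend(data: pd.DataFrame, min_score: int = 0) -> pd.DataFrame:
    """Filter posts by score, then count posts and average score per month."""
    df = data.copy()
    # Parse timestamps so posts can be grouped by calendar month.
    df[DATE_COL] = pd.to_datetime(df[DATE_COL])
    # Filtering: keep only posts at or above the score threshold.
    df = df[df[SCORE_COL] >= min_score]
    # Grouping: bucket the remaining posts by month of creation.
    months = df[DATE_COL].dt.to_period("M")
    return df.groupby(months)[SCORE_COL].agg(posts="count", mean_score="mean")


if __name__ == "__main__":
    # Toy rows for illustration only; they are not taken from the dataset.
    sample = pd.DataFrame({
        DATE_COL: ["2023-01-05", "2023-01-20", "2023-02-11", "2023-02-28"],
        SCORE_COL: [3, -1, 5, 2],
    })
    trend = monthly_trend(sample, min_score=0)
    print(trend)
```

To plot the trend over time, the posts column can be handed to Matplotlib, for example through pandas with trend["posts"].plot(), which requires Matplotlib to be installed.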
Basic Usage Steps:
- Load the dataset using:
  import pandas as pd
  data = pd.read_csv('filename.csv')
- Explore the data with:
  print(data.head())
  data.info()  # info() prints its summary itself, so it needs no print() call
- Perform analyses such as grouping by tags, tracking trends over time, or calculating engagement metrics.
- Save the processed data using:
  data.to_csv('processed_file.csv', index=False)
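The grouping-by-tags and engagement steps might look like the following sketch. It assumes hypothetical Tags, UpVotes, and DownVotes columns and Stack Exchange-style tag strings such as "<python><pandas>"; if the files store tags differently, adjust the regular expression. The helper engagement_by_tag and the upvote-ratio metric are illustrative choices, not the authors' method.

```python
import pandas as pd

# Hypothetical column names; adjust to match the actual CSV headers.
TAGS_COL = "Tags"
UP_COL = "UpVotes"
DOWN_COL = "DownVotes"


def engagement_by_tag(data: pd.DataFrame) -> pd.DataFrame:
    """Split tag strings into one row per tag and summarize votes per tag."""
    df = data.copy()
    # Net votes per post, computed before rows are duplicated per tag.
    df["net_votes"] = df[UP_COL] - df[DOWN_COL]
    # Assumes tags look like "<python><pandas>"; extract the names between brackets.
    df["tag"] = df[TAGS_COL].fillna("").str.findall(r"<([^>]+)>")
    # One row per (post, tag); posts without tags become NaN and are dropped.
    df = df.explode("tag").dropna(subset=["tag"])
    summary = df.groupby("tag").agg(
        posts=(UP_COL, "size"),
        upvotes=(UP_COL, "sum"),
        downvotes=(DOWN_COL, "sum"),
        net_votes=("net_votes", "sum"),
    )
    total = summary["upvotes"] + summary["downvotes"]
    # Share of all votes that were upvotes; NaN when a tag received no votes.
    summary["upvote_ratio"] = summary["upvotes"] / total.where(total > 0)
    return summary.sort_values("net_votes", ascending=False)
```

Because a post with several tags is counted once under each tag, per-tag totals can add up to more than the overall vote count; that is usually what a per-technology comparison wants, but it matters if the sums are compared with dataset-wide figures.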
Dataset Files
- User Data.zip (1.57 MB): This dataset captures user engagement data from a technology platform, categorized by user expertise levels.
- Quality o Post.zip (1.71 MB): This dataset captures low-quality posts (defined as posts without answers) from Stack Overflow, categorized by user expertise levels.
- Dataset for Technology and Emotion.zip (21.77 MB): This dataset is structured to support research on trends in technology discussions and the emotions expressed within these …
- Analysis.zip (36.56 MB): The uploaded analysis script provides step-by-step guidance for working with the dataset, including all analysis …
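The files are distributed as zip archives. The following minimal sketch reads every CSV inside one archive using Python's standard zipfile module and pandas. The helper name load_csvs_from_zip is hypothetical, and the file names inside each archive are not documented here, so inspect the returned keys.

```python
import zipfile

import pandas as pd


def load_csvs_from_zip(zip_path: str) -> dict[str, pd.DataFrame]:
    """Read every .csv member of a zip archive into a DataFrame keyed by member name."""
    frames: dict[str, pd.DataFrame] = {}
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            # Skip folders, non-CSV files, and macOS metadata entries.
            if not name.lower().endswith(".csv") or name.startswith("__MACOSX/"):
                continue
            # pandas accepts the binary file handle returned by archive.open().
            with archive.open(name) as handle:
                frames[name] = pd.read_csv(handle)
    return frames
```

For example, load_csvs_from_zip("User Data.zip") returns a dictionary whose keys are the CSV paths inside that archive. When an archive holds exactly one data file, pd.read_csv can also read the .zip path directly, because pandas infers zip compression from the extension.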