Sentiment Analysis

The COVID-19 Vaccine Misinformation Aspects Dataset contains 3,822 English tweets discussing COVID-19 vaccine misinformation, collected from Twitter/X between December 31, 2020, and July 8, 2021. Each tweet is manually annotated and categorized into four distinct misinformation aspects: (1) Vaccine Constituent, (2) Adverse Effects, (3) Agenda-Driven Narratives, and (4) Efficacy and Clinical Trials.

Categories:

Please cite the following paper when using this dataset:

Vanessa Su and Nirmalya Thakur, “COVID-19 on YouTube: A Data-Driven Analysis of Sentiment, Toxicity, and Content Recommendations”, Proceedings of the IEEE 15th Annual Computing and Communication Workshop and Conference 2025, Las Vegas, USA, Jan 06-08, 2025 (Paper accepted for publication, Preprint: https://arxiv.org/abs/2412.17180).

Categories:

We gathered 1,515 news articles concerning suicide, jumps from buildings, and related incidents from 2019 to 2024. Using sentiment analysis tools, we categorized the data into two groups: positive sentiment words and negative sentiment words. Our primary objective was to examine how negative sentiment words relate to the other terms that appear alongside them.
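One way to examine how negative sentiment words relate to other terms is to count co-occurrences per article. The sketch below is illustrative only: the description does not name the sentiment tool or lexicon used, so `NEGATIVE_WORDS`, `tokenize`, `cooccurrence_with_negative`, and the sample headlines are all hypothetical.

```python
from collections import Counter
import re

# Hypothetical mini-lexicon; the dataset description does not say which
# sentiment tool or word list was actually used.
NEGATIVE_WORDS = {"despair", "tragic", "grief"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def cooccurrence_with_negative(articles, negative_words=NEGATIVE_WORDS):
    """Count how often each non-negative term shares an article with a negative word."""
    counts = Counter()
    for text in articles:
        tokens = set(tokenize(text))        # count each term once per article
        negatives = tokens & negative_words
        others = tokens - negative_words
        for neg in negatives:
            for term in others:
                counts[(neg, term)] += 1
    return counts


# Made-up sample headlines, not taken from the dataset.
sample_articles = [
    "Tragic incident prompts calls for support",
    "Community grief after tragic incident",
]
pairs = cooccurrence_with_negative(sample_articles)
```

Counting at the article level keeps the sketch simple; a sentence-level or sliding-window count would be a natural variation if finer-grained associations are needed.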

Categories:

To download the dataset without purchasing an IEEE Dataport subscription, please visit: https://zenodo.org/records/13738598

Please cite the following paper when using this dataset:

N. Thakur, “Mpox narrative on Instagram: A labeled multilingual dataset of Instagram posts on mpox for sentiment, hate speech, and anxiety analysis,” arXiv [cs.LG], 2024, URL: https://arxiv.org/abs/2409.05292

Categories:

Data were collected through the Twitter API using three groups of search terms: vocabulary related to wildfires, hashtags commonly used during the Tubbs Fire, and terms and hashtags related to mental health, well-being, and physical symptoms associated with smoke and wildfire exposure. Collection was restricted to October 8 through October 31, 2017, the period during which the Tubbs Fire burned. The final dataset available for analysis consists of 90,759 tweets.
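The selection described above (a fixed date window plus keyword and hashtag matching) can be sketched as a simple filter. This is a minimal illustration, not the authors' collection code: the `KEYWORDS` set, the `in_scope` helper, the tweet record layout, and the sample tweets are hypothetical, and the year 2017 is supplied here because the description gives only the month and days.

```python
from datetime import date

# Date window from the description. The year is not stated there;
# 2017 is when the Tubbs Fire burned.
START, END = date(2017, 10, 8), date(2017, 10, 31)

# Hypothetical search terms; the actual vocabulary is not listed in the description.
KEYWORDS = {"wildfire", "#tubbsfire", "smoke", "anxiety"}


def in_scope(tweet):
    """Keep a tweet if it falls inside the window and mentions at least one keyword."""
    text = tweet["text"].lower()
    return START <= tweet["created"] <= END and any(k in text for k in KEYWORDS)


# Made-up sample records, not taken from the dataset.
tweets = [
    {"created": date(2017, 10, 9), "text": "Smoke everywhere this morning #TubbsFire"},
    {"created": date(2017, 11, 2), "text": "Still coughing from the wildfire smoke"},
    {"created": date(2017, 10, 15), "text": "Great coffee today"},
]
selected = [t for t in tweets if in_scope(t)]
```

Only the first sample tweet passes both checks: the second is outside the window and the third matches no keyword.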

Categories:

This dataset captures user-generated content posted on Reddit during 2023. It comprises 29 CSV files of textual data associated with various emotions and related concepts.
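A collection of CSV files like this one can be read into memory with the standard library alone. The sketch below assumes nothing about the real files: the helper names `load_csv_folder` and `row_counts`, the file names `joy.csv` and `fear.csv`, and the `text` column are hypothetical stand-ins used only to make the example runnable.

```python
import csv
import tempfile
from pathlib import Path


def load_csv_folder(folder):
    """Read every .csv file in `folder` into a dict: file stem -> list of row dicts."""
    tables = {}
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            tables[path.stem] = list(csv.DictReader(handle))
    return tables


def row_counts(tables):
    """Return the number of data rows found in each loaded file."""
    return {name: len(rows) for name, rows in tables.items()}


# Demo with two made-up files; names and columns are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "joy.csv").write_text("text\nwhat a day\nso happy\n", encoding="utf-8")
    Path(tmp, "fear.csv").write_text("text\nscary news\n", encoding="utf-8")
    counts = row_counts(load_csv_folder(tmp))
```

Pointing `load_csv_folder` at the directory holding the downloaded files would give one entry per file, which makes it easy to check that all 29 were found.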

Categories: