Sentiment Analysis | IEEE DataPort

COVID-19 VACCINE Misinformation Aspects

The COVID-19 Vaccine Misinformation Aspects Dataset contains 3,822 English tweets discussing COVID-19 vaccine misinformation, collected from Twitter/X between December 31, 2020, and July 8, 2021. Each tweet is manually annotated and categorized into four distinct misinformation aspects: (1) Vaccine Constituent, (2) Adverse Effects, (3) Agenda-Driven Narratives, and (4) Efficacy and Clinical Trials.

COVID-19 on YouTube: A Data-Driven Analysis of Sentiment, Toxicity, and Content Recommendations

Please cite the following paper when using this dataset:

Vanessa Su and Nirmalya Thakur, “COVID-19 on YouTube: A Data-Driven Analysis of Sentiment, Toxicity, and Content Recommendations”, Proceedings of the IEEE 15th Annual Computing and Communication Workshop and Conference 2025, Las Vegas, USA, Jan 06-08, 2025 (Paper accepted for publication, Preprint: https://arxiv.org/abs/2412.17180).

Five Years of COVID-19 Discourse on Instagram: A Labeled Instagram Dataset of Over Half a Million Posts for Multilingual Sentiment Analysis

To download this dataset without purchasing an IEEE DataPort subscription, please visit: https://zenodo.org/records/13896353

Please cite the following paper when using this dataset:

Emotional analysis keywords extracted from news

We gathered a total of 1,515 news articles concerning suicide, building jumps, and related incidents from 2019 to 2024. Utilizing sentiment analysis tools, we categorized the data into two groups: positive sentiment words and negative sentiment words. Our primary objective was to examine the relationship between negative sentiment words and other associated terms.

Categories:

Social Sciences

Mpox Narrative on Instagram: A Labeled Multilingual Dataset of Instagram Posts on Mpox for Sentiment, Hate Speech, and Anxiety Analysis

To download the dataset without purchasing an IEEE DataPort subscription, please visit: https://zenodo.org/records/13738598

Please cite the following paper when using this dataset:

N. Thakur, “Mpox narrative on Instagram: A labeled multilingual dataset of Instagram posts on mpox for sentiment, hate speech, and anxiety analysis,” arXiv [cs.LG], 2024, URL: https://arxiv.org/abs/2409.05292

A Labeled Dataset for Sentiment Analysis of Videos on YouTube, TikTok, and other Sources about the 2024 Outbreak of Measles

Please cite the following paper when using this dataset:

N. Thakur, V. Su, M. Shao, K. Patel, H. Jeong, V. Knieling, and A. Bian, “A labelled dataset for sentiment analysis of videos on YouTube, TikTok, and other sources about the 2024 outbreak of measles,” Proceedings of the 26th International Conference on Human-Computer Interaction (HCII 2024), Washington, USA, 29 June - 4 July 2024. (URL: https://dl.acm.org/doi/10.1007/978-3-031-76806-4_17)

Twitter Tubbs Fire dataset

Data were collected through the Twitter API, focusing on specific vocabulary related to wildfires, hashtags commonly used during the Tubbs Fire, and terms and hashtags related to mental health, well-being, and physical symptoms associated with smoke and wildfire exposure. We focused exclusively on the period from October 8 to October 31, aligning precisely with the duration of the Tubbs Fire. The final dataset available for analysis consists of 90,759 tweets.

Data collection of user-generated content of social network of communities Reddit in 2023

This data collection focuses on capturing user-generated content from the popular social network Reddit during the year 2023. This dataset comprises 29 user-friendly CSV files collected from Reddit, containing textual data associated with various emotions and related concepts.

Supplementary material (Debate)

Supplementary material for the article "A Group Decision-Making Method Based on the Experts’ Behaviour During the Debate". The two files contain the comments provided by four experts during a debate to select the best product.

SART

SART contains 3000 tweets labelled with respect to the polarity of the sentiment expressed: positive, negative or neutral. Each class contains 1300 tweets and the dataset is split into train/validation/test CSV files.
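
A minimal sketch of how the class balance of one split could be checked after download. It assumes each CSV file has a header row and a column named `label`. The column name, the `label_counts` helper, and the sample rows are hypothetical, since the listing does not describe the file layout.

```python
import csv
import io
from collections import Counter


def label_counts(csv_file, label_column="label"):
    """Count how many rows carry each sentiment label.

    `label_column` is an assumed column name, not taken from the dataset.
    """
    reader = csv.DictReader(csv_file)
    # Normalise case and whitespace so "Positive " and "positive" match.
    return Counter(row[label_column].strip().lower() for row in reader)


# Tiny in-memory stand-in for one split (hypothetical rows).
# A real split would be opened with open(path, newline="", encoding="utf-8").
sample_split = io.StringIO(
    "text,label\n"
    "great service,positive\n"
    "never again,negative\n"
    "it opens at nine,neutral\n"
    "loved it,Positive\n"
)

counts = label_counts(sample_split)
total = sum(counts.values())
print(counts, total)
```

Running the same count over the train, validation and test files and summing the results gives the per-class and overall totals to compare against the figures stated in the description.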
