COVID-19 on YouTube: A Data-Driven Analysis of Sentiment, Toxicity, and Content Recommendations

- Citation Author(s): Vanessa Su (Emory University), Nirmalya Thakur (South Dakota School of Mines and Technology)
- Submitted by: Nirmalya Thakur
- DOI: 10.21227/sbj6-pt91
Abstract
Please cite the following paper when using this dataset:
Vanessa Su and Nirmalya Thakur, “COVID-19 on YouTube: A Data-Driven Analysis of Sentiment, Toxicity, and Content Recommendations”, Proceedings of the IEEE 15th Annual Computing and Communication Workshop and Conference 2025, Las Vegas, USA, Jan 06-08, 2025 (Paper accepted for publication, Preprint: https://arxiv.org/abs/2412.17180).
Abstract:
This dataset comprises metadata and analytical attributes for 9,325 publicly available YouTube videos related to COVID-19, published between January 1, 2023, and October 25, 2024. The dataset was created using the YouTube API and refined through rigorous data cleaning and preprocessing.
Key Attributes of the Dataset:
- Video URL: The full URL linking to each video.
- Video ID: A unique identifier for each video.
- Title: The title of the video.
- Description: A detailed textual description provided by the video uploader.
- Publish Date: The date the video was published, ranging from January 1, 2023, to October 25, 2024.
- View Count: The total number of views per video, ranging from 0 to 30,107,100 (mean: ~59,803).
- Like Count: The number of likes per video, ranging from 0 to 607,138 (mean: ~1,413).
- Comment Count: The number of comments per video, ranging from 1 to 25,000 (mean: ~147).
- Duration: Video length in seconds, ranging from 0 to 42,900 seconds (median: 137 seconds).
- Categories: Categorization of videos into 15 unique categories, with "News & Politics" being the most common (4,035 videos).
- Tags: Tags associated with each video.
- Language: The language of the video, predominantly English ("en").
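
The ranges, means, and median quoted in the attribute list can be spot-checked after downloading the data. The following is a minimal sketch using only the Python standard library. It assumes the dataset is available as a CSV file whose column headers match the attribute names above ("View Count", "Like Count", "Comment Count", "Duration", "Categories"); those headers and the `summarize` helper are illustrative assumptions, not something this page confirms, so adjust them to the actual file.

```python
import csv
import io
from collections import Counter
from statistics import mean, median

# Assumed column headers (hypothetical); change these to match the real file.
COUNT_COLUMNS = ["View Count", "Like Count", "Comment Count"]
DURATION_COLUMN = "Duration"
CATEGORY_COLUMN = "Categories"


def summarize(csv_file):
    """Summarize an open CSV file object.

    Returns min/max/mean for each count column, min/max/median for the
    duration column (seconds), and the most frequent category with its count.
    """
    rows = list(csv.DictReader(csv_file))
    summary = {}

    for column in COUNT_COLUMNS:
        # Skip empty cells; parse via float in case counts are stored as "123.0".
        values = [float(row[column]) for row in rows if row[column] != ""]
        summary[column] = {
            "min": min(values),
            "max": max(values),
            "mean": mean(values),
        }

    durations = [float(row[DURATION_COLUMN]) for row in rows if row[DURATION_COLUMN] != ""]
    summary[DURATION_COLUMN] = {
        "min": min(durations),
        "max": max(durations),
        "median": median(durations),
    }

    # most_common(1) returns a list with one (category, count) pair.
    summary["top_category"] = Counter(row[CATEGORY_COLUMN] for row in rows).most_common(1)[0]
    return summary


if __name__ == "__main__":
    # Tiny synthetic sample, not real dataset rows.
    sample = io.StringIO(
        "View Count,Like Count,Comment Count,Duration,Categories\n"
        "100,10,1,60,News & Politics\n"
        "300,30,5,137,News & Politics\n"
        "200,20,3,600,Education\n"
    )
    print(summarize(sample))
```

To run it against the downloaded data, open the file with `open(path, newline="", encoding="utf-8")` and pass the file object to `summarize`; the path is whatever name the downloaded file has.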
Instructions:
Please refer to the above-mentioned paper for details about this dataset.