MonkeyPox2022Tweets: A Large-Scale Twitter Dataset on the 2022 Monkeypox Outbreak, Findings from Analysis of Tweets, and Open Research Questions

Name: MonkeyPox2022Tweets: A Large-Scale Twitter Dataset on the 2022 Monkeypox Outbreak, Findings from Analysis of Tweets, and Open Research Questions
Creator: Nirmalya Thakur
License: https://creativecommons.org/licenses/by/4.0/
Keywords: Artificial Intelligence, Machine Learning, Other, Sensors, Wearable Sensing, Social Sciences, Standards Research Data, Biomedical and Health Sciences, Communications, Computational Intelligence, COVID-19, Demographic, Health

Citation Author(s):: Nirmalya
Thakur

Emory University
Submitted by:: Nirmalya Thakur
Last updated:: Mon, 03/27/2023 - 01:34
DOI:: 10.21227/16ca-c879
Data Format:: *.avi; *.csv; *.txt
Research Article Link:: Link to the Paper (Peer-Reviewed)
License:: Creative Commons Attribution

3619 Views

Categories:: Artificial Intelligence
Machine Learning
Other
Sensors
Wearable Sensing
Social Sciences
Standards Research Data
Biomedical and Health Sciences
Communications
Computational Intelligence
COVID-19
Demographic
Health
Keywords:: Monkeypox

0 ratings - Please login to submit your rating.

ACCESS DATASET CITE

Abstract

Please cite the following paper when using this dataset:

N. Thakur, “MonkeyPox2022Tweets: A large-scale Twitter dataset on the 2022 Monkeypox outbreak, findings from analysis of Tweets, and open research questions,” Infect. Dis. Rep., vol. 14, no. 6, pp. 855–883, 2022, DOI: https://doi.org/10.3390/idr14060087.

Abstract

The mining of Tweets to develop datasets on recent issues, global challenges, pandemics, virus outbreaks, emerging technologies, and trending matters has been of significant interest to the scientific community in the recent past, as such datasets serve as a rich data resource for the investigation of different research questions. Furthermore, the virus outbreaks of the past, such as COVID-19, Ebola, Zika virus, and flu, just to name a few, were associated with various works related to the analysis of the multimodal components of Tweets to infer the different characteristics of conversations on Twitter related to these respective outbreaks. The ongoing outbreak of the monkeypox virus, declared a Global Public Health Emergency (GPHE) by the World Health Organization (WHO), has resulted in a surge of conversations about this outbreak on Twitter, which is resulting in the generation of tremendous amounts of Big Data. There has been no prior work in this field thus far that has focused on mining such conversations to develop a Twitter dataset. Therefore, this work presents an open-access dataset of 601,432 Tweets about monkeypox that have been posted on Twitter since the first detected case of this outbreak on May 7, 2022. The dataset complies with the privacy policy, developer agreement, and guidelines for content redistribution of Twitter, as well as with the FAIR principles (Findability, Accessibility, Interoperability, and Reusability) principles for scientific data management.

Data Description

The dataset consists of a total of 601,432 Tweet IDs of the same number of tweets about monkeypox that were posted on Twitter from 7th May 2022 to 3rd March 2023 (the most recent date at the time of uploading the most recent version of the dataset). The Tweet IDs are presented in 12 different .txt files based on the timelines of the associated tweets. The following represents the details of these dataset files.

Filename: TweetIDs_Part1.txt (No. of Tweet IDs: 13926, Date Range of the associated Tweet IDs: May 7, 2022, to May 21, 2022)
Filename: TweetIDs_Part2.txt (No. of Tweet IDs: 17705, Date Range of the associated Tweet IDs: May 21, 2022, to May 27, 2022)
Filename: TweetIDs_Part3.txt (No. of Tweet IDs: 17585, Date Range of the associated Tweet IDs: May 27, 2022, to June 5, 2022)
Filename: TweetIDs_Part4.txt (No. of Tweet IDs: 19718, Date Range of the associated Tweet IDs: June 5, 2022, to June 11, 2022)
Filename: TweetIDs_Part5.txt (No. of Tweet IDs: 46718, Date Range of the associated Tweet IDs: June 12, 2022, to June 30, 2022)
Filename: TweetIDs_Part6.txt (No. of Tweet IDs: 138711, Date Range of the associated Tweet IDs: July 1, 2022, to July 23, 2022)
Filename: TweetIDs_Part7.txt (No. of Tweet IDs: 105890, Date Range of the associated Tweet IDs: July 24, 2022, to July 31, 2022)
Filename: TweetIDs_Part8.txt (No. of Tweet IDs: 93959, Date Range of the associated Tweet IDs: August 1, 2022, to August 9, 2022)
Filename: TweetIDs_Part9.txt (No. of Tweet IDs: 50832, Date Range of the associated Tweet IDs: August 10, 2022, to August 24, 2022)
Filename: TweetIDs_Part10.txt (No. of Tweet IDs: 39042, Date Range of the associated Tweet IDs: August 25, 2022, to September 19, 2022)
Filename: TweetIDs_Part11.txt (No. of Tweet IDs: 12341, Date Range of the associated Tweet IDs: September 20, 2022, to October 9, 2022)
Filename: TweetIDs_Part12.txt (No. of Tweet IDs: 15404, Date Range of the associated Tweet IDs: October 10, 2022, to November 11, 2022)
Filename: TweetIDs_Part13.txt (No. of Tweet IDs: 30062, Date Range of the associated Tweet IDs: November 12, 2022, to March 3, 2023)

Please note: The dataset contains only Tweet IDs in compliance with the terms and conditions mentioned in the privacy policy, developer agreement, and guidelines for content redistribution of Twitter. The Tweet IDs need to be hydrated to be used. For hydrating this dataset, the Hydrator application may be used (a step-by-step process on how to use Hydrator to hydrate this dataset is explained in the above-mentioned paper).

Instructions:

Please refer to the data description mentioned above.