MonkeyPox2022Tweets: A Large-Scale Twitter Dataset on the 2022 Monkeypox Outbreak, Findings from Analysis of Tweets, and Open Research Questions
- Submitted by: Nirmalya Thakur
- Last updated: Mon, 03/27/2023 - 01:34
- DOI: 10.21227/16ca-c879
Abstract
Please cite the following paper when using this dataset:
N. Thakur, “MonkeyPox2022Tweets: A large-scale Twitter dataset on the 2022 Monkeypox outbreak, findings from analysis of Tweets, and open research questions,” Infect. Dis. Rep., vol. 14, no. 6, pp. 855–883, 2022, DOI: https://doi.org/10.3390/idr14060087.
The mining of Tweets to develop datasets on recent issues, global challenges, pandemics, virus outbreaks, emerging technologies, and trending matters has been of significant interest to the scientific community in the recent past, as such datasets serve as a rich data resource for the investigation of different research questions. Furthermore, past virus outbreaks, such as COVID-19, Ebola, Zika virus, and flu, to name a few, were associated with various works that analyzed the multimodal components of Tweets to infer the characteristics of Twitter conversations about each outbreak. The ongoing outbreak of the monkeypox virus, declared a Global Public Health Emergency (GPHE) by the World Health Organization (WHO), has resulted in a surge of conversations about this outbreak on Twitter, which is generating tremendous amounts of Big Data. No prior work in this field has focused on mining such conversations to develop a Twitter dataset. Therefore, this work presents an open-access dataset of 601,432 Tweets about monkeypox that have been posted on Twitter since the first detected case of this outbreak on May 7, 2022. The dataset complies with the privacy policy, developer agreement, and guidelines for content redistribution of Twitter, as well as with the FAIR (Findability, Accessibility, Interoperability, and Reusability) principles for scientific data management.
Data Description
The dataset consists of 601,432 Tweet IDs, one per tweet, for tweets about monkeypox that were posted on Twitter from May 7, 2022, to March 3, 2023 (the most recent date covered when the latest version of the dataset was uploaded). The Tweet IDs are split across 13 .txt files according to the date range of the associated tweets. The details of these files are as follows.
- Filename: TweetIDs_Part1.txt (No. of Tweet IDs: 13926, Date Range of the associated Tweet IDs: May 7, 2022, to May 21, 2022)
- Filename: TweetIDs_Part2.txt (No. of Tweet IDs: 17705, Date Range of the associated Tweet IDs: May 21, 2022, to May 27, 2022)
- Filename: TweetIDs_Part3.txt (No. of Tweet IDs: 17585, Date Range of the associated Tweet IDs: May 27, 2022, to June 5, 2022)
- Filename: TweetIDs_Part4.txt (No. of Tweet IDs: 19718, Date Range of the associated Tweet IDs: June 5, 2022, to June 11, 2022)
- Filename: TweetIDs_Part5.txt (No. of Tweet IDs: 46718, Date Range of the associated Tweet IDs: June 12, 2022, to June 30, 2022)
- Filename: TweetIDs_Part6.txt (No. of Tweet IDs: 138711, Date Range of the associated Tweet IDs: July 1, 2022, to July 23, 2022)
- Filename: TweetIDs_Part7.txt (No. of Tweet IDs: 105890, Date Range of the associated Tweet IDs: July 24, 2022, to July 31, 2022)
- Filename: TweetIDs_Part8.txt (No. of Tweet IDs: 93959, Date Range of the associated Tweet IDs: August 1, 2022, to August 9, 2022)
- Filename: TweetIDs_Part9.txt (No. of Tweet IDs: 50832, Date Range of the associated Tweet IDs: August 10, 2022, to August 24, 2022)
- Filename: TweetIDs_Part10.txt (No. of Tweet IDs: 39042, Date Range of the associated Tweet IDs: August 25, 2022, to September 19, 2022)
- Filename: TweetIDs_Part11.txt (No. of Tweet IDs: 12341, Date Range of the associated Tweet IDs: September 20, 2022, to October 9, 2022)
- Filename: TweetIDs_Part12.txt (No. of Tweet IDs: 15404, Date Range of the associated Tweet IDs: October 10, 2022, to November 11, 2022)
- Filename: TweetIDs_Part13.txt (No. of Tweet IDs: 30062, Date Range of the associated Tweet IDs: November 12, 2022, to March 3, 2023)
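The per-file counts listed above can be used to check that a download is complete. The following is a minimal sketch, not part of the dataset itself: it assumes each file holds one Tweet ID per line, and the names `EXPECTED_COUNTS`, `count_ids`, and `check_download` are hypothetical helpers introduced only for illustration.

```python
from pathlib import Path

# Expected number of Tweet IDs per file, copied from the list above.
EXPECTED_COUNTS = {
    "TweetIDs_Part1.txt": 13926,
    "TweetIDs_Part2.txt": 17705,
    "TweetIDs_Part3.txt": 17585,
    "TweetIDs_Part4.txt": 19718,
    "TweetIDs_Part5.txt": 46718,
    "TweetIDs_Part6.txt": 138711,
    "TweetIDs_Part7.txt": 105890,
    "TweetIDs_Part8.txt": 93959,
    "TweetIDs_Part9.txt": 50832,
    "TweetIDs_Part10.txt": 39042,
    "TweetIDs_Part11.txt": 12341,
    "TweetIDs_Part12.txt": 15404,
    "TweetIDs_Part13.txt": 30062,
}


def count_ids(path):
    """Count non-empty lines (assumption: one Tweet ID per line)."""
    with open(path, encoding="utf-8") as handle:
        return sum(1 for line in handle if line.strip())


def check_download(folder):
    """Return {filename: (expected, found)} for missing or mismatched files.

    `found` is None when the file does not exist in `folder`.
    """
    problems = {}
    for name, expected in EXPECTED_COUNTS.items():
        path = Path(folder) / name
        found = count_ids(path) if path.exists() else None
        if found != expected:
            problems[name] = (expected, found)
    return problems
```

Calling `check_download("path/to/downloaded/files")` returns an empty dictionary when every file is present and its line count matches the listed figure.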
Please note: The dataset contains only Tweet IDs in compliance with the terms and conditions mentioned in the privacy policy, developer agreement, and guidelines for content redistribution of Twitter. The Tweet IDs need to be hydrated to be used. For hydrating this dataset, the Hydrator application may be used (a step-by-step process on how to use Hydrator to hydrate this dataset is explained in the above-mentioned paper).
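For readers who prefer to script the hydration step instead of using the Hydrator application, the Tweet IDs first need to be read from the files and grouped into batches. The sketch below covers only that preparation and makes no network calls. It assumes one Tweet ID per line; `read_tweet_ids`, `batched`, and the batch size of 100 are illustrative choices, not something specified by the dataset or the paper, so the batch size should be set to whatever the chosen hydration tool or API accepts.

```python
from typing import Iterator, List

# Assumption: illustrative batch size; adjust to the limits of the tool or API in use.
BATCH_SIZE = 100


def read_tweet_ids(path) -> List[str]:
    """Read Tweet IDs from a text file, assuming one ID per line.

    IDs are kept as strings so that the long numeric values are never
    rounded by tools that parse large numbers as floats.
    Lines that are empty or not purely numeric are skipped.
    """
    ids = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            token = line.strip()
            if token.isdigit():
                ids.append(token)
    return ids


def batched(ids: List[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of `ids` with at most `size` items each."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]
```

A typical use would be `for batch in batched(read_tweet_ids("TweetIDs_Part1.txt")): ...`, passing each batch to the hydration client of choice.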
Dataset Files
- TweetIDs_Part1.txt (285.59 kB)
- TweetIDs_Part2.txt (363.09 kB)
- TweetIDs_Part3.txt (360.63 kB)
- TweetIDs_Part4.txt (404.37 kB)
- TweetIDs_Part5.txt (958.08 kB)
- TweetIDs_Part6.txt (2.78 MB)
- TweetIDs_Part7.txt (2.12 MB)
- TweetIDs_Part8.txt (1.88 MB)
- TweetIDs_Part9.txt (1.02 MB)
- TweetIDs_Part10.txt (800.66 kB)
- TweetIDs_Part11.txt (253.08 kB)
- TweetIDs_Part12.txt (315.90 kB)
- TweetIDs_Part13.txt (616.50 kB)