MegaGeoCOV Extended

Citation Author(s):
Rabindra Lamsal (University of Melbourne)
Maria Rodriguez Read (University of Melbourne)
Shanika Karunasekera (University of Melbourne)
Submitted by:
Rabindra Lamsal
Last updated:
Thu, 02/23/2023 - 20:44
Data Format:
CSV


This dataset (MegaGeoCOV Extended), an extended version of MegaGeoCOV, was introduced in the paper "A Twitter narrative of the COVID-19 pandemic in Australia" (to appear in the proceedings of the 20th ISCRAM Conference, Omaha, Nebraska, USA, May 2023). Please refer to the paper for more details (e.g., the keywords and hashtags used, descriptive statistics, etc.).

MegaGeoCOV Extended contains over 25.2 million multilingual geotagged tweets related to the COVID-19 pandemic. We also provide an English-only version containing 17.8 million tweets. The dataset was curated using Twitter's Full-archive search endpoint. A free IEEE account is sufficient to access the data files. In line with Twitter's content redistribution policy, we share only tweet identifiers; these must be hydrated to recreate the dataset locally, which can be done with tools such as Hydrator and twarc. To support filtering of the tweet identifiers, the dataset includes the following tweet fields: created_at, id, author.verified, author_id, and source. Note that the number of tweets recovered after hydration may be smaller than the number of identifiers, since deleted or private tweets cannot be retrieved.
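Because deleted or private tweets drop out during hydration, it can be useful to check how many identifiers were actually recovered. The following is a minimal Python sketch, not part of the dataset's own tooling: `hydration_report` is a hypothetical helper, and it assumes the hydrated output is JSON Lines with a top-level `id` field on each tweet. Hydrators differ here (some tools write raw API responses in which tweets are nested), so the lookup may need adjusting.

```python
import json
from typing import Iterable


def hydration_report(requested_ids: Iterable[str], hydrated_lines: Iterable[str]) -> dict:
    """Compare the IDs sent for hydration with the tweets that came back.

    Assumption (hypothetical format): each non-empty hydrated line is a JSON
    object with a top-level "id" field, stored as a string or an integer.
    """
    # Normalise requested IDs to stripped strings and drop blanks/duplicates.
    requested = {str(i).strip() for i in requested_ids if str(i).strip()}
    returned = set()
    for line in hydrated_lines:
        if not line.strip():
            continue  # skip blank lines in the JSONL output
        returned.add(str(json.loads(line)["id"]))
    recovered = requested & returned
    return {
        "requested": len(requested),
        "hydrated": len(recovered),
        # IDs that were not returned, e.g. deleted or private tweets.
        "missing_ids": sorted(requested - returned),
        "retention": len(recovered) / len(requested) if requested else 0.0,
    }


if __name__ == "__main__":
    # Toy input for illustration only: two of four IDs come back.
    report = hydration_report(
        ["1", "2", "3", "4"],
        ['{"id": "1", "text": "..."}', '{"id": 3, "text": "..."}'],
    )
    print(report["hydrated"], report["missing_ids"], report["retention"])
    # prints: 2 ['2', '4'] 0.5
```

Using sets makes the comparison insensitive to duplicate identifiers and to the order in which a hydrator returns tweets.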

Dataset usage terms

By using this dataset, you agree to: (i) use the content of this dataset and the data generated from the content of this dataset for non-commercial research only, (ii) remain in compliance with Twitter's Policy and (iii) cite the following paper:

Rabindra Lamsal, Maria Rodriguez Read, Shanika Karunasekera. (2023). A Twitter narrative of the COVID-19 pandemic in Australia. arXiv preprint arXiv:2302.11136.


The dataset is in CSV format. Because additional tweet fields are provided alongside the identifiers, the identifiers can be filtered to suit specific requirements before hydration. Hydrator or twarc can be used to hydrate the tweet identifiers. For more details on tweet hydration, please refer to this paper: BillionCOV: An Enriched Billion-scale Collection of COVID-19 tweets for Efficient Hydration.
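As an illustration of filtering before hydration, the sketch below uses Python's standard `csv` module to select identifiers by date range, verified status, and source, then writes them one per line, the plain-text layout Hydrator and twarc accept as input. It assumes (without checking against the actual files) that the CSV header uses the field names listed above verbatim, that `created_at` holds ISO 8601 timestamps such as `2020-03-15T09:00:00.000Z`, and that `author.verified` holds `True`/`False`; `parse_created_at`, `filter_ids`, and `write_id_file` are hypothetical helper names.

```python
import csv
import io
from datetime import datetime, timezone
from typing import Iterable, List, Optional, Set, TextIO


def parse_created_at(value: str) -> datetime:
    # Assumption: ISO 8601 timestamps such as "2020-03-15T09:00:00.000Z".
    # Older fromisoformat() versions reject a trailing "Z", so map it to +00:00.
    parsed = datetime.fromisoformat(value.strip().replace("Z", "+00:00"))
    # Treat timestamps without an offset as UTC so comparisons stay consistent.
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def filter_ids(
    rows: Iterable[dict],
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
    verified_only: bool = False,
    sources: Optional[Set[str]] = None,
) -> List[str]:
    """Return tweet IDs (as strings) whose rows pass every active filter.

    start is inclusive, end is exclusive; both should be timezone-aware.
    Column names are assumed to match the dataset's field names verbatim.
    """
    selected = []
    for row in rows:
        created = parse_created_at(row["created_at"])
        if start is not None and created < start:
            continue
        if end is not None and created >= end:
            continue
        if verified_only and row["author.verified"].strip().lower() != "true":
            continue
        if sources is not None and row["source"] not in sources:
            continue
        selected.append(row["id"].strip())
    return selected


def write_id_file(ids: Iterable[str], out: TextIO) -> None:
    # One ID per line, the input layout used by Hydrator and twarc.
    for tweet_id in ids:
        out.write(f"{tweet_id}\n")


if __name__ == "__main__":
    # Made-up rows for illustration only; not taken from the dataset.
    sample = io.StringIO(
        "created_at,id,author.verified,author_id,source\n"
        "2020-03-15T09:00:00.000Z,111,True,9,Twitter for iPhone\n"
        "2020-03-20T09:00:00.000Z,222,False,8,Twitter Web App\n"
        "2021-01-01T00:00:00.000Z,333,True,7,Twitter for Android\n"
    )
    march_2020 = filter_ids(
        csv.DictReader(sample),
        start=datetime(2020, 3, 1, tzinfo=timezone.utc),
        end=datetime(2020, 4, 1, tzinfo=timezone.utc),
        verified_only=True,
    )
    buffer = io.StringIO()
    write_id_file(march_2020, buffer)
    print(buffer.getvalue(), end="")  # prints: 111
```

For a real file, pass `csv.DictReader(open(path, newline="", encoding="utf-8"))` and write to a file opened for writing instead of a `StringIO`. Identifiers are kept as strings throughout; parsing them as floating-point numbers (as some spreadsheet tools do) can silently corrupt 19-digit tweet IDs.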

Dataset Files

Open Access dataset files are accessible to all logged-in users. A free IEEE account is sufficient; IEEE membership is not required.