Coronavirus (COVID-19) Geo-tagged Tweets Dataset

JNU, New Delhi
Rabindra Lamsal
Tue, 06/02/2020 - 00:37
This dataset contains the IDs of geo-tagged tweets. The tweets were captured by an ongoing project. The geolocation data was extracted from tweets that mentioned "corona", "covid-19", "coronavirus", or variants of "sars-cov-2". In compliance with Twitter's content redistribution policy, only the tweet IDs are shared. You can reconstruct the dataset by hydrating these IDs. Every tweet ID in this dataset belongs to a tweet that was posted with an exact location.

Note: I started sharing the IDs of tweets containing exact location information only on April 28, 2020, following genuine requests from academic researchers who did not want to hydrate the entire list of IDs (more than 140 million tweets) shared in the Coronavirus (COVID-19) Tweets Dataset.

If you need the geolocation-based data starting March 20, 2020, then use the Coronavirus (COVID-19) Tweets Dataset and hydrate the IDs while adding the following condition:

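import json

data = json.loads(data)  # data: one hydrated tweet as a JSON string
if data.get("coordinates"):  # set only when the tweet carries an exact point location
    longitude, latitude = data["coordinates"]["coordinates"]  # longitude comes first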
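If the hydrated tweets are saved one JSON object per line, the condition above can be applied to the whole file in one pass. The following is only a minimal sketch under that assumption; the function name extract_points and the sample records are hypothetical, and the field names follow the snippet above plus the "id_str" field of the tweet JSON.

```python
import json


def extract_points(lines):
    """Yield (tweet_id, longitude, latitude) for tweets with an exact location.

    `lines` is any iterable of strings, each holding one hydrated tweet as JSON
    (an assumption about how the hydrated output was saved).
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        coords = tweet.get("coordinates")
        if coords:  # null or missing when no exact location was attached
            longitude, latitude = coords["coordinates"]
            yield tweet.get("id_str"), longitude, latitude


# Hypothetical sample records, for illustration only.
sample = [
    '{"id_str": "1", "coordinates": {"type": "Point", "coordinates": [77.2, 28.6]}}',
    '{"id_str": "2", "coordinates": null}',
]
print(list(extract_points(sample)))  # [('1', 77.2, 28.6)]
```

To process a real file, pass an open file handle instead of the sample list, for example extract_points(open("hydrated.jsonl")), where the file name is a placeholder.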

The data is available in two formats: CSV and JSON. I'll be sharing new files every day, and the files will be named by period. For example, april28-april29.* will contain the tweet IDs and sentiment data for tweets posted between April 28, 2020, and April 29, 2020.

Why are only tweet IDs being shared? Twitter's content redistribution policy restricts the sharing of tweet information other than tweet IDs and/or user IDs. Twitter wants researchers to always pull fresh data using its API. Why? Here's my opinion: a user might want to delete a particular tweet after a few minutes, hours, or days, and if the same tweet has already been pulled and shared publicly, it might leave the user or community vulnerable to inferences drawn from the shared data.


To hydrate the tweet IDs, you can use applications such as DocNow's Hydrator (available for OS X, Windows and Linux) or QCRI's Tweets Downloader (Java-based). Please go through the documentation of the respective tools to learn the download procedure.
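Hydration tools generally take a plain text file with one tweet ID per line, so the CSV files may need a small conversion first. The following is only a sketch: it assumes the tweet ID sits in the first column of each CSV row, which should be checked against the actual files. The function name ids_from_csv and the sample values are hypothetical.

```python
import csv
import io


def ids_from_csv(csv_file, id_column=0):
    """Return the tweet IDs found in one column of an open CSV file.

    Rows whose ID cell is not purely numeric (for example a header row)
    are skipped. The column position is an assumption, not a documented fact.
    """
    ids = []
    for row in csv.reader(csv_file):
        if not row:
            continue  # csv.reader yields [] for blank lines
        value = row[id_column].strip()
        if value.isdigit():
            ids.append(value)
    return ids


# Hypothetical sample content, for illustration only.
sample = io.StringIO("1255000000000000001,0.25\n1255000000000000002,-0.1\n")
print("\n".join(ids_from_csv(sample)))
```

Writing the returned IDs to a text file, one per line, gives a file that can then be loaded into the hydration tool of your choice.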


Great Work!


Submitted by Sadiksha sharma on Sun, 04/26/2020 - 04:14

Thanks, Sadiksha!

Submitted by Rabindra Lamsal on Fri, 05/08/2020 - 02:46

Thank you very much for providing this dataset and your support

Submitted by hanaa hammad on Tue, 05/05/2020 - 09:39

My pleasure, Hanaa!

Submitted by Rabindra Lamsal on Tue, 05/05/2020 - 12:39

I created an IEEE account just to download this dataset. There are numerous tweet datasets currently floating around, but none of them had specifically the list of IDs of tweets with a pin location. Thanks for your efforts.

Submitted by Curran White on Fri, 05/08/2020 - 02:45

Thanks, Curran! I am glad that you found the dataset useful.

Submitted by Rabindra Lamsal on Fri, 05/08/2020 - 03:20

Hi, I hydrated the IDs file using twarc ( TweetIDs/pull/2/commits/7d16ff3f29acf15af88c0d27424041b711865be3).

But when I tried to add the condition you used to get the geolocation data, it gave me an invalid syntax error.

It would be nice if you could share the twarc code you used so that I can edit the variable names properly.

You have done great work!

Submitted by WonSeok Kim on Sat, 05/09/2020 - 15:17

Hey Kim. I think you meant using twarc. What I had mentioned in the abstract was just pseudo-code (I've now replaced it with an excerpt of the real code to avoid confusion).

It does not matter how you are archiving your JSON. Just make sure to add the following "if clause", however you are pulling the tweets. The "if clause" below will only be true if the tweet contains an exact pin location.

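import json

data = json.loads(data)  # data: one hydrated tweet as a JSON string
if data.get("coordinates"):  # set only when the tweet carries an exact point location
    longitude, latitude = data["coordinates"]["coordinates"]  # longitude comes first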

Now you can store the longitude and latitude values as per your convenience. I hope this helps!

Submitted by Rabindra Lamsal on Sun, 05/24/2020 - 12:53

Hey, I want to download the full data, not only the IDs. How can I do so? Please respond.


Submitted by charu v on Wed, 05/20/2020 - 13:46

Hello Charu. Twitter's data sharing policy does not allow anyone to share tweet information other than tweet ID and/or user ID. The list of IDs should be hydrated to re-create a full fresh tweet dataset. For this purpose, you can use applications such as DocNow's Hydrator or QCRI's Tweets Downloader.

Submitted by Rabindra Lamsal on Fri, 05/29/2020 - 22:28

