Coronavirus (COVID-19) Geo-tagged Tweets Dataset

Citation Author(s): Rabindra Lamsal (JNU, New Delhi)
Submitted by: Rabindra Lamsal
Last updated: Tue, 06/02/2020 - 00:37
DOI: 10.21227/fpsb-jz61
Dataset Views: 7826

This dataset contains the IDs of geo-tagged tweets. The tweets were captured by an ongoing project deployed at https://live.rlamsal.com.np. The geolocation data was extracted from tweets that mention "corona", "covid-19", "coronavirus", or variants of "sars-cov-2". To comply with Twitter's content redistribution policy, only the tweet IDs are shared; you can reconstruct the dataset by hydrating these IDs. Every tweet ID in this dataset belongs to a tweet that was posted with an exact location.

Note: I have been sharing the IDs of tweets that contain exact location information only since April 28, 2020, following genuine requests from academic researchers who did not want to hydrate the entire list of IDs (more than 140 million tweets) shared in the Coronavirus (COVID-19) Tweets Dataset.

If you need the geolocation-based data starting March 20, 2020, then use the Coronavirus (COVID-19) Tweets Dataset and hydrate the IDs while adding the following condition:

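import json

data = json.loads(data)  # data: one hydrated tweet as a JSON string
# "coordinates" is null unless the tweet carries an exact pin location
if data["coordinates"]:
    # GeoJSON order: longitude first, then latitude
    longitude, latitude = data["coordinates"]["coordinates"]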
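As a minimal sketch of how the condition above might be applied to a whole file of hydrated tweets, the function below assumes one Twitter API v1.1-style tweet object per line (line-delimited JSON). The function name `exact_locations` and the two sample tweets are hypothetical and only illustrate the shape of the data; they are not taken from the dataset.

```python
import json
from typing import Iterable, Iterator, Tuple


def exact_locations(lines: Iterable[str]) -> Iterator[Tuple[str, float, float]]:
    """Yield (tweet_id, longitude, latitude) for tweets with an exact pin location."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        coords = tweet.get("coordinates")
        if coords:  # null or missing when no exact location was attached
            # GeoJSON order: longitude first, then latitude
            longitude, latitude = coords["coordinates"]
            yield tweet["id_str"], longitude, latitude


# Made-up sample lines standing in for a hydrated file (assumption, not real data)
sample = [
    '{"id_str": "1", "coordinates": {"type": "Point", "coordinates": [85.324, 27.7172]}}',
    '{"id_str": "2", "coordinates": null}',
]
print(list(exact_locations(sample)))
```

In practice, an open file handle can be passed in place of `sample`, since the function only needs an iterable of lines.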

The data is available in two formats: CSV and JSON. I'll be sharing new files every day, and the files are named by period. For example, april28-april29.* contains the tweet IDs and sentiment data of the tweets posted between April 28, 2020, and April 29, 2020.

Why are only tweet IDs being shared? Twitter's content redistribution policy restricts the sharing of tweet information other than tweet IDs and/or user IDs. Twitter wants researchers to always pull fresh data using its API. Why? Here's my opinion: a user might want to delete a particular tweet after a few minutes, hours, or days. If that tweet has already been pulled and shared publicly, the shared data could expose the user or community to unwanted inferences.

Instructions: 

To hydrate the tweet IDs, you can use applications such as DocNow's Hydrator (available for OS X, Windows, and Linux) or QCRI's Tweets Downloader (Java-based). Please refer to the documentation of the respective tools for the download procedure.
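Hydration tools generally take a plain list of tweet IDs as input. The sketch below shows one way such a list might be prepared from a CSV file of this dataset. It assumes that the tweet ID sits in the first column and that the tool expects one ID per line; neither is confirmed here, so check the file and the tool's documentation first. The function names and the demo rows are hypothetical.

```python
import csv
import io
from typing import Iterable, List, TextIO


def read_tweet_ids(csv_file: TextIO) -> List[str]:
    """Return tweet IDs from the first CSV column, skipping blank or non-numeric cells."""
    ids = []
    for row in csv.reader(csv_file):
        if row and row[0].strip().isdigit():  # also skips a header row, if any
            ids.append(row[0].strip())  # keep IDs as strings to avoid precision loss
    return ids


def write_id_list(ids: Iterable[str], out: TextIO) -> None:
    """Write one tweet ID per line."""
    for tweet_id in ids:
        out.write(tweet_id + "\n")


# Made-up rows standing in for a file such as april28-april29.csv (assumed layout)
demo_csv = io.StringIO("1255123456789012345,0.25\n1255123456789012346,-0.1\n")
out = io.StringIO()
write_id_list(read_tweet_ids(demo_csv), out)
print(out.getvalue(), end="")
```

With real files, `open(...)` handles can replace the two `io.StringIO` objects.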

Comments

Great Work!

Submitted by Sadiksha sharma on Sun, 04/26/2020 - 04:14

Thanks, sadiksha!

Submitted by Rabindra Lamsal on Fri, 05/08/2020 - 02:46

Thank you very much for providing this dataset and your support

Submitted by hanaa hammad on Tue, 05/05/2020 - 09:39

My pleasure, Hanaa!

Submitted by Rabindra Lamsal on Tue, 05/05/2020 - 12:39

I created an IEEE account just to download this dataset. There are numerous tweet datasets currently floating around, but none specifically listed the IDs of tweets that had a pin location. Thanks for your efforts.

Submitted by Curran White on Fri, 05/08/2020 - 02:45

Thanks, Curran! I am glad that you found the dataset useful.

Submitted by Rabindra Lamsal on Fri, 05/08/2020 - 03:20

Hi, I hydrated the IDs file using twarc (https://github.com/echen102/COVID-19-TweetIDs/pull/2/commits/7d16ff3f29acf15af88c0d27424041b711865be3).

But when I tried to add the condition you used to get the geolocation data, it gave me an invalid syntax error.

It would be nice if you could share the twarc code you used so that I can edit the variable names properly.

You have done great work!

Submitted by WonSeok Kim on Sat, 05/09/2020 - 15:17

Hey Kim. I think you meant that you used twarc (https://github.com/DocNow/twarc). What I had included in the abstract was just pseudo-code (I've now replaced it with an excerpt of the real code to avoid confusion).

It does not matter how you are archiving your JSON. Just make sure to add the following "if clause" in whatever way you're pulling the tweets. The "if clause" below is TRUE only if the tweet contains an exact pin location.

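import json

data = json.loads(data)  # data: one hydrated tweet as a JSON string
if data["coordinates"]:
    # GeoJSON order: longitude first, then latitude
    longitude, latitude = data["coordinates"]["coordinates"]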

Now you can store the longitude and latitude values as per your convenience. I hope this helps!

Submitted by Rabindra Lamsal on Sun, 05/24/2020 - 12:53

Hey, I want to download the full data, not only the IDs. How can I do so? Please respond.

Submitted by charu v on Wed, 05/20/2020 - 13:46

Hello Charu. Twitter's data sharing policy does not allow anyone to share tweet information other than tweet ID and/or user ID. The list of IDs should be hydrated to re-create a full fresh tweet dataset. For this purpose, you can use applications such as DocNow's Hydrator or QCRI's Tweets Downloader.

Submitted by Rabindra Lamsal on Fri, 05/29/2020 - 22:28

[1] Rabindra Lamsal, "Coronavirus (COVID-19) Geo-tagged Tweets Dataset", IEEE Dataport, 2020. [Online]. Available: http://dx.doi.org/10.21227/fpsb-jz61. Accessed: Jun. 02, 2020.