Coronavirus (COVID-19) Geo-tagged Tweets Dataset


Citation Author(s): Rabindra Lamsal, School of Computer and Systems Sciences, JNU
Submitted by: Rabindra Lamsal
Last updated: Sun, 07/05/2020 - 14:03
DOI: 10.21227/fpsb-jz61

This dataset contains the IDs of geo-tagged tweets. The tweets are captured by an ongoing project deployed at https://live.rlamsal.com.np. The model monitors the real-time Twitter feed for these keywords: "corona", "coronavirus", "covid", "pandemic", "lockdown", "quarantine", "hand sanitizer", "ppe", "n95", different possible variants of "sarscov2", "nCov", "covid-19", "ncov2019", "2019ncov", "flatten(ing) the curve", "social distancing", "work(ing) from home", and the respective hashtags of all these keywords. Complying with Twitter's content redistribution policy, only the tweet IDs are shared; you can re-construct the dataset by hydrating these IDs. The tweet IDs in this dataset belong to tweets that were posted with an exact pin location. Please note that this dataset should be used solely for non-commercial research purposes (ignore any other license category shown on this page).

-------------------------------------------------------------------------

Coronavirus (COVID-19) Tweets Dataset

(260+ million English-language tweets; ongoing collection)

-------------------------------------------------------------------------

Why are only tweet IDs being shared? Twitter's content redistribution policy restricts the sharing of tweet information other than tweet IDs and/or user IDs. Twitter wants researchers to always pull fresh data, because a user might delete a tweet or make their profile protected. If that tweet had already been pulled and shared publicly, the user or community could be exposed to inferences drawn from data that no longer exists or is now private.

(Tweets are collected in UTC; the local times listed below are in GMT+5:45):

march20_march21.csv: March 20, 2020 01:37 AM - March 21, 2020 09:25 AM

march21_march22.csv: March 21, 2020 09:27 AM - March 22, 2020 07:46 AM

march22_march23.csv: March 22, 2020 07:50 AM - March 23, 2020 09:08 AM

march23_march24.csv: March 23, 2020 09:11 AM - March 24, 2020 11:35 AM

march24_march25.csv: March 24, 2020 11:42 AM - March 25, 2020 11:43 AM

march25_march26.csv: March 25, 2020 11:47 AM - March 26, 2020 12:46 PM

march26_march27.csv: March 26, 2020 12:51 PM - March 27, 2020 11:53 AM

march27_march28.csv: March 27, 2020 11:56 AM - March 28, 2020 01:59 PM

march28_march29.csv: March 28, 2020 02:03 PM - March 29, 2020 04:01 PM

march29_march30.csv: NOT AVAILABLE

march30_march31.csv: March 30, 2020 02:01 PM - March 31, 2020 10:16 AM

march31_april1.csv: March 31, 2020 10:20 AM - April 01, 2020 10:59 AM

april1_april2.csv: April 01, 2020 11:02 AM - April 02, 2020 12:19 PM

april2_april3.csv: April 02, 2020 12:21 PM - April 03, 2020 02:38 PM

april3_april4.csv: April 03, 2020 02:40 PM - April 04, 2020 11:54 AM

april4_april5.csv: April 04, 2020 11:56 AM - April 05, 2020 12:54 PM

april5_april6.csv: April 05, 2020 12:56 PM - April 06, 2020 10:57 AM

april6_april7.csv: April 06, 2020 10:58 AM - April 07, 2020 12:28 PM

april7_april8.csv: April 07, 2020 12:29 PM - April 08, 2020 12:34 PM

april8_april9.csv: April 08, 2020 12:37 PM - April 09, 2020 12:18 PM

april9_april10.csv: April 09, 2020 12:20 PM - April 10, 2020 09:20 AM

april10_april11.csv: April 10, 2020 09:22 AM - April 11, 2020 10:22 AM

april11_april12.csv: April 11, 2020 10:24 AM - April 12, 2020 10:53 AM

april12_april13.csv: April 12, 2020 10:57 AM - April 13, 2020 11:43 AM

april13_april14.csv: April 13, 2020 11:46 AM - April 14, 2020 12:49 AM

april14_april15.csv: April 14, 2020 11:09 AM - April 15, 2020 12:38 PM

april15_april16.csv: April 15, 2020 12:40 PM - April 16, 2020 10:03 AM

april16_april17.csv: April 16, 2020 10:04 AM - April 17, 2020 10:38 AM

april17_april18.csv: April 17, 2020 10:40 AM - April 18, 2020 10:17 AM

april18_april19.csv: April 18, 2020 10:19 AM - April 19, 2020 09:34 AM

april19_april20.csv: April 19, 2020 09:43 AM - April 20, 2020 10:45 AM

april20_april21.csv: April 20, 2020 10:56 AM - April 21, 2020 10:47 AM

april21_april22.csv: April 21, 2020 10:54 AM - April 22, 2020 10:33 AM

april22_april23.csv: April 22, 2020 10:45 AM - April 23, 2020 10:49 AM

april23_april24.csv: April 23, 2020 11:08 AM - April 24, 2020 10:39 AM

april24_april25.csv: April 24, 2020 10:51 AM - April 25, 2020 11:50 AM

april25_april26.csv: April 25, 2020 12:20 PM - April 26, 2020 09:13 AM

april26_april27.csv: April 26, 2020 09:16 AM - April 27, 2020 10:21 AM

april27_april28.csv: April 27, 2020 10:33 AM - April 28, 2020 10:09 AM

april28_april29.csv: April 28, 2020 10:20 AM - April 29, 2020 08:48 AM

april29_april30.csv: April 29, 2020 09:09 AM - April 30, 2020 10:33 AM

april30_may1.csv: April 30, 2020 10:53 AM - May 01, 2020 10:18 AM

may1_may2.csv: May 01, 2020 10:23 AM - May 02, 2020 09:54 AM

may2_may3.csv: May 02, 2020 10:18 AM - May 03, 2020 09:57 AM

may3_may4.csv: May 03, 2020 10:09 AM - May 04, 2020 10:17 AM

may4_may5.csv: May 04, 2020 10:32 AM - May 05, 2020 10:17 AM

may5_may6.csv: May 05, 2020 10:38 AM - May 06, 2020 10:26 AM

may6_may7.csv: May 06, 2020 10:35 AM - May 07, 2020 09:33 AM

may7_may8.csv: May 07, 2020 09:55 AM - May 08, 2020 09:35 AM

may8_may9.csv: May 08, 2020 09:39 AM - May 09, 2020 09:49 AM

may9_may10.csv: May 09, 2020 09:55 AM - May 10, 2020 10:11 AM

may10_may11.csv: May 10, 2020 10:23 AM - May 11, 2020 09:57 AM

may11_may12.csv: May 11, 2020 10:08 AM - May 12, 2020 09:52 AM

may12_may13.csv: May 12, 2020 09:59 AM - May 13, 2020 10:14 AM

may13_may14.csv: May 13, 2020 10:24 AM - May 14, 2020 11:21 AM

may14_may15.csv: May 14, 2020 11:38 AM - May 15, 2020 09:58 AM

may15_may16.csv: May 15, 2020 10:13 AM - May 16, 2020 09:43 AM

may16_may17.csv: May 16, 2020 09:58 AM - May 17, 2020 10:34 AM

may17_may18.csv: May 17, 2020 10:36 AM - May 18, 2020 10:07 AM

may18_may19.csv: May 18, 2020 10:08 AM - May 19, 2020 10:07 AM

may19_may20.csv: May 19, 2020 10:08 AM - May 20, 2020 10:06 AM

may20_may21.csv: May 20, 2020 10:06 AM - May 21, 2020 10:15 AM

may21_may22.csv: May 21, 2020 10:16 AM - May 22, 2020 10:13 AM

may22_may23.csv: May 22, 2020 10:14 AM - May 23, 2020 10:08 AM

may23_may24.csv: May 23, 2020 10:08 AM - May 24, 2020 10:02 AM

may24_may25.csv: May 24, 2020 10:02 AM - May 25, 2020 10:10 AM

may25_may26.csv: May 25, 2020 10:11 AM - May 26, 2020 10:22 AM

may26_may27.csv: May 26, 2020 10:22 AM - May 27, 2020 10:16 AM

may27_may28.csv: May 27, 2020 10:17 AM - May 28, 2020 10:35 AM

may28_may29.csv: May 28, 2020 10:36 AM - May 29, 2020 10:07 AM

may29_may30.csv: May 29, 2020 10:07 AM - May 30, 2020 10:14 AM

may30_may31.csv: May 30, 2020 10:15 AM - May 31, 2020 10:13 AM

may31_june1.csv: May 31, 2020 10:13 AM - June 01, 2020 10:14 AM

june1_june2.csv: June 01, 2020 10:15 AM - June 02, 2020 10:07 AM

june2_june3.csv: June 02, 2020 10:08 AM - June 03, 2020 10:26 AM

june3_june4.csv: June 03, 2020 10:27 AM - June 04, 2020 10:23 AM

june4_june5.csv: June 04, 2020 10:26 AM - June 05, 2020 10:03 AM

june5_june6.csv: June 05, 2020 10:11 AM - June 06, 2020 10:16 AM

june6_june7.csv: June 06, 2020 10:17 AM - June 07, 2020 10:24 AM

june7_june8.csv: June 07, 2020 10:25 AM - June 08, 2020 10:13 AM

june8_june9.csv: June 08, 2020 10:13 AM - June 09, 2020 10:12 AM

june9_june10.csv: June 09, 2020 10:12 AM - June 10, 2020 10:13 AM

june10_june11.csv: June 10, 2020 10:14 AM - June 11, 2020 10:11 AM

june11_june12.csv: June 11, 2020 10:12 AM - June 12, 2020 10:10 AM

june12_june13.csv: June 12, 2020 10:11 AM - June 13, 2020 10:10 AM

june13_june14.csv: June 13, 2020 10:11 AM - June 14, 2020 10:08 AM

june14_june15.csv: June 14, 2020 10:09 AM - June 15, 2020 10:10 AM

june15_june16.csv: June 15, 2020 10:10 AM - June 16, 2020 10:10 AM

june16_june17.csv: June 16, 2020 10:11 AM - June 17, 2020 10:10 AM

june17_june18.csv: June 17, 2020 10:10 AM - June 18, 2020 10:09 AM

june18_june19.csv: June 18, 2020 10:10 AM - June 19, 2020 10:10 AM

june19_june20.csv: June 19, 2020 10:10 AM - June 20, 2020 10:10 AM

june20_june21.csv: June 20, 2020 10:10 AM - June 21, 2020 10:10 AM

june21_june22.csv: June 21, 2020 10:10 AM - June 22, 2020 10:10 AM

june22_june23.csv: June 22, 2020 10:10 AM - June 23, 2020 10:09 AM

june23_june24.csv: June 23, 2020 10:10 AM - June 24, 2020 10:09 AM

june24_june25.csv: June 24, 2020 10:10 AM - June 25, 2020 10:09 AM

june25_june26.csv: June 25, 2020 10:10 AM - June 26, 2020 10:09 AM

june26_june27.csv: June 26, 2020 10:09 AM - June 27, 2020 10:10 AM

june27_june28.csv: June 27, 2020 10:11 AM - June 28, 2020 10:10 AM

june28_june29.csv: June 28, 2020 10:10 AM - June 29, 2020 10:10 AM

june29_june30.csv: June 29, 2020 10:10 AM - June 30, 2020 10:10 AM

june30_july1.csv: June 30, 2020 10:10 AM - July 01, 2020 10:10 AM

july1_july2.csv: July 01, 2020 10:11 AM - July 02, 2020 12:28 PM

july2_july3.csv: July 02, 2020 12:29 PM - July 03, 2020 10:10 AM

july3_july4.csv: July 03, 2020 10:10 AM - July 04, 2020 07:00 AM

july4_july5.csv: July 04, 2020 07:01 AM - July 05, 2020 09:16 AM

Instructions: 

Each CSV file contains a list of tweet IDs. You can use these tweet IDs to download fresh data from Twitter (i.e., hydrate the tweet IDs). To hydrate them, you can use applications such as Hydrator (available for macOS, Windows, and Linux; takes a CSV), twarc (a Python library; takes a TXT), or QCRI's Tweets Downloader (Java-based; takes a TXT).

Getting the CSV files of this dataset ready for hydrating the tweet IDs:

import pandas as pd

# Load the original CSV file (it has no header row)
dataframe = pd.read_csv("april28_april29.csv", header=None)
# Keep only the first column, which holds the tweet IDs
dataframe = dataframe[0]
# Export the tweet IDs to a new CSV file, without index or header
dataframe.to_csv("ready_april28_april29.csv", index=False, header=None)

The example code above reads the original CSV file from this dataset (april28_april29.csv) and exports just the tweet ID column to a new CSV file (ready_april28_april29.csv), which the Hydrator application can then consume for hydrating the tweet IDs. However, twarc and QCRI's Tweets Downloader consume a TXT file; to export the tweet ID column into a TXT file instead, replace ".csv" with ".txt" in the to_csv call on the last line.
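For twarc and QCRI's Tweets Downloader, the conversion can be sketched end-to-end as below. The file names and the two sample IDs are illustrative, and a tiny stand-in CSV is created first so the snippet is self-contained:

```python
import pandas as pd

# Illustrative stand-in for one of the dataset's CSV files:
# one 19-digit tweet ID per row, no header.
with open("sample_ids.csv", "w") as f:
    f.write("1255179979412258817\n1255179979412258818\n")

# dtype=str keeps the 19-digit IDs intact (reading them as floats can round them).
ids = pd.read_csv("sample_ids.csv", header=None, dtype=str)

# Write one tweet ID per line; twarc and QCRI's downloader consume this format.
ids[0].to_csv("sample_ids.txt", index=False, header=False)
```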

Comments

Great Work!

 

Submitted by Sadiksha sharma on Sun, 04/26/2020 - 04:14

Thanks, sadiksha!

Submitted by Rabindra Lamsal on Fri, 05/08/2020 - 02:46

Thank you very much for providing this dataset and your support

Submitted by hanaa hammad on Tue, 05/05/2020 - 09:39

My pleasure, Hanaa!

Submitted by Rabindra Lamsal on Tue, 05/05/2020 - 12:39

I created an IEEE account just to download this dataset. There are numerous tweet datasets currently floating around, but none of them had the list of tweet IDs with a pin location. Thanks for your efforts.

Submitted by Curran White on Fri, 05/08/2020 - 02:45

Thanks, Curran! I am glad that you found the dataset useful.

Submitted by Rabindra Lamsal on Fri, 05/08/2020 - 03:20

Hi, I hydrated the IDs file using twarc (https://github.com/echen102/COVID-19-TweetIDs/pull/2/commits/7d16ff3f29acf15af88c0d27424041b711865be3).

But when I tried to add the condition you used to get geolocation data, it gives me an invalid-syntax error.

It would be nice if you could share which twarc code you used, so that I can edit the variable names properly.

You have done great work!

Submitted by WonSeok Kim on Sat, 05/09/2020 - 15:17

Hey Kim. I think you meant twarc (https://github.com/DocNow/twarc). That was just pseudo-code I had mentioned in the abstract (I've now replaced it with an excerpt of the real code to avoid confusion).

It does not matter how you archive your JSON. Just add the following "if" clause wherever you're pulling the tweets; it is true only when the tweet contains an exact pin location.

import json

data = json.loads(data)
if data["coordinates"]:
    longitude, latitude = data["coordinates"]["coordinates"]

Now you can store the longitude and latitude values however is convenient for you. I hope this helps!
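Put together, the filter above can be sketched as a self-contained snippet. The two JSON records are made-up stand-ins for hydrated tweets; only the first carries an exact pin location:

```python
import json

# Made-up stand-ins for hydrated tweet objects.
raw_lines = [
    '{"id": 1, "coordinates": {"type": "Point", "coordinates": [85.32, 27.71]}}',
    '{"id": 2, "coordinates": null}',
]

geo_tagged = []
for line in raw_lines:
    data = json.loads(line)
    # Twitter stores exact locations as [longitude, latitude];
    # the "coordinates" field is null when there is no pin location.
    if data["coordinates"]:
        longitude, latitude = data["coordinates"]["coordinates"]
        geo_tagged.append((data["id"], longitude, latitude))
```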

Submitted by Rabindra Lamsal on Sun, 05/24/2020 - 12:53

Hey, I want to download the full data, not only the IDs. How can I do so? Please respond.

 

Submitted by charu v on Wed, 05/20/2020 - 13:46

Hello Charu. Twitter's data sharing policy does not allow anyone to share tweet information other than tweet ID and/or user ID. The list of IDs should be hydrated to re-create a full fresh tweet dataset. For this purpose, you can use applications such as DocNow's Hydrator or QCRI's Tweets Downloader.

Submitted by Rabindra Lamsal on Fri, 05/29/2020 - 22:28

Thanks for the data. I am not sure if this is just at my end, but the CSV files have an issue with the tweet ID fields due to the 15-significant-digit limit: the values differ from the ones in the JSON. Maybe export them to .txt files rather than .csv.

Submitted by Abhay Singh on Tue, 06/02/2020 - 21:38

Hello Abhay. Yes, I have heard from a couple of people about the tweet IDs getting altered on their machines. That is why I am also uploading the JSON for those experiencing this issue.

Can you confirm whether the IDs are altered even when opened in a text editor (Notepad or Sublime)? I suspect you're opening the CSV files with MS Excel; I've seen multiple posts on Stack Exchange about Excel truncating digits after the 15th.

Submitted by Rabindra Lamsal on Tue, 06/02/2020 - 22:18

Hello Rabindra,

 

No, it doesn't happen if you open the dataset in some other editor. Reading the data in different systems (R/Python) can lead to different results, as they may not convert it properly. Also, if someone uses the Hydrator app and converts the CSV to TXT with just the IDs, it will have errors. Anyway, it's fairly straightforward to convert the JSON to a TXT containing the IDs, but some users may benefit from .txt files.

 

Cheers

Submitted by Abhay Singh on Tue, 06/02/2020 - 22:43

Thanks for getting back.

If you use DocNow's Hydrator app, you can straightaway import the downloaded CSV file for hydrating (after removing the sentiment column). However, QCRI's Tweets Downloader requires a TXT file with a single tweet ID per line, so you'll have to play around with the CSV file, to some extent, to get the task done.

A handful of people have reached out to me with a similar issue. Most of them were opening the CSV files with MS Excel to remove the sentiment column. The problem did not occur when the downloaded CSV was imported as a pandas data frame, the sentiment column was dropped, and the resulting data frame was exported as a CSV file ready to be hydrated.
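For anyone hitting the truncation issue, here is a minimal sketch of that pandas route. The file names, the sentiment values, and the two sample rows are illustrative:

```python
import pandas as pd

# Illustrative stand-in for a downloaded CSV: tweet ID plus a sentiment column.
with open("downloaded.csv", "w") as f:
    f.write("1255179979412258817,0.42\n1255179979412258818,-0.10\n")

# dtype=str guarantees the 19-digit IDs survive untouched, sidestepping
# the 15-significant-digit rounding seen when the file is opened in MS Excel.
df = pd.read_csv("downloaded.csv", header=None, dtype=str,
                 names=["tweet_id", "sentiment"])

# Keep only the tweet IDs and export a Hydrator-ready CSV.
df["tweet_id"].to_csv("ready.csv", index=False, header=False)
```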

Submitted by Rabindra Lamsal on Wed, 06/03/2020 - 00:41

Thanks Rabindra. All good. As I said, it's not that hard to deal with; I mentioned it so that someone else having a similar issue could benefit. Cheers.

Submitted by Abhay Singh on Wed, 06/03/2020 - 00:51

Roger-that.

Submitted by Rabindra Lamsal on Wed, 06/03/2020 - 11:34

[1] Rabindra Lamsal, "Coronavirus (COVID-19) Geo-tagged Tweets Dataset", IEEE Dataport, 2020. [Online]. Available: http://dx.doi.org/10.21227/fpsb-jz61. Accessed: Jul. 05, 2020.