Open Access
Coronavirus (COVID-19) Geo-tagged Tweets Dataset
- Citation Author(s): Rabindra Lamsal
- Submitted by: Rabindra Lamsal
- Last updated: Thu, 01/28/2021 - 00:10
- DOI: 10.21227/fpsb-jz61
- Keywords:
Corona Tweets Dataset, COVID-19 Tweets Dataset, Corona Tweets, COVID-19 Tweets, Corona Twitter Sentiment, COVID-19 Twitter Sentiment, SARS-CoV-2 Tweets Dataset, SARS-CoV-2 Twitter Sentiment, Coronavirus English Tweets Dataset, COVID-19 English Tweets Dataset, Coronavirus Geotagged Tweets, COVID-19 Geotagged Tweets
Abstract
This dataset contains the IDs and sentiment scores of geo-tagged tweets related to the COVID-19 pandemic. The tweets are captured by an ongoing project deployed at https://live.rlamsal.com.np. The system monitors the real-time Twitter feed for coronavirus-related tweets using 90+ keywords and hashtags commonly used when referencing the pandemic. In compliance with Twitter's content redistribution policy, only the tweet IDs are shared; you can reconstruct the dataset by hydrating these IDs. The tweet IDs in this dataset belong to tweets posted with an exact location attached.
The paper associated with this dataset is available here: Design and analysis of a large-scale COVID-19 tweets dataset
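As a quick illustration, here is a minimal hydration sketch using twarc (the same approach is discussed in the comments below; it assumes the shared CSVs have no header row and carry the tweet ID in the first column, which is an assumption about the file layout):
import csv
from twarc import Twarc

# Twitter API credentials (fill in your own keys)
t = Twarc(consumer_key="", consumer_secret="",
          access_token="", access_token_secret="")

# Keep only the tweet ID column for hydration
with open("march20_march21.csv") as f:
    ids = [row[0] for row in csv.reader(f) if row]

# Hydrating an ID returns the full tweet JSON, including text and coordinates
for tweet in t.hydrate(ids):
    if tweet.get("coordinates"):
        longitude, latitude = tweet["coordinates"]["coordinates"]
        print(tweet["id_str"], longitude, latitude)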
-------------------------------------
Related datasets:
(a) Coronavirus (COVID-19) Tweets Sentiment Trend (Global)
(b) Tweets Originating from India During COVID-19 Lockdowns
-------------------------------------
Below is a quick overview of this dataset.
— Dataset name: GeoCOV19Tweets Dataset
— Number of tweets: 306,270
— Coverage: Global
— Language: English (EN)
— Dataset usage terms: By using this dataset, you agree to (i) use the content of this dataset and the data generated from it for non-commercial research only, (ii) remain in compliance with Twitter's Developer Policy, and (iii) cite the following paper:
Lamsal, R. Design and analysis of a large-scale COVID-19 tweets dataset. Applied Intelligence (2020). https://doi.org/10.1007/s10489-020-02029-z
— Primary dataset: Coronavirus (COVID-19) Tweets Dataset (COV19Tweets Dataset)
— Dataset updates: Daily
— Active keywords and hashtags: keywords.tsv
Please visit the primary dataset's page for details regarding the collection date and time (and other notes) of each CSV file present in this dataset.
Dataset Files
march20_march21.csv (37.17 kB)
march21_march22.csv (28.82 kB)
march22_march23.csv (29.93 kB)
march23_march24.csv (30.59 kB)
march24_march25.csv (27.03 kB)
march25_march26.csv (25.52 kB)
march26_march27.csv (22.57 kB)
march27_march28.csv (24.48 kB)
march28_march29.csv (23.73 kB)
march30_march31.csv (15.50 kB)
march31_april1.csv (17.85 kB)
april1_april2.csv (17.32 kB)
april2_april3.csv (18.64 kB)
april3_april4.csv (17.00 kB)
april4_april5.csv (18.67 kB)
april5_april6.csv (19.31 kB)
april6_april7.csv (15.40 kB)
april7_april8.csv (16.61 kB)
april8_april9.csv (13.99 kB)
april9_april10.csv (14.16 kB)
april10_april11.csv (14.38 kB)
april11_april12.csv (15.59 kB)
april12_april13.csv (16.14 kB)
april13_april14.csv (15.85 kB)
april14_april15.csv (11.46 kB)
april15_april16.csv (9.46 kB)
april16_april17.csv (13.31 kB)
april17_april18.csv (10.64 kB)
april18_april19.csv (28.87 kB)
april19_april20.csv (25.26 kB)
april20_april21.csv (24.85 kB)
april21_april22.csv (25.15 kB)
april22_april23.csv (23.58 kB)
april23_april24.csv (23.25 kB)
april24_april25.csv (21.72 kB)
april25_april26.csv (20.12 kB)
april26_april27.csv (27.17 kB)
april27_april28.csv (21.17 kB)
april28_april29.csv (42.75 kB)
april29_april30.csv (51.94 kB)
april30_may1.csv (49.86 kB)
may1_may2.csv (56.21 kB)
may2_may3.csv (55.30 kB)
may3_may4.csv (46.73 kB)
may4_may5.csv (48.43 kB)
may5_may6.csv (49.12 kB)
may6_may7.csv (47.35 kB)
may7_may8.csv (49.16 kB)
may8_may9.csv (50.35 kB)
may9_may10.csv (44.36 kB)
may10_may11.csv (33.62 kB)
may11_may12.csv (38.86 kB)
may12_may13.csv (48.49 kB)
may13_may14.csv (44.26 kB)
may14-may15.csv (49.94 kB)
may15_may16.csv (47.65 kB)
may16_may17.csv (45.22 kB)
may17_may18.csv (39.42 kB)
may18_may19.csv (42.21 kB)
may19_may20.csv (41.82 kB)
may20_may21.csv (42.29 kB)
may21_may22.csv (47.11 kB)
may22_may23.csv (46.56 kB)
may23_may24.csv (39.11 kB)
may24_may25.csv (39.44 kB)
may25_may26.csv (35.96 kB)
may26_may27.csv (35.94 kB)
may27_may28.csv (37.93 kB)
may28_may29.csv (38.72 kB)
may29_may30.csv (36.32 kB)
may30_may31.csv (36.32 kB)
may31_june1.csv (33.00 kB)
june1_june2.csv (37.40 kB)
june2_june3.csv (27.77 kB)
june3_june4.csv (32.05 kB)
june4_june5.csv (33.84 kB)
june5_june6.csv (36.57 kB)
june6_june7.csv (33.38 kB)
june7_june8.csv (32.24 kB)
june8_june9.csv (37.07 kB)
june9_june10.csv (35.61 kB)
june10_june11.csv (34.69 kB)
june11_june12.csv (35.12 kB)
june12_june13.csv (36.68 kB)
june13_june14.csv (31.67 kB)
june14_june15.csv (31.71 kB)
june15_june16.csv (51.89 kB)
june16_june17.csv (52.57 kB)
june17_june18.csv (51.54 kB)
june18_june19.csv (50.74 kB)
june19_june20.csv (53.22 kB)
june20_june21.csv (50.95 kB)
june21_june22.csv (43.73 kB)
june22_june23.csv (47.78 kB)
june23_june24.csv (44.52 kB)
june24_june25.csv (45.55 kB)
june25_june26.csv (48.55 kB)
june26_june27.csv (43.75 kB)
june27_june28.csv (55.16 kB)
june28_june29.csv (44.44 kB)
june29_june30.csv (48.05 kB)
june30_july1.csv (46.93 kB)
july1_july2.csv (57.62 kB)
july2_july3.csv (45.22 kB)
july3_july4.csv (46.19 kB)
july4_july5.csv (63.11 kB)
july5_july6.csv (47.22 kB)
july6_july7.csv (40.01 kB)
july7_july8.csv (41.92 kB)
july8_july9.csv (43.86 kB)
july9_july10.csv (56.86 kB)
july10_july11.csv (49.40 kB)
july11_july12.csv (48.93 kB)
july12_july13.csv (42.02 kB)
july13_july14.csv (40.06 kB)
july14_july15.csv (42.60 kB)
july15_july16.csv (37.52 kB)
july16_july17.csv (41.60 kB)
july17_july18.csv (48.44 kB)
july18_july19.csv (50.76 kB)
july19_july20.csv (44.84 kB)
july20_july21.csv (41.77 kB)
july21_july22.csv (48.36 kB)
july22_july23.csv (48.32 kB)
july23_july24.csv (48.26 kB)
july24_july25.csv (52.09 kB)
july25_july26.csv (47.57 kB)
july26_july27.csv (44.38 kB)
july27_july28.csv (39.81 kB)
july28_july29.csv (36.83 kB)
july29_july30.csv (40.30 kB)
july30_july31.csv (37.23 kB)
july31_august1.csv (43.96 kB)
august1_august2.csv (45.13 kB)
august2_august3.csv (38.84 kB)
august3_august4.csv (42.76 kB)
august4_august5.csv (46.55 kB)
august5_august6.csv (48.91 kB)
august6_august7.csv (49.87 kB)
august7_august8.csv (50.64 kB)
august8_august9.csv (50.74 kB)
august9_august10.csv (44.21 kB)
august10_august11.csv (42.45 kB)
august11_august12.csv (46.96 kB)
august12_august13.csv (50.69 kB)
august13_august14.csv (51.99 kB)
august14_august15.csv (52.32 kB)
august15_august16.csv (53.46 kB)
august16_august17.csv (47.08 kB)
august17_august18.csv (49.32 kB)
august18_august19.csv (60.15 kB)
august19_august20.csv (48.29 kB)
august20_august21.csv (52.57 kB)
august21_august22.csv (50.00 kB)
august22_august23.csv (44.56 kB)
august23_august24.csv (39.62 kB)
august24_august25.csv (43.23 kB)
august25_august26.csv (47.62 kB)
august26_august27.csv (47.66 kB)
august27_august28.csv (45.17 kB)
august28_august29.csv (42.85 kB)
august29_august30.csv (44.47 kB)
august30_august31.csv (37.18 kB)
august31_september1.csv (40.82 kB)
september1_september2.csv (45.97 kB)
september2_september3.csv (41.58 kB)
september3_september4.csv (41.24 kB)
september4_september5.csv (49.29 kB)
september5_september6.csv (42.38 kB)
september6_september7.csv (40.01 kB)
september7_september8.csv (41.11 kB)
september8_september9.csv (43.36 kB)
september9_september10.csv (35.43 kB)
september10_september11.csv (19.14 kB)
september11_september12.csv (19.89 kB)
september12_september13.csv (20.44 kB)
september13_september14.csv (19.65 kB)
september14_september15.csv (19.31 kB)
september15_september16.csv (20.79 kB)
september16_september17.csv (18.95 kB)
september17_september18.csv (17.44 kB)
september18_september19.csv (21.69 kB)
september19_september20.csv (20.91 kB)
september20_september21.csv (17.76 kB)
september21_september22.csv (19.72 kB)
september22_september23.csv (18.96 kB)
september23_september24.csv (17.97 kB)
september24_september25.csv (19.33 kB)
september25_september26.csv (19.36 kB)
september26_september27.csv (19.81 kB)
september27_september28.csv (17.87 kB)
september28_september29.csv (19.27 kB)
september29_september30.csv (18.52 kB)
september30_october1.csv (17.52 kB)
october1_october2.csv (19.34 kB)
october2_october3.csv (9.42 kB)
october3_october4.csv (11.54 kB)
october4_october5.csv (11.48 kB)
october5_october6.csv (10.70 kB)
october6_october7.csv (11.95 kB)
october7_october8.csv (13.67 kB)
october8_october9.csv (16.11 kB)
october9_october10.csv (17.09 kB)
october10_october11.csv (17.41 kB)
october11_october12.csv (15.14 kB)
october12_october13.csv (16.63 kB)
october13_october14.csv (14.67 kB)
october14_october15.csv (18.58 kB)
october15_october16.csv (15.55 kB)
october16_october17.csv (17.74 kB)
october17_october18.csv (18.90 kB)
october18_october19.csv (15.12 kB)
october19_october20.csv (17.85 kB)
october20_october21.csv (17.80 kB)
october21_october22.csv (17.62 kB)
october22_october23.csv (17.59 kB)
october23_october24.csv (20.85 kB)
october24_october25.csv (19.28 kB)
october25_october26.csv (15.16 kB)
october26_october27.csv (17.13 kB)
october27_october28.csv (9.47 kB)
october28_october29.csv (15.67 kB)
october29_october30.csv (20.54 kB)
october30_october31.csv (20.80 kB)
october31_november1.csv (31.61 kB)
november1_november2.csv (23.10 kB)
november2_november3.csv (20.69 kB)
november3_november4.csv (22.39 kB)
november4_november5.csv (24.90 kB)
november5_november6.csv (27.52 kB)
november6_november7.csv (21.58 kB)
november7_november8.csv (20.38 kB)
november8_november9.csv (18.23 kB)
november9_november10.csv (13.81 kB)
november10_november11.csv (18.54 kB)
november11_november12.csv (19.01 kB)
november12_november13.csv (17.82 kB)
november13_november14.csv (18.22 kB)
november14_november15.csv (20.35 kB)
november15_november16.csv (18.87 kB)
november16_november17.csv (15.40 kB)
november17_november18.csv (17.27 kB)
november18_november19.csv (19.06 kB)
november19_november20.csv (19.84 kB)
november20_november21.csv (19.62 kB)
november21_november22.csv (20.69 kB)
november22_november23.csv (18.96 kB)
november23_november24.csv (17.40 kB)
november24_november25.csv (19.01 kB)
november25_november26.csv (20.01 kB)
november26_november27.csv (24.68 kB)
november27_november28.csv (17.91 kB)
november28_november29.csv (18.49 kB)
november29_november30.csv (16.77 kB)
november30_december1.csv (16.46 kB)
december1_december2.csv (18.38 kB)
december2_december3.csv (18.09 kB)
december3_december4.csv (18.34 kB)
december4_december5.csv (16.99 kB)
december5_december6.csv (18.73 kB)
december6_december7.csv (12.57 kB)
december7_december8.csv (19.09 kB)
december8_december9.csv (15.55 kB)
december9_december10.csv (15.94 kB)
december10_december11.csv (15.20 kB)
december11_december12.csv (18.73 kB)
december12_december13.csv (20.41 kB)
december13_december14.csv (15.60 kB)
december14_december15.csv (15.63 kB)
december15_december16.csv (18.65 kB)
december16_december17.csv (16.27 kB)
december17_december18.csv (16.92 kB)
december18_december19.csv (17.06 kB)
december19_december20.csv (13.73 kB)
december20_december21.csv (17.22 kB)
december21_december22.csv (15.07 kB)
december22_december23.csv (15.86 kB)
december23_december24.csv (17.99 kB)
december24_december25.csv (22.07 kB)
december25_december26.csv (19.77 kB)
december26_december27.csv (15.47 kB)
december27_december28.csv (12.44 kB)
december28_december29.csv (15.51 kB)
december29_december30.csv (16.48 kB)
december30_december31.csv (16.25 kB)
december31_january1.csv (27.33 kB)
january1_january2.csv (18.05 kB)
january2_january3.csv (13.33 kB)
january3_january4.csv (12.29 kB)
january4_january5.csv (13.69 kB)
january5_january6.csv (19.53 kB)
january6_january7.csv (16.82 kB)
january7_january8.csv (19.08 kB)
january8_january9.csv (18.29 kB)
january9_january10.csv (19.60 kB)
january10_january11.csv (15.68 kB)
january11_january12.csv (16.26 kB)
january12_january13.csv (18.55 kB)
january13_january14.csv (16.30 kB)
january14_january15.csv (20.05 kB)
january15_january16.csv (17.76 kB)
january16_january17.csv (15.38 kB)
january17_january18.csv (16.42 kB)
january18_january19.csv (15.57 kB)
january19_january20.csv (17.83 kB)
january20_january21.csv (14.52 kB)
january21_january22.csv (15.39 kB)
january22_january23.csv (16.63 kB)
january23_january24.csv (16.59 kB)
january24_january25.csv (15.69 kB)
january25_january26.csv (14.83 kB)
january26_january27.csv (16.94 kB)
january27_january28.csv (16.62 kB)
Comments
Great Work!
Thanks, sadiksha!
Thank you very much for providing this dataset and your support
My pleasure, Hanaa!
I created an ieee account just to download this dataset. There are numerous tweets datasets currently floating around but did not have particularly the list of tweets ids that had pin location. Thanks for your efforts.
Thanks, Curran! I am glad that you found the dataset useful.
How do I get the tweet text and location from this CSV data? Also, the tweet IDs are not complete; the last few digits are only 0s.
(i) Please go through the paper associated with this dataset.
(ii) That's because you're probably viewing the CSV file in MS Excel. Excel stores numbers with at most 15 significant digits, so the trailing digits of the 19-digit tweet IDs get replaced with zeros. Please use Google Sheets to prepare the CSV files for hydration.
Hi, I hydrated the IDs file using twarc (https://github.com/echen102/COVID-19-TweetIDs/pull/2/commits/7d16ff3f29acf15af88c0d27424041b711865be3).
But when I tried to add the condition you used to get the geolocation data, it gives me an invalid syntax error.
It would be nice if you could share the twarc code you used so that I can adjust the variable names properly.
You have done great work!
Hey Kim. I think you meant using twarc (https://github.com/DocNow/twarc). That was just pseudo-code I had mentioned in the abstract (I've now replaced it with an excerpt of the real code to avoid confusion).
It does not matter how you archive your JSON. Just make sure to add the following "if" clause in whatever way you're pulling the tweets. The "if" clause below will only be TRUE if the tweet contains an exact pin location.
import json

data = json.loads(data)  # "data" holds a raw tweet JSON string
if data["coordinates"]:
    longitude, latitude = data["coordinates"]["coordinates"]
Now you can store the longitude and latitude values as per your convenience. I hope this helps!
Hey, I want to download the full data, not only the IDs. How can I do so? Please respond.
Hello Charu. Twitter's data sharing policy does not allow anyone to share tweet information other than tweet IDs and/or user IDs. The list of IDs has to be hydrated to re-create a full, fresh tweet dataset. For this purpose, you can use applications such as DocNow's Hydrator or QCRI's Tweets Downloader.
Thanks for the data. I am not sure if this is just at my end, but the CSV files have an issue with the tweet ID fields due to Excel's 15-digit limit. The values are different from the ones in the JSON. Maybe export them to .txt files rather than .csv.
Hello Abhay. Yes, I have heard from a couple of people about the tweet IDs getting altered on their machines. That is why I am also uploading the JSON for those experiencing this issue.
Can you confirm whether the IDs are altered even when opened in a text editor (Notepad or Sublime)? I think you're opening the CSV files with MS Excel. I've seen multiple posts on Stack Exchange about Excel truncating digits after the 15th.
Hello Rabindra,
No, it doesn't happen if you open the dataset with some other editor. Reading the data in different systems (R/Python) can lead to different results, as they may not convert the IDs properly. Also, if someone uses the Hydrator app and converts the CSV to TXT with just the IDs, it will contain errors. Anyway, it's fairly straightforward to convert the JSON to a TXT file containing the IDs, but some users might benefit from plain .txt files.
Cheers
Thanks for getting back.
If you use DocNow's Hydrator app, you can straightaway import the downloaded CSV file for hydration (after removing the sentiment column). However, QCRI's Tweets Downloader requires a TXT file (with a single tweet ID per line), so you'll have to manipulate the CSV file, to some extent, for the task to be done.
A handful of people have reached out to me with an issue similar to this. Most of them were opening the CSV files with MS Excel to remove the sentiment column. The problem did not occur when the downloaded CSV was imported as a pandas data frame, the sentiment column was dropped, and the final data frame was exported as a CSV file ready to be hydrated.
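For instance, a minimal pandas sketch of that workflow (the column names and the absence of a header row are assumptions about the file layout; adjust them to the actual CSVs):
import pandas as pd

# Read tweet IDs as strings so the 19-digit values are never rounded
df = pd.read_csv("may1_may2.csv", header=None,
                 names=["tweet_id", "sentiment"], dtype={"tweet_id": str})

# Drop the sentiment column and write a file that is ready for hydration
df.drop(columns=["sentiment"]).to_csv("may1_may2_ids.csv", index=False, header=False)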
Thanks Rabindra. All good. As I said, it's not that hard to deal with it. I mentioned it so that someone else having a similar issue could benefit. Cheers.
Roger-that.
I need 2,000 Twitter messages relevant to COVID-19 for my coursework, and I need to plot the distribution of these tweets on a world map. Can someone help me get the Twitter messages?
[updated on August 7, 2020] Hello Gayathri. You'll have to hydrate the tweet IDs provided in this dataset to get your work done. I'd suggest you use twarc for this purpose. I am guessing you'll only need the tweet text and geo-location for your work.
# import libraries
from twarc import Twarc
import sqlite3

# create a database
connection = sqlite3.connect('database.db')
c = connection.cursor()

# create a table
def table():
    try:
        c.execute("CREATE TABLE IF NOT EXISTS geo_map(tweet TEXT, longitude REAL, latitude REAL)")
        connection.commit()
    except Exception as e:
        print(str(e))

table()

# initialize Twitter API keys
consumer_key = ""
consumer_secret = ""
access_token = ""
access_token_secret = ""
t = Twarc(consumer_key, consumer_secret, access_token, access_token_secret)

# hydrate the tweet IDs
for tweet in t.hydrate(open('ready_july5_july6.csv')):
    text = tweet["full_text"]
    longitude, latitude = tweet["coordinates"]["coordinates"]
    c.execute("INSERT INTO geo_map (tweet, longitude, latitude) VALUES (?, ?, ?)", (text, longitude, latitude))
    connection.commit()
Now you can simply make a connection to the above database-table to read its contents and plot the tweets using libraries such as Plotly. I hope this helps. Good luck!
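One possible continuation, assuming Plotly and pandas are available and reusing the database and table names from the snippet above:
import sqlite3
import pandas as pd
import plotly.express as px

# Read the hydrated, geo-tagged tweets back from the SQLite table
connection = sqlite3.connect('database.db')
df = pd.read_sql_query("SELECT tweet, longitude, latitude FROM geo_map", connection)

# Plot each tweet as a point on a world map
fig = px.scatter_geo(df, lat="latitude", lon="longitude", hover_name="tweet")
fig.show()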
How can I filter out the geo-tagged tweets from the rest? I have a tweet ID dataset that has tweets from before March 20, and I only want to separate the geo-tagged tweets from the other tweets. And amazing work you have done here with the two datasets having daily files. Thanks.
Hello Mohit. Filtering geo-tagged tweets from the rest is quite straightforward if you use twarc for hydrating the tweet IDs. You'll have to add a condition on the "coordinates" Twitter object.
for tweet in t.hydrate(open('/path/to/tweet/file.csv')):
    if tweet["coordinates"]:
        # now you can extract whichever information you want
        longitude, latitude = tweet["coordinates"]["coordinates"]  # geo-coordinates
You can go through the code snippet in the comment thread just above this one to get a head start on storing the extracted information in a database.
Thank you for instant reply. May I ask which database you use in your project running at live.rlamsal.com.np?
The project uses SQLite.
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
in
1 for tweet in t.hydrate(open("id_nov8_nov9.csv")):
----> 2 "36.7783, 119.4179" == tweet["coordinates"]["coordinated"]
KeyError: 'coordinated'
Sir, I am getting this error with this code. Could you please help me?
Hello Gongati. I see that you're trying to get the coordinates. The correct form would be: tweet["coordinates"]["coordinates"] for extracting coordinates info from the JSON.
Hey, sorry if I'm being dense, but I can't find the JSON files?
Hello Lucas. The JSON files were initially part of this dataset but were later removed as they seemed redundant; they contained the same content as the CSV files.
I have downloaded the data. What is the total number of rows in all the datasets taken together?
There are more than 140k tweet IDs in the dataset altogether.
It appears to be just a few thousand rows in all the datasets taken together.
Yes, there are 140k geo-tagged tweets in this dataset. These are the tweets that have "point" location information. If you are okay with having a boundary location instead, you'll have to hydrate the tweets in the primary dataset (https://ieee-dataport.org/open-access/coronavirus-covid-19-tweets-dataset) and consider conditioning on the ["place"] Twitter object. The Coronavirus (COVID-19) Tweets Dataset has more than 310 million tweets, and I guess you'll be able to come up with a few million tweets with the boundary condition enabled.
The geo tagging is from India alone?
No. This is a global dataset.
Thanks. I was looking for day by day geo data.
Glad to be of help.
Thank you a lot for the dataset!
I'm trying to hydrate the tweets for July 26, but it seems too slow since there are over 3 million tweets. Is there a faster way to hydrate them?
Hello Danqing. Twitter has rate limits for its APIs. Both the Hydrator app and twarc handle the rate limits and pull the JSON accordingly. If you're looking for a way to expedite the hydration process, I'd recommend involving someone else who has Twitter developer access and asking them to hydrate a portion of the IDs.
How do I filter the tweets for a particular country, e.g., India?
Hello Trupti. Just to give you a head start: if I were you, I would work with the location-specific Twitter objects at three different levels. First, I would check whether the tweet is geo-tagged (i.e., whether it contains an exact location). Second, if the tweet is not geo-tagged, chances are that it has a region or country bounding box defined. Third, if neither criterion is satisfied, I would simply try to extract location information from the user's profile.
Here's an example of using twarc as a python library for this purpose.
from twarc import Twarc
consumer_key=""
consumer_secret=""
access_token=""
access_token_secret=""
t = Twarc(consumer_key, consumer_secret, access_token, access_token_secret)
for tweet in t.hydrate(open('tweet_ids.txt')):
    if tweet["coordinates"]:
        loc = tweet["place"]["country"]  # place based on the "point" location
        # check whether "loc" is a country of your interest;
        # however, tweet["place"] can be None here -- in that case take the long, lat
        # from tweet["coordinates"]["coordinates"] and convert them to a human-readable location
    elif tweet["place"]:
        loc = tweet["place"]["country"]  # bounding-box region
        # check whether "loc" is a country of your interest
    else:
        loc_profile = tweet["user"]["location"]  # location from the user's profile
        # check whether "loc_profile" refers to a country of your interest
However, this dataset contains only the geo-tagged tweet IDs. I'd suggest you use the Coronavirus (COVID-19) Tweets Dataset, which contains more than 386 million tweet IDs. Applying these geo-specific conditions on that dataset would help you extract more tweets for your work. I hope this helps.
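As an aside, one possible way to turn a coordinate pair into a human-readable location is reverse geocoding, for example with geopy's Nominatim geocoder (geopy is not mentioned in this thread; the app name and coordinates below are purely illustrative):
from geopy.geocoders import Nominatim

geolocator = Nominatim(user_agent="geocov19-example")  # hypothetical app name
latitude, longitude = 51.5074, -0.1278                 # example point (London)
location = geolocator.reverse((latitude, longitude), language="en")
print(location.address)  # full address string; the country is its last component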
I have tried different code snippets and got different errors. Could you please help me?
KeyError Traceback (most recent call last)
in
1 for tweet in t.hydrate(open("ID_nov8_nov9.txt")):
2 if tweet["place"]:
----> 3 loc=tweet["place"]["USA"]
KeyError: 'USA'
for tweet in t.hydrate(open("ID_nov8_nov9.txt")):
    if tweet["place"]:
        "USA" == tweet["place"]["country"]
This code gives no result and no error; it just executes.
for tweet in t.hydrate(open("ID_nov8_nov9.txt")):
    if tweet["place"]:
        India = tweet["place"]["country"]
This code gives no result and no error; it just executes.
It looks like you want to extract tweets posted from the United States. For that use case, you can add this simple condition to your code:
t = Twarc(consumer_key, consumer_secret, access_token, access_token_secret)
for tweet in t.hydrate(open('ids.csv')):
    if tweet["place"]:
        if tweet["place"]["country"] == "United States":
            pass  # store the tweet data here
Forgive me, the comment box is not allowing me to add proper indentation in the code. I hope this helps.
Great work!
Which API do you use - the Twitter Search API or the Twitter Streaming API? Does the data include retweets?
Thanks, Antony. It's the Streaming API. Retweets have NULL geo and place objects, so retweets won't make their way into this dataset. However, quote tweets are included, as they can have their own geo and place objects.
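To illustrate what such a check could look like on a hydrated tweet object (the field names are standard Twitter API v1.1 attributes; this helper is not part of the collection code):
def keep_tweet(tweet):
    """Keep original and quote tweets that carry an exact 'point' location."""
    if "retweeted_status" in tweet:              # retweets carry no geo/place of their own
        return False
    return tweet.get("coordinates") is not None  # exact location present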
Hi, what algorithm are you using to calculate the sentiment scores, e.g., VADER? Thank you!
Hello Molu. The TextBlob library has been used to compute the sentiment scores.
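For reference, a minimal sketch of how TextBlob produces a polarity score (the tweet text below is made up, and the exact preprocessing used for this dataset is not described here):
from textblob import TextBlob

text = "Vaccines are finally rolling out, feeling hopeful!"  # hypothetical tweet text
score = TextBlob(text).sentiment.polarity                    # polarity in [-1.0, 1.0]
print(score)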
How do I use it? Can you share the tweets and the sentiment labels so that I can use them to train my model?
Please refer to my previous comments.
Hi, thank you so much for the tremendous work. I appreciate it very much. Two quick questions: first, how do you know whether those tweets are posted by bots? Have you applied any filtering techniques?
Second, if I wanted to replicate your data collection from Twitter myself, could you share your code for collecting geo-tagged tweets?
Hello Yang. Glad to know that you found this dataset useful.
(i) To curate this dataset, the real-time Twitter stream is filtered by tracking 90+ COVID-19-specific keywords (see the attached keywords.tsv file). All the tweets received from the stream make their way into the primary dataset, which can be considered a comprehensive collection for all kinds of analyses (sentiment, geo, fact-checking, trend, etc.).
(ii) Please refer to my previous comments: https://ieee-dataport.org/open-access/coronavirus-covid-19-geo-tagged-tw... and https://ieee-dataport.org/open-access/coronavirus-covid-19-geo-tagged-tw...
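A minimal sketch of that kind of keyword-based stream filtering with twarc (the keyword list below is a tiny illustrative subset, not the actual 90+ terms in keywords.tsv):
from twarc import Twarc

# Twitter API credentials (fill in your own keys)
consumer_key = ""
consumer_secret = ""
access_token = ""
access_token_secret = ""
t = Twarc(consumer_key, consumer_secret, access_token, access_token_secret)

# Track a few pandemic-related terms on the real-time filtered stream
for tweet in t.filter(track="covid19,coronavirus,covid"):
    if tweet.get("coordinates"):  # keep only tweets with an exact location
        print(tweet["id_str"], tweet["coordinates"]["coordinates"])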