Coronavirus (COVID-19) Tweets Dataset

Citation Author(s): Rabindra Lamsal, School of Computer and Systems Sciences, JNU
Submitted by: Rabindra Lamsal
Last updated: Tue, 07/14/2020 - 07:21
DOI: 10.21227/781w-ef42
Dataset Views: 60610
Rating: 4.9 (10 ratings)

This dataset includes CSV files that contain tweet IDs. The tweets are being collected by an ongoing project deployed at https://live.rlamsal.com.np. The model monitors the real-time Twitter feed for coronavirus-related tweets, filtering on language ("english") and the keywords "corona", "coronavirus", "covid", "pandemic", "lockdown", "quarantine", "hand sanitizer", "ppe", "n95", different possible variants of "sarscov2", "nCov", "covid-19", "ncov2019", "2019ncov", "flatten(ing) the curve", "social distancing", "work(ing) from home", and the respective hashtags of all these keywords. This dataset was completely re-designed on March 20, 2020, to comply with the content redistribution policy set by Twitter.

-------------------------------------------------------------------------

Tweets count: 298,822,322 — Global (English)

-------------------------------------------------------------------------

Coronavirus (COVID-19) Geo-tagged Tweets Dataset

(Tweets with location information; ongoing collection)

-------------------------------------------------------------------------

Why are only tweet IDs being shared? Twitter's content redistribution policy restricts the sharing of tweet information other than tweet IDs and/or user IDs. Twitter wants researchers to always pull fresh data, because a user might delete a tweet or make their profile protected. If such a tweet had already been pulled and shared publicly, the user/community could be exposed to inferences drawn from data that no longer exists or is now private.

Do you have tweets collected before March 20, 2020? Unfortunately, I had to unpublish more than 20 million tweets collected between Jan 27, 2020, and March 20, 2020, because that collection did not include tweet IDs. "Why?" you might ask. Initially, the primary objective of the deployed model was not just to collect tweets; it was more of an optimization project, studying how much information, received in a near-real-time scenario, can be processed with minimal computing resources. However, when the COVID-19 outbreak became a global emergency, I decided to release the collected tweets rather than keep them to myself. But the collection did not have tweet IDs, which is why a fresh collection was started on March 20, 2020.

Note: This dataset should be used solely for non-commercial research purposes (ignore every other LICENSE category given on this page). A new list of tweet IDs is added to this dataset every day. Bookmark this page for further updates.

(Tweets are collected in UTC; the local times listed below are GMT+5:45):

corona_tweets_01.csv + corona_tweets_02.csv + corona_tweets_03.csv: 2,475,980 tweets (March 20, 2020 01:37 AM - March 21, 2020 09:25 AM)

corona_tweets_04.csv: 1,233,340 tweets (March 21, 2020 09:27 AM - March 22, 2020 07:46 AM)

corona_tweets_05.csv: 1,782,157 tweets (March 22, 2020 07:50 AM - March 23, 2020 09:08 AM)

corona_tweets_06.csv: 1,771,295 tweets (March 23, 2020 09:11 AM - March 24, 2020 11:35 AM)

corona_tweets_07.csv: 1,479,651 tweets (March 24, 2020 11:42 AM - March 25, 2020 11:43 AM)

corona_tweets_08.csv: 1,272,592 tweets (March 25, 2020 11:47 AM - March 26, 2020 12:46 PM)

corona_tweets_09.csv: 1,091,429 tweets (March 26, 2020 12:51 PM - March 27, 2020 11:53 AM)

corona_tweets_10.csv: 1,172,013 tweets (March 27, 2020 11:56 AM - March 28, 2020 01:59 PM)

corona_tweets_11.csv: 1,141,210 tweets (March 28, 2020 02:03 PM - March 29, 2020 04:01 PM)

> March 29, 2020 04:05 PM - March 30, 2020 02:00 PM -- A technical fault occurred; preventive measures have been taken. Tweets for this session won't be available.

corona_tweets_12.csv: 793,417 tweets (March 30, 2020 02:01 PM - March 31, 2020 10:16 AM)

corona_tweets_13.csv: 1,029,294 tweets (March 31, 2020 10:20 AM - April 01, 2020 10:59 AM)

corona_tweets_14.csv: 920,076 tweets (April 01, 2020 11:02 AM - April 02, 2020 12:19 PM)

corona_tweets_15.csv: 826,271 tweets (April 02, 2020 12:21 PM - April 03, 2020 02:38 PM)

corona_tweets_16.csv: 612,512 tweets (April 03, 2020 02:40 PM - April 04, 2020 11:54 AM)

corona_tweets_17.csv: 685,560 tweets (April 04, 2020 11:56 AM - April 05, 2020 12:54 PM)

corona_tweets_18.csv: 717,301 tweets (April 05, 2020 12:56 PM - April 06, 2020 10:57 AM)

corona_tweets_19.csv: 722,921 tweets (April 06, 2020 10:58 AM - April 07, 2020 12:28 PM)

corona_tweets_20.csv: 554,012 tweets (April 07, 2020 12:29 PM - April 08, 2020 12:34 PM)

corona_tweets_21.csv: 589,679 tweets (April 08, 2020 12:37 PM - April 09, 2020 12:18 PM)

corona_tweets_22.csv: 517,718 tweets (April 09, 2020 12:20 PM - April 10, 2020 09:20 AM)

corona_tweets_23.csv: 601,199 tweets (April 10, 2020 09:22 AM - April 11, 2020 10:22 AM)

corona_tweets_24.csv: 497,655 tweets (April 11, 2020 10:24 AM - April 12, 2020 10:53 AM)

corona_tweets_25.csv: 477,182 tweets (April 12, 2020 10:57 AM - April 13, 2020 11:43 AM)

corona_tweets_26.csv: 288,277 tweets (April 13, 2020 11:46 AM - April 14, 2020 12:49 AM)

corona_tweets_27.csv: 515,739 tweets (April 14, 2020 11:09 AM - April 15, 2020 12:38 PM)

corona_tweets_28.csv: 427,088 tweets (April 15, 2020 12:40 PM - April 16, 2020 10:03 AM)

corona_tweets_29.csv: 433,368 tweets (April 16, 2020 10:04 AM - April 17, 2020 10:38 AM)

corona_tweets_30.csv: 392,847 tweets (April 17, 2020 10:40 AM - April 18, 2020 10:17 AM)

> With the addition of some more coronavirus-specific keywords, the number of tweets captured per day has increased significantly; therefore, the CSV files hereafter will be zipped. Let's save some bandwidth.

corona_tweets_31.csv: 2,671,818 tweets (April 18, 2020 10:19 AM - April 19, 2020 09:34 AM)

corona_tweets_32.csv: 2,393,006 tweets (April 19, 2020 09:43 AM - April 20, 2020 10:45 AM)

corona_tweets_33.csv: 2,227,579 tweets (April 20, 2020 10:56 AM - April 21, 2020 10:47 AM)

corona_tweets_34.csv: 2,211,689 tweets (April 21, 2020 10:54 AM - April 22, 2020 10:33 AM)

corona_tweets_35.csv: 2,265,189 tweets (April 22, 2020 10:45 AM - April 23, 2020 10:49 AM)

corona_tweets_36.csv: 2,201,138 tweets (April 23, 2020 11:08 AM - April 24, 2020 10:39 AM)

corona_tweets_37.csv: 2,338,713 tweets (April 24, 2020 10:51 AM - April 25, 2020 11:50 AM)

corona_tweets_38.csv: 1,981,835 tweets (April 25, 2020 12:20 PM - April 26, 2020 09:13 AM)

corona_tweets_39.csv: 2,348,827 tweets (April 26, 2020 09:16 AM - April 27, 2020 10:21 AM)

corona_tweets_40.csv: 2,212,216 tweets (April 27, 2020 10:33 AM - April 28, 2020 10:09 AM)

corona_tweets_41.csv: 2,118,853 tweets (April 28, 2020 10:20 AM - April 29, 2020 08:48 AM)

corona_tweets_42.csv: 2,390,703 tweets (April 29, 2020 09:09 AM - April 30, 2020 10:33 AM)

corona_tweets_43.csv: 2,184,439 tweets (April 30, 2020 10:53 AM - May 01, 2020 10:18 AM)

corona_tweets_44.csv: 2,223,013 tweets (May 01, 2020 10:23 AM - May 02, 2020 09:54 AM)

corona_tweets_45.csv: 2,216,553 tweets (May 02, 2020 10:18 AM - May 03, 2020 09:57 AM)

corona_tweets_46.csv: 2,266,373 tweets (May 03, 2020 10:09 AM - May 04, 2020 10:17 AM)

corona_tweets_47.csv: 2,227,489 tweets (May 04, 2020 10:32 AM - May 05, 2020 10:17 AM)

corona_tweets_48.csv: 2,218,774 tweets (May 05, 2020 10:38 AM - May 06, 2020 10:26 AM)

corona_tweets_49.csv: 2,164,251 tweets (May 06, 2020 10:35 AM - May 07, 2020 09:33 AM)

corona_tweets_50.csv: 2,203,686 tweets (May 07, 2020 09:55 AM - May 08, 2020 09:35 AM)

corona_tweets_51.csv: 2,250,019 tweets (May 08, 2020 09:39 AM - May 09, 2020 09:49 AM)

corona_tweets_52.csv: 2,273,705 tweets (May 09, 2020 09:55 AM - May 10, 2020 10:11 AM)

corona_tweets_53.csv: 2,208,264 tweets (May 10, 2020 10:23 AM - May 11, 2020 09:57 AM)

corona_tweets_54.csv: 2,216,845 tweets (May 11, 2020 10:08 AM - May 12, 2020 09:52 AM)

corona_tweets_55.csv: 2,264,472 tweets (May 12, 2020 09:59 AM - May 13, 2020 10:14 AM)

corona_tweets_56.csv: 2,339,709 tweets (May 13, 2020 10:24 AM - May 14, 2020 11:21 AM)

corona_tweets_57.csv: 2,096,878 tweets (May 14, 2020 11:38 AM - May 15, 2020 09:58 AM)

corona_tweets_58.csv: 2,214,205 tweets (May 15, 2020 10:13 AM - May 16, 2020 09:43 AM)

> The server and the databases have been optimized; therefore, there is a significant rise in the number of tweets captured per day.

corona_tweets_59.csv: 3,389,090 tweets (May 16, 2020 09:58 AM - May 17, 2020 10:34 AM)

corona_tweets_60.csv: 3,530,933 tweets (May 17, 2020 10:36 AM - May 18, 2020 10:07 AM)

corona_tweets_61.csv: 3,899,631 tweets (May 18, 2020 10:08 AM - May 19, 2020 10:07 AM)

corona_tweets_62.csv: 3,767,009 tweets (May 19, 2020 10:08 AM - May 20, 2020 10:06 AM)

corona_tweets_63.csv: 3,790,455 tweets (May 20, 2020 10:06 AM - May 21, 2020 10:15 AM)

corona_tweets_64.csv: 3,582,020 tweets (May 21, 2020 10:16 AM - May 22, 2020 10:13 AM)

corona_tweets_65.csv: 3,461,470 tweets (May 22, 2020 10:14 AM - May 23, 2020 10:08 AM)

corona_tweets_66.csv: 3,477,564 tweets (May 23, 2020 10:08 AM - May 24, 2020 10:02 AM)

corona_tweets_67.csv: 3,656,446 tweets (May 24, 2020 10:02 AM - May 25, 2020 10:10 AM)

corona_tweets_68.csv: 3,474,952 tweets (May 25, 2020 10:11 AM - May 26, 2020 10:22 AM)

corona_tweets_69.csv: 3,422,960 tweets (May 26, 2020 10:22 AM - May 27, 2020 10:16 AM)

corona_tweets_70.csv: 3,480,999 tweets (May 27, 2020 10:17 AM - May 28, 2020 10:35 AM)

corona_tweets_71.csv: 3,446,008 tweets (May 28, 2020 10:36 AM - May 29, 2020 10:07 AM)

corona_tweets_72.csv: 3,492,841 tweets (May 29, 2020 10:07 AM - May 30, 2020 10:14 AM)

corona_tweets_73.csv: 3,098,817 tweets (May 30, 2020 10:15 AM - May 31, 2020 10:13 AM)

corona_tweets_74.csv: 3,234,848 tweets (May 31, 2020 10:13 AM - June 01, 2020 10:14 AM)

corona_tweets_75.csv: 3,206,132 tweets (June 01, 2020 10:15 AM - June 02, 2020 10:07 AM)

corona_tweets_76.csv: 3,206,417 tweets (June 02, 2020 10:08 AM - June 03, 2020 10:26 AM)

corona_tweets_77.csv: 3,256,225 tweets (June 03, 2020 10:27 AM - June 04, 2020 10:23 AM)

corona_tweets_78.csv: 2,205,123 tweets (June 04, 2020 10:26 AM - June 05, 2020 10:03 AM) (tweet IDs were extracted from the backup server for this session)

corona_tweets_79.csv: 3,381,184 tweets (June 05, 2020 10:11 AM - June 06, 2020 10:16 AM)

corona_tweets_80.csv: 3,194,500 tweets (June 06, 2020 10:17 AM - June 07, 2020 10:24 AM)

corona_tweets_81.csv: 2,768,780 tweets (June 07, 2020 10:25 AM - June 08, 2020 10:13 AM)

corona_tweets_82.csv: 3,032,227 tweets (June 08, 2020 10:13 AM - June 09, 2020 10:12 AM)

corona_tweets_83.csv: 2,984,970 tweets (June 09, 2020 10:12 AM - June 10, 2020 10:13 AM)

corona_tweets_84.csv: 3,068,002 tweets (June 10, 2020 10:14 AM - June 11, 2020 10:11 AM)

corona_tweets_85.csv: 3,261,215 tweets (June 11, 2020 10:12 AM - June 12, 2020 10:10 AM)

corona_tweets_86.csv: 3,378,901 tweets (June 12, 2020 10:11 AM - June 13, 2020 10:10 AM)

corona_tweets_87.csv: 3,011,103 tweets (June 13, 2020 10:11 AM - June 14, 2020 10:08 AM)

corona_tweets_88.csv: 3,154,328 tweets (June 14, 2020 10:09 AM - June 15, 2020 10:10 AM)

corona_tweets_89.csv: 3,837,552 tweets (June 15, 2020 10:10 AM - June 16, 2020 10:10 AM)

corona_tweets_90.csv: 3,889,262 tweets (June 16, 2020 10:11 AM - June 17, 2020 10:10 AM)

corona_tweets_91.csv: 3,688,348 tweets (June 17, 2020 10:10 AM - June 18, 2020 10:09 AM)

corona_tweets_92.csv: 3,673,328 tweets (June 18, 2020 10:10 AM - June 19, 2020 10:10 AM)

corona_tweets_93.csv: 3,634,172 tweets (June 19, 2020 10:10 AM - June 20, 2020 10:10 AM)

corona_tweets_94.csv: 3,610,992 tweets (June 20, 2020 10:10 AM - June 21, 2020 10:10 AM)

corona_tweets_95.csv: 3,352,643 tweets (June 21, 2020 10:10 AM - June 22, 2020 10:10 AM)

corona_tweets_96.csv: 3,730,105 tweets (June 22, 2020 10:10 AM - June 23, 2020 10:09 AM)

corona_tweets_97.csv: 3,936,238 tweets (June 23, 2020 10:10 AM - June 24, 2020 10:09 AM)

corona_tweets_98.csv: 3,858,387 tweets (June 24, 2020 10:10 AM - June 25, 2020 10:09 AM)

corona_tweets_99.csv: 3,883,506 tweets (June 25, 2020 10:10 AM - June 26, 2020 10:09 AM)

corona_tweets_100.csv: 3,941,476 tweets (June 26, 2020 10:09 AM - June 27, 2020 10:10 AM)

corona_tweets_101.csv: 3,816,987 tweets (June 27, 2020 10:11 AM - June 28, 2020 10:10 AM)

corona_tweets_102.csv: 3,743,358 tweets (June 28, 2020 10:10 AM - June 29, 2020 10:10 AM)

corona_tweets_103.csv: 3,880,998 tweets (June 29, 2020 10:10 AM - June 30, 2020 10:10 AM)

corona_tweets_104.csv: 3,926,862 tweets (June 30, 2020 10:10 AM - July 01, 2020 10:10 AM)

corona_tweets_105.csv: 4,365,171 tweets (July 01, 2020 10:11 AM - July 02, 2020 12:28 PM)

corona_tweets_106.csv: 3,563,659 tweets (July 02, 2020 12:29 PM - July 03, 2020 10:10 AM)

corona_tweets_107.csv: 3,446,100 tweets (July 03, 2020 10:10 AM - July 04, 2020 07:00 AM)

corona_tweets_108.csv: 4,076,176 tweets (July 04, 2020 07:01 AM - July 05, 2020 09:16 AM)

corona_tweets_109.csv: 3,827,904 tweets (July 05, 2020 09:17 AM - July 06, 2020 10:10 AM)

corona_tweets_110.csv: 3,991,881 tweets (July 06, 2020 10:10 AM - July 07, 2020 10:10 AM)

corona_tweets_111.csv: 4,104,245 tweets (July 07, 2020 10:11 AM - July 08, 2020 10:10 AM)

corona_tweets_112.csv: 4,032,945 tweets (July 08, 2020 10:10 AM - July 09, 2020 10:10 AM)

corona_tweets_113.csv: 3,912,560 tweets (July 09, 2020 10:10 AM - July 10, 2020 10:12 AM)

corona_tweets_114.csv: 4,024,227 tweets (July 10, 2020 10:12 AM - July 11, 2020 10:20 AM)

corona_tweets_115.csv: 3,746,316 tweets (July 11, 2020 10:20 AM - July 12, 2020 10:09 AM)

corona_tweets_116.csv: 3,902,393 tweets (July 12, 2020 10:10 AM - July 13, 2020 10:09 AM)

corona_tweets_117.csv: 4,045,441 tweets (July 13, 2020 10:10 AM - July 14, 2020 10:09 AM)

Instructions: 

Each CSV file contains a list of tweet IDs. You can use these tweet IDs to download fresh data from Twitter (hydrating the tweet IDs). To give NLP researchers easy access to a sentiment analysis of each collected tweet, the sentiment score computed by TextBlob has been appended as the second column. To hydrate the tweet IDs, you can use applications such as Hydrator (available for OS X, Windows and Linux; takes in CSV), twarc (Python library; takes in TXT), or QCRI's Tweets Downloader (Java-based; takes in TXT).
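As a rough sketch of how twarc-based hydration might be wired up (the helper name hydrate_ids is illustrative and not part of any tool; twarc 1.x exposes a Twarc.hydrate method that takes an iterable of IDs):

```python
from typing import Iterator

def hydrate_ids(client, id_file: str) -> Iterator[dict]:
    # `client` is anything with a .hydrate(iterable_of_ids) method,
    # e.g. twarc.Twarc(consumer_key, consumer_secret, token, token_secret)
    with open(id_file) as f:
        ids = (line.strip() for line in f if line.strip())
        yield from client.hydrate(ids)
```

Each yielded item is a full tweet object (a dict) that can then be written to JSON or a database.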

Getting the CSV files of this dataset ready for hydrating the tweet IDs:

import pandas as pd

# the original file has no header row: column 0 is the tweet ID, column 1 the sentiment score
dataframe = pd.read_csv("corona_tweets_10.csv", header=None)

# keep only the tweet ID column and export it without an index or header row
dataframe[0].to_csv("ready_corona_tweets_10.csv", index=False, header=False)

The above example code takes in an original CSV file from this dataset (here, corona_tweets_10.csv) and exports just the tweet ID column to a new CSV file (ready_corona_tweets_10.csv). The newly created CSV file can then be consumed by the Hydrator application for hydrating the tweet IDs. However, twarc and QCRI's Tweets Downloader consume a TXT file; to export the tweet ID column into a TXT file instead, just replace ".csv" with ".txt" in the to_csv call on the last line.
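The same TXT export can also be done with the standard library alone. A minimal sketch, assuming the two-column id,sentiment layout described above (the sample file name and IDs below are made up for illustration):

```python
import csv

# build a small sample file in the dataset's two-column layout (tweet ID, sentiment)
with open("corona_sample.csv", "w", newline="") as f:
    csv.writer(f).writerows([("1243420482629591040", "0.0"),
                             ("1243420478238150656", "0.25")])

# write only the tweet ID column, one ID per line,
# ready for twarc or QCRI's Tweets Downloader
with open("corona_sample.csv", newline="") as src, open("corona_sample.txt", "w") as dst:
    for row in csv.reader(src):
        dst.write(row[0] + "\n")
```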

If you are not comfortable with Python and pandas, you can upload the CSV files to your Google Drive and use Google Sheets to delete the second column. Once done, download the edited CSV files: File > Download > Comma-separated values (.csv, current sheet). These downloaded files are then ready to be used with the Hydrator app for hydrating the tweet IDs.

Comments

I am getting this error:

DatabaseError: database disk image is malformed

Submitted by Junaid khan on Mon, 03/16/2020 - 15:23

Can you tell me the name of the file you're experiencing this error with? I recommend first opening the downloaded file in any SQLite DB viewer to check that it is not corrupted.

Submitted by Rabindra Lamsal on Wed, 04/29/2020 - 00:41

Hi! Could you mention what filters you are using to get the tweets? Thanks.

Submitted by Victor Tavares on Tue, 03/17/2020 - 00:21

keyword: corona, language: en

A significant number of tweets used the word 'corona' without the word 'virus', so I had to track tweets using the most generic term: just 'corona'. Therefore, a couple of tweets relating to 'corona beer' might also be present in the databases.

Submitted by Rabindra Lamsal on Tue, 03/17/2020 - 00:45

Hi ! I cannot access the LSTM Model.

Submitted by islam sadat on Wed, 03/18/2020 - 10:41

Try refreshing. Maybe the server was busy while you were trying to access the site. I just can't believe that more than 338,500 requests have been made to the model within the last 24 hours, and that volume of requests is something my model cannot handle. Sorry for the inconvenience!

Submitted by Rabindra Lamsal on Wed, 03/18/2020 - 11:09

Please fix these two datasets:

1. corona_tweets_2M.db.zip        2. corona_tweets_2M_2.zip

They show this error: DatabaseError: database disk image is malformed

Submitted by imran khan on Thu, 03/19/2020 - 08:19

I downloaded the very same compressed files from this page and loaded both the databases on an SQLite DB viewer. The databases work just fine. See the screenshot here: https://i.ibb.co/SyQ7ff1/Screen-Shot-2020-03-19-at-8-21-46-PM.png

I recommend opening the databases (which are generating the image-malformed error) in any DB viewer and re-saving them on your machine, or exporting them to SQL or to any tabular file format, as you prefer.

Submitted by Rabindra Lamsal on Fri, 05/29/2020 - 01:31

Hi, thanks for providing these datasets to the public. I have one question: do all these files have the same structure? I wish they had the other fields Twitter provides with tweets so we could do our research directly.

I wonder if the other files all have three columns only: unix, text and sentiment.

Submitted by ali ALdulaimi on Thu, 03/19/2020 - 11:54

Hello there! Yes, all the files have the same structure (unix, text, sentiment score). However, starting March 20 the collected tweets will also have one additional column, viz. tweet ID.

This is because, initially, the purpose of the deployed web app was not just to collect the tweets; it was more like an optimization project. However, when the corona outbreak started in China, I decided to release the collected tweets rather than just keeping them with me.

Submitted by Rabindra Lamsal on Thu, 03/19/2020 - 14:05

Hi Rabindra, have the SQLite DBs been replaced with CSVs containing only time and sentiment score? Thanks.

Submitted by Bevan Ward on Sat, 03/21/2020 - 23:50

Hello Bevan. No, the first column in the CSV files is the tweet ID. You'll have to automate the extraction of tweets using the list of tweet IDs. Twitter's policy required it, so I had to remove all other info except the tweet ID and sentiment score.

Submitted by Rabindra Lamsal on Sun, 03/22/2020 - 02:25

Thanks Rabindra for the reply - take care Bevan

Submitted by bevan ward on Sun, 03/29/2020 - 18:24

Hi, can you please upload the tweet IDs and sentiment scores of the old files from February and early March?

 

Thank you

Submitted by Rabia batool on Tue, 03/24/2020 - 05:17

Hello Rabia! Unfortunately, I had to take down all the tweets collected between Feb 1, 2020, and Mar 19, 2020, because the old DB files didn't have tweet IDs collected. Initially, the purpose of the deployed web app was not just to collect tweets; it was more of an optimization project. However, when the corona outbreak started in China, I decided to release the collected tweets rather than keep them to myself. Because of Twitter's data sharing policies, I am not authorized to share the old files. Sorry for the inconvenience.

Submitted by Rabindra Lamsal on Tue, 03/24/2020 - 10:42

Thank you for your response. I completely understand this. 

 

Submitted by Rabia batool on Wed, 03/25/2020 - 03:50

Hi, I'm trying to view particular tweets using the tweet IDs you provided, with the piece of Python code you provided above, after adding my credentials (CONSUMER_KEY, CONSUMER_SECRET, OAUTH_TOKEN, OAUTH_TOKEN_SECRET). However, it always gives me the following error message:

 

tweepy.error.TweepError: [{'code': 144, 'message': 'No status found with that ID.'}]

 

Have you hashed those tweet ids that you uploaded? Any advice is appreciated. 

 

Best regards, 

 

Submitted by Basheer Qolomany on Mon, 03/30/2020 - 18:26

Maybe the particular tweet which you're trying to view has been either removed or hidden by the user.

Submitted by Rabindra Lamsal on Mon, 03/30/2020 - 19:56

Thanks for replying. Actually, I don't think those tweets have been removed or hidden by the users, because I tried hundreds of different tweet IDs in a for loop and all of them gave me the same error message, while some tweet IDs I got from another source worked just fine.

Here are some of the tweet IDs I used, from file number 10 for example:

 

1243420522592910000

1243420476824640000

1243420477235660000

1243420477646720000

1243420477894190000

1243420478238150000

1243420478535890000

1243420478829510000

1243420478951180000

1243420479706150000

1243420479844530000

1243420479982990000

1243420479924250000

1243420478837900000

1243420480205280000

1243420481744560000

1243420482075930000

1243420482201770000

1243420482222730000

1243420482084270000

1243420482814100000

1243420482935760000

1243420482629590000

 

 

Thanks, 

Submitted by Basheer Qolomany on Mon, 03/30/2020 - 20:42

I double-checked corona_tweets_10.csv, but I could not find any of these IDs in the file. However, I can see one pattern in the tweet IDs you've listed: they all end in a run of zeros. Use Sublime Text or another plain-text editor to open the CSV files; it looks like the application you're using to open them is chopping off the last digits and replacing them with zeros.

For example, the last ID you listed, 1243420482629590000, should have been 1243420482629591040. Note that the last four digits are zeros at your end; the same is the case with all the other IDs you've mentioned.

Submitted by Rabindra Lamsal on Tue, 03/31/2020 - 02:13

Yes, that's right. I read the CSV files with R, and that fixed the numbers.

Also, if you have the tweet IDs for March 13 to March 19, it would be great to upload them here.

 

Thanks; 

Submitted by Basheer Qolomany on Tue, 03/31/2020 - 17:35

The model has been collecting corona-related tweets since Jan 27, 2020. However, it was designed as part of an optimization project and therefore extracted only the tweet text, not the tweet IDs, and because of Twitter's data sharing policy I am not allowed to share that text. That is why I only started extracting and uploading tweet IDs on March 20, 2020.

Submitted by Rabindra Lamsal on Tue, 03/31/2020 - 21:59

Thank you,

Submitted by Basheer Qolomany on Wed, 04/01/2020 - 18:30

I'm having the exact same issue, i.e. all IDs end with four zeros while those zeros should in fact be other digits. I was just opening it as a CSV file.

 

Could you please let me know how to fix it? Thank you very much!

Submitted by Mandy Huang on Wed, 04/08/2020 - 04:02

Are you trying to write a script to hydrate the tweet IDs or something else? Please see the instruction given in the dataset description field.

Submitted by Rabindra Lamsal on Wed, 04/08/2020 - 11:26

Thank you for the reply! I've tried using QCRI's Tweets Downloader to hydrate the tweet IDs, but as with the tweepy API, the first step is to get a list of correct tweet IDs, which I don't have because of the zeros at the end of the tweet_id column in the original dataset.

 

I saw in the previous discussion you mentioned "For example, the last ID you've listed 1243420482629590000 should have been 1243420482629591040", could you please let me know how you get the correct tweet ID that ends with 1040? Many thanks!

Submitted by Mandy Huang on Wed, 04/08/2020 - 15:21

Hello Mandy. Please do not use MS Excel to open the CSV files; Excel shows numbers with only 15 digits of precision. I suggest loading the CSV file as a pandas data frame, dropping the sentiment column, and exporting the final data frame as a CSV file (for the Hydrator app) or as a text file (for QCRI's Tweets Downloader tool). I hope this helps.
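A minimal sketch of that approach, forcing the ID column to be read as strings so the 19-digit integers are never rounded (the file names here are illustrative; the sample ID is the one discussed above):

```python
import pandas as pd

# create a tiny sample file in the dataset's two-column layout (tweet ID, sentiment)
with open("sample_ids.csv", "w") as f:
    f.write("1243420482629591040,0.5\n")

# dtype={0: str} keeps every digit; a numeric float column would lose
# precision past ~15 significant digits, exactly the Excel symptom above
df = pd.read_csv("sample_ids.csv", header=None, dtype={0: str})

# export just the intact IDs, one per line
df[0].to_csv("ready_sample_ids.txt", index=False, header=False)
```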

Submitted by Rabindra Lamsal on Sat, 06/06/2020 - 00:28

Hi 

I'm trying to download all the data from Twitter using the IDs, but the Hydrator app always stops downloading. Does that mean the tweet downloads have reached the rate limit?

 

thanks

Submitted by JINGLI SHI on Fri, 04/03/2020 - 00:09

Can you please elaborate? Also, I recommend writing to the app's author regarding the issue.

Submitted by Rabindra Lamsal on Sun, 04/05/2020 - 22:50

Congratulations on this work!

Submitted by Thiago Aparecid... on Thu, 04/09/2020 - 20:55

Thank you, Thiago.

Submitted by Rabindra Lamsal on Thu, 04/09/2020 - 22:11

Can someone share a code snippet to get the tweet text from a tweet ID?

Submitted by Haider Akram on Fri, 04/10/2020 - 12:11

Use Hydrator (https://github.com/DocNow/hydrator) or QCRI's Tweet Downloader tool (https://crisisnlp.qcri.org/data/tools/TweetsRetrievalTool-v2.0.zip) for downloading the tweets.

Submitted by Rabindra Lamsal on Fri, 04/10/2020 - 15:07

Can someone please help me with how to fetch the tweets?

I can only see the tweet IDs and sentiment scores. Where and how can I download the tweets?

Thanks in advance. 

Submitted by Navya Shiva on Sun, 05/03/2020 - 05:48

Please refer to my reply to your comment below.

Submitted by Rabindra Lamsal on Sun, 05/03/2020 - 08:34

Hi, 

I can only see two columns ('Tweet ID' and 'sentiment score'). Could you please tell me if the tweets column was removed?

Submitted by Navya Shiva on Sun, 05/03/2020 - 06:11

Hello Navya. Because of Twitter's data sharing policy, we are not allowed to share anything except the Tweet IDs and/or User IDs. Therefore, this dataset contains only the Tweets IDs. In order to download the tweets, you'll need to hydrate these IDs using applications such as DocNow's Hydrator (available for OS X, Windows and Linux) or QCRI's Tweets Downloader (java based).

Submitted by Rabindra Lamsal on Sun, 05/03/2020 - 06:35

Hi Rabindra,

Thank you for your reply. I have downloaded Hydrator and tried downloading the tweets. However, the CSV file isn't getting downloaded (I chose only 85,000 rows of tweet IDs as a sample); it is throwing an error. Could you please help me fix it?

I am unable to post a picture here. The error is displayed as "A JavaScript error occurred in the main process...", followed by many more lines under that heading. Please let me know how to go about this.

Thank you. 


Submitted by Navya Shiva on Mon, 05/04/2020 - 05:20

Maybe it is an error associated with IEEE DataPort. Please try downloading one of the CSV files again and hydrating the IDs.

Submitted by Rabindra Lamsal on Fri, 05/29/2020 - 01:33

Dear,

Can you mention the library used to find the sentiment scores?

Thank you

Submitted by Furqan Rustam on Mon, 06/01/2020 - 22:05

Hello Furqan. If you do not want to create a machine learning model of your own to compute sentiment scores, you can make use of the TextBlob NLP toolkit.

Submitted by Rabindra Lamsal on Tue, 06/02/2020 - 00:49

Dear, can you mention which toolkit you used to find the sentiment scores?

Submitted by Furqan Rustam on Fri, 06/05/2020 - 06:50

It is clearly mentioned in the abstract field. It's TextBlob.

Submitted by Rabindra Lamsal on Fri, 06/05/2020 - 13:40

Hello, I am working on an NLP project. I want to extract only tweets from the USA. Can someone tell me how I can do that?

Also, what ranges of the sentiment score are considered positive, negative and neutral?

Thank you

Submitted by Anil Kumar on Tue, 06/09/2020 - 10:48

Hello Anil. You'll have to add a condition on the country ('United States') or country_code ('US') field of the tweet's place object while hydrating the IDs. For this purpose, you can use the twarc Python library:

for tweet in t.hydrate(open('id_file.txt')):
    place = tweet.get('place')
    if place and place.get('country_code') == 'US':
        # store whichever Twitter object values you need
        ...

Visit https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/tweet-object for more info regarding Twitter objects.

About the sentiment score: [-1, 0) is considered negative, 0 neutral, and (0, 1] positive. Please go through TextBlob's documentation for more information.
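Those ranges can be applied to the dataset's second column with a few lines of plain Python (the helper name label is illustrative):

```python
def label(score: float) -> str:
    # map a TextBlob polarity score to a sentiment class:
    # [-1, 0) -> negative, 0 -> neutral, (0, 1] -> positive
    if score < 0:
        return "negative"
    if score > 0:
        return "positive"
    return "neutral"
```

For example, label(-0.3) gives "negative" and label(0.0) gives "neutral".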

Submitted by Rabindra Lamsal on Tue, 06/09/2020 - 12:12

Hello , I am looking for some ideas for NLP project using this dataset. i have Tweet ID and sentiment score. what can be the NLP applications for this dataset? 

Thanks 

Submitted by Anil Kumar on Fri, 06/12/2020 - 10:40

Hello Anil. I think you should be the one to decide what specific task you want to use this dataset for. You'll find multiple blogs, and, most importantly, you can go through recently written papers at arxiv.org. From these resources you can get hints about the specific domains in which people are working with the COVID-19 tweet datasets currently available on the web for non-commercial research.

I am restricted by Twitter's data sharing policy, which is why I am allowed to share only the tweet IDs; you'll have to hydrate the IDs to get the other information relating to a tweet. I recommend first learning what the different Twitter objects are. Then you can really understand where and how a particular tweet dataset can be used.

Submitted by Rabindra Lamsal on Fri, 06/12/2020 - 12:33

When I download the .csv file of the first 3000 tweets using Hydrator, I obtain around 34 columns but I'm not able to view the sentiment score as one of the columns. Can you please help me with this?

Submitted by Pranav Saihgal on Mon, 06/15/2020 - 07:58

Twitter objects do not include anything like a "sentiment score". You'll have to join the second column of this dataset (the sentiment) to the hydrated CSV yourself.

You can refer to the example code given in the Instructions section to get a gist of how to (i) load a CSV file as a pandas dataframe, (ii) manipulate its columns, and (iii) export the final dataframe as a CSV file ready for your study.

Submitted by Rabindra Lamsal on Mon, 06/15/2020 - 11:14

Was the sentiment score obtained after preprocessing the text (tweets) or before?

Submitted by Pranav Saihgal on Mon, 06/15/2020 - 16:28

[1] Rabindra Lamsal, "Coronavirus (COVID-19) Tweets Dataset", IEEE Dataport, 2020. [Online]. Available: http://dx.doi.org/10.21227/781w-ef42. Accessed: Jul. 14, 2020.