Coronavirus (COVID-19) Tweets Dataset

4.9 (10 ratings)

Abstract 

This dataset includes CSV files containing the IDs and sentiment scores of tweets related to the COVID-19 pandemic. The tweets are collected by an ongoing project deployed at https://live.rlamsal.com.np. The model monitors the real-time Twitter feed for coronavirus-related tweets, using 90+ different keywords and hashtags that are commonly used while referencing the pandemic. This dataset was wholly re-designed on March 20, 2020, to comply with Twitter's content redistribution policy. Below is a quick overview of the dataset.

— Dataset name: COV19Tweets Dataset

— Number of tweets: 545,218,873

— Coverage: Global

— Language: English (EN)

— Geo-tagged Version: Coronavirus (COVID-19) Geo-tagged Tweets Dataset (GeoCOV19Tweets Dataset)

— Keywords and hashtags (keywords.tsv): "corona", "#corona", "coronavirus", "#coronavirus", "covid", "#covid", "covid19", "#covid19", "covid-19", "#covid-19", "sarscov2", "#sarscov2", "sars cov2", "sars cov 2", "covid_19", "#covid_19", "#ncov", "ncov", "#ncov2019", "ncov2019", "2019-ncov", "#2019-ncov", "pandemic", "#pandemic", "#2019ncov", "2019ncov", "quarantine", "#quarantine", "flatten the curve", "flattening the curve", "#flatteningthecurve", "#flattenthecurve", "hand sanitizer", "#handsanitizer", "#lockdown", "lockdown", "social distancing", "#socialdistancing", "work from home", "#workfromhome", "working from home", "#workingfromhome", "ppe", "n95", "#ppe", "#n95", "#covidiots", "covidiots", "herd immunity", "#herdimmunity", "pneumonia", "#pneumonia", "chinese virus", "#chinesevirus", "wuhan virus", "#wuhanvirus", "kung flu", "#kungflu", "wearamask", "#wearamask", "wear a mask", "vaccine", "vaccines", "#vaccine", "#vaccines", "corona vaccine", "corona vaccines", "#coronavaccine", "#coronavaccines", "face shield", "#faceshield", "face shields", "#faceshields", "health worker", "#health worker", "health workers", "#healthworkers", "#stayhomestaysafe", "#coronaupdate", "#frontlineheroes", "#coronawarriors", "#homeschool", "#homeschooling", "#hometasking", "#masks4all", "#wfh", "wash ur hands", "wash your hands", "#washurhands", "#washyourhands", "#stayathome", "#stayhome", "#selfisolating", "self isolating"

— Dataset updates: Every day

— Usage policy: As per Twitter's Developer Policy

Dataset Files (the local times mentioned below are in GMT+5:45)

corona_tweets_01.csv + corona_tweets_02.csv + corona_tweets_03.csv: 2,475,980 tweets (March 20, 2020 01:37 AM - March 21, 2020 09:25 AM)

corona_tweets_04.csv: 1,233,340 tweets (March 21, 2020 09:27 AM - March 22, 2020 07:46 AM)

corona_tweets_05.csv: 1,782,157 tweets (March 22, 2020 07:50 AM - March 23, 2020 09:08 AM)

corona_tweets_06.csv: 1,771,295 tweets (March 23, 2020 09:11 AM - March 24, 2020 11:35 AM)

corona_tweets_07.csv: 1,479,651 tweets (March 24, 2020 11:42 AM - March 25, 2020 11:43 AM)

corona_tweets_08.csv: 1,272,592 tweets (March 25, 2020 11:47 AM - March 26, 2020 12:46 PM)

corona_tweets_09.csv: 1,091,429 tweets (March 26, 2020 12:51 PM - March 27, 2020 11:53 AM)

corona_tweets_10.csv: 1,172,013 tweets (March 27, 2020 11:56 AM - March 28, 2020 01:59 PM)

corona_tweets_11.csv: 1,141,210 tweets (March 28, 2020 02:03 PM - March 29, 2020 04:01 PM)

> March 29, 2020 04:02 PM - March 30, 2020 02:00 PM -- A technical fault occurred during this period; preventive measures have been taken. Tweets for this session won't be available.

corona_tweets_12.csv: 793,417 tweets (March 30, 2020 02:01 PM - March 31, 2020 10:16 AM)

corona_tweets_13.csv: 1,029,294 tweets (March 31, 2020 10:20 AM - April 01, 2020 10:59 AM)

corona_tweets_14.csv: 920,076 tweets (April 01, 2020 11:02 AM - April 02, 2020 12:19 PM)

corona_tweets_15.csv: 826,271 tweets (April 02, 2020 12:21 PM - April 03, 2020 02:38 PM)

corona_tweets_16.csv: 612,512 tweets (April 03, 2020 02:40 PM - April 04, 2020 11:54 AM)

corona_tweets_17.csv: 685,560 tweets (April 04, 2020 11:56 AM - April 05, 2020 12:54 PM)

corona_tweets_18.csv: 717,301 tweets (April 05, 2020 12:56 PM - April 06, 2020 10:57 AM)

corona_tweets_19.csv: 722,921 tweets (April 06, 2020 10:58 AM - April 07, 2020 12:28 PM)

corona_tweets_20.csv: 554,012 tweets (April 07, 2020 12:29 PM - April 08, 2020 12:34 PM)

corona_tweets_21.csv: 589,679 tweets (April 08, 2020 12:37 PM - April 09, 2020 12:18 PM)

corona_tweets_22.csv: 517,718 tweets (April 09, 2020 12:20 PM - April 10, 2020 09:20 AM)

corona_tweets_23.csv: 601,199 tweets (April 10, 2020 09:22 AM - April 11, 2020 10:22 AM)

corona_tweets_24.csv: 497,655 tweets (April 11, 2020 10:24 AM - April 12, 2020 10:53 AM)

corona_tweets_25.csv: 477,182 tweets (April 12, 2020 10:57 AM - April 13, 2020 11:43 AM)

corona_tweets_26.csv: 288,277 tweets (April 13, 2020 11:46 AM - April 14, 2020 12:49 AM)

corona_tweets_27.csv: 515,739 tweets (April 14, 2020 11:09 AM - April 15, 2020 12:38 PM)

corona_tweets_28.csv: 427,088 tweets (April 15, 2020 12:40 PM - April 16, 2020 10:03 AM)

corona_tweets_29.csv: 433,368 tweets (April 16, 2020 10:04 AM - April 17, 2020 10:38 AM)

corona_tweets_30.csv: 392,847 tweets (April 17, 2020 10:40 AM - April 18, 2020 10:17 AM)

> With the addition of some more coronavirus-specific keywords, the number of tweets captured per day has increased significantly; therefore, the CSV files hereafter will be zipped. Let's save some bandwidth.

corona_tweets_31.csv: 2,671,818 tweets (April 18, 2020 10:19 AM - April 19, 2020 09:34 AM)

corona_tweets_32.csv: 2,393,006 tweets (April 19, 2020 09:43 AM - April 20, 2020 10:45 AM)

corona_tweets_33.csv: 2,227,579 tweets (April 20, 2020 10:56 AM - April 21, 2020 10:47 AM)

corona_tweets_34.csv: 2,211,689 tweets (April 21, 2020 10:54 AM - April 22, 2020 10:33 AM)

corona_tweets_35.csv: 2,265,189 tweets (April 22, 2020 10:45 AM - April 23, 2020 10:49 AM)

corona_tweets_36.csv: 2,201,138 tweets (April 23, 2020 11:08 AM - April 24, 2020 10:39 AM)

corona_tweets_37.csv: 2,338,713 tweets (April 24, 2020 10:51 AM - April 25, 2020 11:50 AM)

corona_tweets_38.csv: 1,981,835 tweets (April 25, 2020 12:20 PM - April 26, 2020 09:13 AM)

corona_tweets_39.csv: 2,348,827 tweets (April 26, 2020 09:16 AM - April 27, 2020 10:21 AM)

corona_tweets_40.csv: 2,212,216 tweets (April 27, 2020 10:33 AM - April 28, 2020 10:09 AM)

corona_tweets_41.csv: 2,118,853 tweets (April 28, 2020 10:20 AM - April 29, 2020 08:48 AM)

corona_tweets_42.csv: 2,390,703 tweets (April 29, 2020 09:09 AM - April 30, 2020 10:33 AM)

corona_tweets_43.csv: 2,184,439 tweets (April 30, 2020 10:53 AM - May 01, 2020 10:18 AM)

corona_tweets_44.csv: 2,223,013 tweets (May 01, 2020 10:23 AM - May 02, 2020 09:54 AM)

corona_tweets_45.csv: 2,216,553 tweets (May 02, 2020 10:18 AM - May 03, 2020 09:57 AM)

corona_tweets_46.csv: 2,266,373 tweets (May 03, 2020 10:09 AM - May 04, 2020 10:17 AM)

corona_tweets_47.csv: 2,227,489 tweets (May 04, 2020 10:32 AM - May 05, 2020 10:17 AM)

corona_tweets_48.csv: 2,218,774 tweets (May 05, 2020 10:38 AM - May 06, 2020 10:26 AM)

corona_tweets_49.csv: 2,164,251 tweets (May 06, 2020 10:35 AM - May 07, 2020 09:33 AM)

corona_tweets_50.csv: 2,203,686 tweets (May 07, 2020 09:55 AM - May 08, 2020 09:35 AM)

corona_tweets_51.csv: 2,250,019 tweets (May 08, 2020 09:39 AM - May 09, 2020 09:49 AM)

corona_tweets_52.csv: 2,273,705 tweets (May 09, 2020 09:55 AM - May 10, 2020 10:11 AM)

corona_tweets_53.csv: 2,208,264 tweets (May 10, 2020 10:23 AM - May 11, 2020 09:57 AM)

corona_tweets_54.csv: 2,216,845 tweets (May 11, 2020 10:08 AM - May 12, 2020 09:52 AM)

corona_tweets_55.csv: 2,264,472 tweets (May 12, 2020 09:59 AM - May 13, 2020 10:14 AM)

corona_tweets_56.csv: 2,339,709 tweets (May 13, 2020 10:24 AM - May 14, 2020 11:21 AM)

corona_tweets_57.csv: 2,096,878 tweets (May 14, 2020 11:38 AM - May 15, 2020 09:58 AM)

corona_tweets_58.csv: 2,214,205 tweets (May 15, 2020 10:13 AM - May 16, 2020 09:43 AM)

> The server and the databases have been optimized; therefore, there is a significant rise in the number of tweets captured per day.

corona_tweets_59.csv: 3,389,090 tweets (May 16, 2020 09:58 AM - May 17, 2020 10:34 AM)

corona_tweets_60.csv: 3,530,933 tweets (May 17, 2020 10:36 AM - May 18, 2020 10:07 AM)

corona_tweets_61.csv: 3,899,631 tweets (May 18, 2020 10:08 AM - May 19, 2020 10:07 AM)

corona_tweets_62.csv: 3,767,009 tweets (May 19, 2020 10:08 AM - May 20, 2020 10:06 AM)

corona_tweets_63.csv: 3,790,455 tweets (May 20, 2020 10:06 AM - May 21, 2020 10:15 AM)

corona_tweets_64.csv: 3,582,020 tweets (May 21, 2020 10:16 AM - May 22, 2020 10:13 AM)

corona_tweets_65.csv: 3,461,470 tweets (May 22, 2020 10:14 AM - May 23, 2020 10:08 AM)

corona_tweets_66.csv: 3,477,564 tweets (May 23, 2020 10:08 AM - May 24, 2020 10:02 AM)

corona_tweets_67.csv: 3,656,446 tweets (May 24, 2020 10:02 AM - May 25, 2020 10:10 AM)

corona_tweets_68.csv: 3,474,952 tweets (May 25, 2020 10:11 AM - May 26, 2020 10:22 AM)

corona_tweets_69.csv: 3,422,960 tweets (May 26, 2020 10:22 AM - May 27, 2020 10:16 AM)

corona_tweets_70.csv: 3,480,999 tweets (May 27, 2020 10:17 AM - May 28, 2020 10:35 AM)

corona_tweets_71.csv: 3,446,008 tweets (May 28, 2020 10:36 AM - May 29, 2020 10:07 AM)

corona_tweets_72.csv: 3,492,841 tweets (May 29, 2020 10:07 AM - May 30, 2020 10:14 AM)

corona_tweets_73.csv: 3,098,817 tweets (May 30, 2020 10:15 AM - May 31, 2020 10:13 AM)

corona_tweets_74.csv: 3,234,848 tweets (May 31, 2020 10:13 AM - June 01, 2020 10:14 AM)

corona_tweets_75.csv: 3,206,132 tweets (June 01, 2020 10:15 AM - June 02, 2020 10:07 AM)

corona_tweets_76.csv: 3,206,417 tweets (June 02, 2020 10:08 AM - June 03, 2020 10:26 AM)

corona_tweets_77.csv: 3,256,225 tweets (June 03, 2020 10:27 AM - June 04, 2020 10:23 AM)

corona_tweets_78.csv: 2,205,123 tweets (June 04, 2020 10:26 AM - June 05, 2020 10:03 AM) (tweet IDs were extracted from the backup server for this session)

corona_tweets_79.csv: 3,381,184 tweets (June 05, 2020 10:11 AM - June 06, 2020 10:16 AM)

corona_tweets_80.csv: 3,194,500 tweets (June 06, 2020 10:17 AM - June 07, 2020 10:24 AM)

corona_tweets_81.csv: 2,768,780 tweets (June 07, 2020 10:25 AM - June 08, 2020 10:13 AM)

corona_tweets_82.csv: 3,032,227 tweets (June 08, 2020 10:13 AM - June 09, 2020 10:12 AM)

corona_tweets_83.csv: 2,984,970 tweets (June 09, 2020 10:12 AM - June 10, 2020 10:13 AM)

corona_tweets_84.csv: 3,068,002 tweets (June 10, 2020 10:14 AM - June 11, 2020 10:11 AM)

corona_tweets_85.csv: 3,261,215 tweets (June 11, 2020 10:12 AM - June 12, 2020 10:10 AM)

corona_tweets_86.csv: 3,378,901 tweets (June 12, 2020 10:11 AM - June 13, 2020 10:10 AM)

corona_tweets_87.csv: 3,011,103 tweets (June 13, 2020 10:11 AM - June 14, 2020 10:08 AM)

corona_tweets_88.csv: 3,154,328 tweets (June 14, 2020 10:09 AM - June 15, 2020 10:10 AM)

corona_tweets_89.csv: 3,837,552 tweets (June 15, 2020 10:10 AM - June 16, 2020 10:10 AM)

corona_tweets_90.csv: 3,889,262 tweets (June 16, 2020 10:11 AM - June 17, 2020 10:10 AM)

corona_tweets_91.csv: 3,688,348 tweets (June 17, 2020 10:10 AM - June 18, 2020 10:09 AM)

corona_tweets_92.csv: 3,673,328 tweets (June 18, 2020 10:10 AM - June 19, 2020 10:10 AM)

corona_tweets_93.csv: 3,634,172 tweets (June 19, 2020 10:10 AM - June 20, 2020 10:10 AM)

corona_tweets_94.csv: 3,610,992 tweets (June 20, 2020 10:10 AM - June 21, 2020 10:10 AM)

corona_tweets_95.csv: 3,352,643 tweets (June 21, 2020 10:10 AM - June 22, 2020 10:10 AM)

corona_tweets_96.csv: 3,730,105 tweets (June 22, 2020 10:10 AM - June 23, 2020 10:09 AM)

corona_tweets_97.csv: 3,936,238 tweets (June 23, 2020 10:10 AM - June 24, 2020 10:09 AM)

corona_tweets_98.csv: 3,858,387 tweets (June 24, 2020 10:10 AM - June 25, 2020 10:09 AM)

corona_tweets_99.csv: 3,883,506 tweets (June 25, 2020 10:10 AM - June 26, 2020 10:09 AM)

corona_tweets_100.csv: 3,941,476 tweets (June 26, 2020 10:09 AM - June 27, 2020 10:10 AM)

corona_tweets_101.csv: 3,816,987 tweets (June 27, 2020 10:11 AM - June 28, 2020 10:10 AM)

corona_tweets_102.csv: 3,743,358 tweets (June 28, 2020 10:10 AM - June 29, 2020 10:10 AM)

corona_tweets_103.csv: 3,880,998 tweets (June 29, 2020 10:10 AM - June 30, 2020 10:10 AM)

corona_tweets_104.csv: 3,926,862 tweets (June 30, 2020 10:10 AM - July 01, 2020 10:10 AM)

corona_tweets_105.csv: 4,365,171 tweets (July 01, 2020 10:11 AM - July 02, 2020 12:28 PM)

corona_tweets_106.csv: 3,563,659 tweets (July 02, 2020 12:29 PM - July 03, 2020 10:10 AM)

corona_tweets_107.csv: 3,446,100 tweets (July 03, 2020 10:10 AM - July 04, 2020 07:00 AM)

corona_tweets_108.csv: 4,076,176 tweets (July 04, 2020 07:01 AM - July 05, 2020 09:16 AM)

corona_tweets_109.csv: 3,827,904 tweets (July 05, 2020 09:17 AM - July 06, 2020 10:10 AM)

corona_tweets_110.csv: 3,991,881 tweets (July 06, 2020 10:10 AM - July 07, 2020 10:10 AM)

corona_tweets_111.csv: 4,104,245 tweets (July 07, 2020 10:11 AM - July 08, 2020 10:10 AM)

corona_tweets_112.csv: 4,032,945 tweets (July 08, 2020 10:10 AM - July 09, 2020 10:10 AM)

corona_tweets_113.csv: 3,912,560 tweets (July 09, 2020 10:10 AM - July 10, 2020 10:12 AM)

corona_tweets_114.csv: 4,024,227 tweets (July 10, 2020 10:12 AM - July 11, 2020 10:20 AM)

corona_tweets_115.csv: 3,746,316 tweets (July 11, 2020 10:20 AM - July 12, 2020 10:09 AM)

corona_tweets_116.csv: 3,902,393 tweets (July 12, 2020 10:10 AM - July 13, 2020 10:09 AM)

corona_tweets_117.csv: 4,045,441 tweets (July 13, 2020 10:10 AM - July 14, 2020 10:09 AM)

corona_tweets_118.csv: 4,130,726 tweets (July 14, 2020 10:10 AM - July 15, 2020 10:25 AM)

corona_tweets_119.csv: 4,106,648 tweets (July 15, 2020 10:26 AM - July 16, 2020 10:10 AM)

corona_tweets_120.csv: 4,083,573 tweets (July 16, 2020 10:11 AM - July 17, 2020 10:10 AM)

corona_tweets_121.csv: 4,014,323 tweets (July 17, 2020 10:10 AM - July 18, 2020 10:25 AM)

corona_tweets_122.csv: 3,639,620 tweets (July 18, 2020 10:25 AM - July 19, 2020 10:30 AM)

corona_tweets_123.csv: 3,600,404 tweets (July 19, 2020 10:30 AM - July 20, 2020 10:10 AM)

corona_tweets_124.csv: 3,777,908 tweets (July 20, 2020 10:11 AM - July 21, 2020 10:10 AM)

corona_tweets_125.csv: 3,771,150 tweets (July 21, 2020 10:11 AM - July 22, 2020 10:10 AM)

corona_tweets_126.csv: 3,691,852 tweets (July 22, 2020 10:10 AM - July 23, 2020 10:10 AM)

corona_tweets_127.csv: 3,661,885 tweets (July 23, 2020 10:10 AM - July 24, 2020 10:10 AM)

corona_tweets_128.csv: 3,621,819 tweets (July 24, 2020 10:10 AM - July 25, 2020 10:20 AM)

corona_tweets_129.csv: 3,512,553 tweets (July 25, 2020 10:20 AM - July 26, 2020 10:10 AM)

corona_tweets_130.csv: 3,399,349 tweets (July 26, 2020 10:11 AM - July 27, 2020 10:10 AM)

corona_tweets_131.csv: 3,889,978 tweets (July 27, 2020 10:10 AM - July 28, 2020 10:10 AM)

corona_tweets_132.csv: 4,167,168 tweets (July 28, 2020 10:10 AM - July 29, 2020 10:10 AM)

corona_tweets_133.csv: 4,007,131 tweets (July 29, 2020 10:10 AM - July 30, 2020 10:10 AM)

corona_tweets_134.csv: 3,968,762 tweets (July 30, 2020 10:10 AM - July 31, 2020 10:10 AM)

corona_tweets_135.csv: 3,867,434 tweets (July 31, 2020 10:10 AM - August 01, 2020 10:12 AM)

corona_tweets_136.csv: 3,533,863 tweets (August 01, 2020 10:12 AM - August 02, 2020 10:10 AM)

corona_tweets_137.csv: 3,748,433 tweets (August 02, 2020 10:10 AM - August 03, 2020 10:10 AM)

corona_tweets_138.csv: 3,810,246 tweets (August 03, 2020 10:10 AM - August 04, 2020 10:12 AM)

corona_tweets_139.csv: 3,726,039 tweets (August 04, 2020 10:12 AM - August 05, 2020 10:10 AM)

corona_tweets_140.csv: 3,770,597 tweets (August 05, 2020 10:10 AM - August 06, 2020 10:10 AM)

corona_tweets_141.csv: 3,839,194 tweets (August 06, 2020 10:10 AM - August 07, 2020 10:10 AM)

corona_tweets_142.csv: 3,702,517 tweets (August 07, 2020 10:11 AM - August 08, 2020 10:10 AM)

corona_tweets_143.csv: 3,482,091 tweets (August 08, 2020 10:11 AM - August 09, 2020 10:10 AM)

corona_tweets_144.csv: 3,822,854 tweets (August 09, 2020 10:10 AM - August 10, 2020 10:10 AM)

corona_tweets_145.csv: 3,911,443 tweets (August 10, 2020 10:10 AM - August 11, 2020 10:10 AM)

corona_tweets_146.csv: 3,838,286 tweets (August 11, 2020 10:10 AM - August 12, 2020 10:10 AM)

corona_tweets_147.csv: 3,624,028 tweets (August 12, 2020 10:10 AM - August 13, 2020 10:10 AM)

corona_tweets_148.csv: 3,749,980 tweets (August 13, 2020 10:10 AM - August 14, 2020 10:10 AM)

corona_tweets_149.csv: 3,683,305 tweets (August 14, 2020 10:10 AM - August 15, 2020 10:10 AM)

corona_tweets_150.csv: 3,187,087 tweets (August 15, 2020 10:10 AM - August 16, 2020 10:10 AM)

corona_tweets_151.csv: 3,181,939 tweets (August 16, 2020 10:10 AM - August 17, 2020 10:10 AM)

corona_tweets_152.csv: 3,680,958 tweets (August 17, 2020 10:10 AM - August 18, 2020 10:10 AM)

corona_tweets_153.csv: 3,610,316 tweets (August 18, 2020 10:10 AM - August 19, 2020 10:10 AM)

corona_tweets_154.csv: 3,534,349 tweets (August 19, 2020 10:10 AM - August 20, 2020 10:10 AM)

corona_tweets_155.csv: 3,609,804 tweets (August 20, 2020 10:10 AM - August 21, 2020 10:10 AM)

corona_tweets_156.csv: 3,962,927 tweets (August 21, 2020 10:10 AM - August 22, 2020 10:10 AM)

corona_tweets_157.csv: 3,583,818 tweets (August 22, 2020 10:10 AM - August 23, 2020 10:10 AM)

corona_tweets_158.csv: 4,045,201 tweets (August 23, 2020 10:10 AM - August 24, 2020 10:10 AM)

corona_tweets_159.csv: 3,982,835 tweets (August 24, 2020 10:10 AM - August 25, 2020 10:20 AM)

corona_tweets_160.csv: 3,896,212 tweets (August 25, 2020 10:20 AM - August 26, 2020 10:10 AM)

corona_tweets_161.csv: 3,965,851 tweets (August 26, 2020 10:10 AM - August 27, 2020 10:10 AM)

corona_tweets_162.csv: 3,913,091 tweets (August 27, 2020 10:10 AM - August 28, 2020 10:10 AM)

corona_tweets_163.csv: 3,850,248 tweets (August 28, 2020 10:10 AM - August 29, 2020 10:10 AM)

corona_tweets_164.csv: 3,282,065 tweets (August 29, 2020 10:10 AM - August 30, 2020 10:10 AM)

corona_tweets_165.csv: 3,494,658 tweets (August 30, 2020 10:11 AM - August 31, 2020 10:10 AM)

corona_tweets_166.csv: 3,725,303 tweets (August 31, 2020 10:10 AM - September 01, 2020 10:10 AM)

corona_tweets_167.csv: 3,665,464 tweets (September 01, 2020 10:10 AM - September 02, 2020 10:10 AM)

corona_tweets_168.csv: 3,742,416 tweets (September 02, 2020 10:10 AM - September 03, 2020 10:10 AM)

corona_tweets_169.csv: 3,833,791 tweets (September 03, 2020 10:10 AM - September 04, 2020 10:10 AM)

corona_tweets_170.csv: 3,189,110 tweets (September 04, 2020 10:10 AM - September 05, 2020 10:15 AM)

corona_tweets_171.csv: 2,736,116 tweets (September 05, 2020 10:15 AM - September 06, 2020 10:10 AM)

corona_tweets_172.csv: 2,742,674 tweets (September 06, 2020 10:10 AM - September 07, 2020 10:10 AM)

corona_tweets_173.csv: 3,428,867 tweets (September 07, 2020 10:10 AM - September 08, 2020 10:10 AM)

corona_tweets_174.csv: 3,596,199 tweets (September 08, 2020 10:10 AM - September 09, 2020 10:10 AM)

corona_tweets_175.csv: 3,983,190 tweets (September 09, 2020 10:11 AM - September 10, 2020 10:10 AM)

corona_tweets_176.csv: 4,032,447 tweets (September 10, 2020 10:10 AM - September 11, 2020 10:10 AM)

corona_tweets_177.csv: 3,499,620 tweets (September 11, 2020 10:10 AM - September 12, 2020 10:10 AM)

corona_tweets_178.csv: 3,165,691 tweets (September 12, 2020 10:10 AM - September 13, 2020 10:10 AM)

corona_tweets_179.csv: 3,172,727 tweets (September 13, 2020 10:10 AM - September 14, 2020 10:10 AM)

corona_tweets_180.csv: 3,590,356 tweets (September 14, 2020 10:10 AM - September 15, 2020 10:10 AM)

corona_tweets_181.csv: 3,638,935 tweets (September 15, 2020 10:10 AM - September 16, 2020 10:10 AM)

corona_tweets_182.csv: 3,839,131 tweets (September 16, 2020 10:10 AM - September 17, 2020 10:10 AM)

corona_tweets_183.csv: 3,661,202 tweets (September 17, 2020 10:10 AM - September 18, 2020 10:10 AM)

corona_tweets_184.csv: 3,328,710 tweets (September 18, 2020 10:10 AM - September 19, 2020 10:10 AM)

Why are only tweet IDs being shared?

Twitter's content redistribution policy restricts sharing any tweet information other than tweet IDs and/or user IDs. Twitter wants researchers to always pull fresh data, because a user might delete a tweet or make their profile protected. If such a tweet had already been pulled and shared in a public domain, the user or community could be exposed to inferences drawn from data that no longer exists publicly or is now private.

Do you have tweets collected before March 20, 2020?

Unfortunately, I had to unpublish more than 20 million tweets collected between January 27, 2020, and March 20, 2020, because that collection did not include tweet IDs. "Why?" you might ask. Initially, the primary objective of the deployed model was not just to collect tweets; it was more of an optimization project, aiming to study how much information received in a near-real-time scenario can be processed with minimal computing resources. However, when the COVID-19 outbreak became a global emergency, I decided to release the collected tweets rather than keep them to myself. But the collection did not have tweet IDs, which is why a fresh collection was started after March 20, 2020.

Instructions: 

Each CSV file contains a list of tweet IDs. You can use these tweet IDs to download fresh data from Twitter (hydrating the tweet IDs). To make it easy for NLP researchers to access a sentiment analysis of each collected tweet, the sentiment score computed by TextBlob has been appended as the second column. To hydrate the tweet IDs, you can use applications such as Hydrator (available for OS X, Windows and Linux), twarc (a Python library), or QCRI's Tweets Downloader (Java-based).

Getting the CSV files of this dataset ready for hydrating the tweet IDs:

import pandas as pd

# read the two-column CSV (tweet ID, sentiment score); there is no header row
dataframe = pd.read_csv("corona_tweets_10.csv", header=None)

# keep only the first column, i.e., the tweet IDs
dataframe = dataframe[0]

# write the tweet IDs to a new CSV file, ready for hydration
dataframe.to_csv("ready_corona_tweets_10.csv", index=False, header=None)

The example code above takes the original CSV file from this dataset (i.e., corona_tweets_10.csv) and exports just the tweet ID column to a new CSV file (i.e., ready_corona_tweets_10.csv). The newly created CSV file can then be consumed by the Hydrator application for hydrating the tweet IDs. To export the tweet ID column to a TXT file instead, just replace ".csv" with ".txt" in the to_csv call (last line) of the example code.

If you are not comfortable with Python and pandas, you can upload these CSV files to your Google Drive and use Google Sheets to delete the second column. Once finished with the deletion, download the edited CSV files: File > Download > Comma-separated values (.csv, current sheet). The downloaded CSV files are then ready to be used with the Hydrator app for hydrating the tweet IDs.
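If neither pandas nor Google Sheets is convenient, the same column extraction can be done with Python's standard library alone. A minimal sketch (the function name and file names are illustrative, not part of the dataset):

```python
import csv

def extract_ids(src, dst):
    """Copy only the first column (tweet IDs) of src into dst, one ID per line."""
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        writer = csv.writer(fout, lineterminator="\n")
        for row in csv.reader(fin):
            writer.writerow([row[0]])

# extract_ids("corona_tweets_10.csv", "ready_corona_tweets_10.txt")
```

Writing the IDs one per line also produces a file that can be fed directly to twarc for hydration.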

Comments

[updated on September 13, 2020] The tweet text should be preprocessed before computing sentiment scores for better analysis. In this case, the raw text obtained from the API was passed to TextBlob's sentiment module after removal of "#", "@" and URLs. The streaming API sends around 30-40 tweets per second, so advanced preprocessing such as spelling correction and expanding Twitter abbreviations, all in real time, would have been a bottleneck.
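For reference, a minimal sketch of that kind of light-touch cleanup (the exact patterns used by the project are not published; these regexes are illustrative):

```python
import re

def preprocess(text):
    # drop URLs entirely
    text = re.sub(r"https?://\S+", "", text)
    # strip the "#" and "@" symbols but keep the word that follows them
    text = re.sub(r"[#@]", "", text)
    # collapse the whitespace left behind by the removals
    return re.sub(r"\s+", " ", text).strip()

# the cleaned text would then go to TextBlob, e.g.:
# TextBlob(preprocess(raw_text)).sentiment.polarity
```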

Submitted by Rabindra Lamsal on Sun, 09/13/2020 - 01:29

Hi, I need the topic of these datasets. How do I get it?

Submitted by Abdullah Matin on Tue, 06/16/2020 - 03:06

Can you please clarify what you mean by "the topic of these datasets"?

Submitted by Rabindra Lamsal on Tue, 06/16/2020 - 04:30

Hello, I have a question: do these tweet IDs include the IDs of reply tweets as well, or only the main tweets?

Thanks in advance.

Submitted by Anil Kumar on Thu, 06/18/2020 - 09:14

Yes, the dataset does include the reply tweets as well. Anything that gets an ID for itself on Twitter and matches the filtering criteria is returned by the API.

Submitted by Rabindra Lamsal on Thu, 06/18/2020 - 14:16

Hi,

When I used the Hydrator, I noticed that a lot of the tweets had been deleted, nearly 25%. Any idea why this could be happening?

Submitted by Arjun Acharya on Fri, 06/19/2020 - 19:42

Hello Arjun.

Anything (main tweets, retweets with comment, replies) that gets a unique identifier on Twitter can be pulled. Maybe that particular period saw a lot of users deleting their tweet(s) or making their profiles private. However, a 25% deletion rate is definitely significant. Last week, while filtering out geo-tagged tweets from the rest, I noticed that around 15-17% of the tweets were no longer available. This is usual, and it is the primary reason why Twitter wants researchers to always pull fresh data.

Submitted by Rabindra Lamsal on Sat, 06/20/2020 - 01:02

Hello,

I want to get the tweets of a specific country.

Is it possible to get tweets by country with this dataset, or do I have to use the geo-tagged dataset for that?

Best regards

Submitted by Anil Kumar on Mon, 06/29/2020 - 12:38

Please refer to my comment below this thread. While I was replying to your comment, I did not notice that I was starting a new thread.

Submitted by Rabindra Lamsal on Tue, 06/30/2020 - 00:33

Hello Anil.

Yes, it is possible to get country-specific tweets with this dataset. The geo-tagged dataset contains only tweets that had "point" location enabled. To get country-specific tweets from this dataset, you'll have to filter on the "country_code" or "country" field of the Twitter place object. This way you'll only store the tweet information originating from the country of your interest.

Such as: "country_code": "US" / "country": "United States"

Submitted by Rabindra Lamsal on Tue, 06/30/2020 - 00:22

Hello,

I have tried with that as well, but I am only able to get tweets by country for a few countries; the majority of the tweets do not give place information.

Best Regards

Anil Kumar

Submitted by Anil Kumar on Tue, 06/30/2020 - 12:06


[updated on Aug 7, 2020] Dealing with location information in Twitter data can itself be a research topic. However, if I were looking for a head start, I would try something like this: play around with the location-specific Twitter objects at three different levels. First, check whether the tweet is geo-tagged (i.e., it contains an exact location). Second, if the tweet is not geo-tagged, chances are it has a region or country bounding box defined. Third, if neither criterion is satisfied, simply try to extract location information from the user's profile.

Here's an example of using twarc as a Python library for this purpose.

from twarc import Twarc

consumer_key = ""

consumer_secret = ""

access_token = ""

access_token_secret = ""

t = Twarc(consumer_key, consumer_secret, access_token, access_token_secret)

for tweet in t.hydrate(open('tweet_ids.txt')):

    if tweet["coordinates"]:

        # place based on the "point" location; check whether "loc" is from a
        # country of your interest. Note that tweet["place"] can be None even
        # when coordinates exist; in that case take the (long, lat) pair from
        # tweet["coordinates"]["coordinates"] and convert it to a human-readable
        # format (reverse-geocoding).
        if tweet["place"]:
            loc = tweet["place"]["country"]
        else:
            loc = tweet["coordinates"]["coordinates"]

    elif tweet["place"]:

        # bounding box region; check whether "loc" is from a country of your interest
        loc = tweet["place"]["country"]

    else:

        # location from the user's profile; check whether "loc_profile" is from
        # a country of your interest
        loc_profile = tweet["user"]["location"]

Submitted by Rabindra Lamsal on Thu, 08/06/2020 - 22:58

When I upload the CSV file into Hydrator, I get this error message:

Tweet ID File Error

invalid tweet id on line 1 in /Users/ra**/Desktop/march24_march25 geo.csv

 

Can you help me please.

Submitted by Gayathri Parame... on Thu, 07/09/2020 - 04:31

The CSV files in this dataset have two columns: the first is the tweet ID and the second is the sentiment score. The Hydrator app takes as input a CSV file with only the tweet ID column, so you'll have to remove the second column before feeding the files to Hydrator. Dropping a column in pandas is quite straightforward; see the Instructions section of this dataset page. Note that MS Excel should not be used for this purpose, as it handles numbers with only up to 15 digits of precision, whereas tweet IDs are 19 digits long.
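The 15-digit limitation is a property of double-precision floating point, which Excel uses for all numbers; any tool that parses the IDs as floats will silently corrupt them. A quick way to see the effect in Python (the ID below is a made-up 19-digit value, not a real tweet ID):

```python
tweet_id = 1234567890123456789  # hypothetical 19-digit tweet ID

# doubles carry a 53-bit mantissa (roughly 15-16 decimal digits), so a
# round-trip through float changes the low digits of a 19-digit ID
assert int(float(tweet_id)) != tweet_id

# always read tweet IDs as strings (or 64-bit integers) instead, e.g.:
# pd.read_csv("corona_tweets_10.csv", header=None, dtype={0: str})
```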

Submitted by Rabindra Lamsal on Thu, 07/09/2020 - 13:04

For corona_tweets_04.csv: 1,233,340 tweets (March 21, 2020 09:27 AM - March 22, 2020 07:46 AM): of these tweet IDs, the Hydrator read only close to 10 lakh, and the resulting CSV file contained tweets for only 1 lakh IDs, of which only 430 belonged to India. What is the reason?

Submitted by Moonis Shakeel on Mon, 07/20/2020 - 00:35

I do not understand why only 10 out of 12 lakh IDs were loaded by the Hydrator app, and I am also unaware why the resulting CSV file has just 1 lakh tweets. Please write to the developer about this. However, I would suggest using the twarc Python library instead; I've used it in the past and easily hydrated around 3-4 million IDs per task.

You received 430 tweets belonging to India from a single day's file. That is a good number, if you ask me. Tweets are rarely geo-tagged. You might find this surprising, but as of today this dataset contains around 322 million tweets, of which only 145.9k are geo-tagged.

Submitted by Rabindra Lamsal on Mon, 07/20/2020 - 11:15

Please help me import a CSV file with more than 1,048,576 rows.

Submitted by Moonis Shakeel on Mon, 07/20/2020 - 07:19

If you're experiencing problems with the hydrator app, use twarc. 

Submitted by Rabindra Lamsal on Mon, 07/20/2020 - 10:00

Hi Rabindra, is it possible for you to share the 54 keywords that you are using to search for the tweets? The section above just says there are 54 keywords but doesn't state what they are, and this would be helpful for my research. Best wishes, Farhaan - UCL in London

Submitted by Farhaan Ali on Sun, 08/02/2020 - 14:24

Hello Farhaan. Please refer to my comment below.

Submitted by Rabindra Lamsal on Mon, 08/03/2020 - 21:59

Hello Farhaan. It looks like the site has got a new UI and I can't find a reply option on the thread you've started. Anyway, here's an overview of the filtering keywords used for the collection of tweets. FYI, a manuscript is on its way; I'll update this page once it is published.

(i) corona, coronavirus, covid, ncov, ncov2019, pandemic, quarantine, hand sanitizer, lockdown, social distancing, ppe, n95, covidiots, herd immunity, pneumonia, chinese virus, wuhan virus, kung flu, (ii) variants of covid(19,-19,_19), sars cov 2, 2019 ncov, flatten(ing) the curve, work(ing) from home, and (iii) the respective hashtags of all the keywords mentioned.

Submitted by Rabindra Lamsal on Mon, 08/03/2020 - 06:27


How can I download all the files automatically? Thank you very much.

Submitted by An Vu on Tue, 08/11/2020 - 08:43

Please write to IEEE DataPort.

Submitted by Rabindra Lamsal on Wed, 08/12/2020 - 00:11

How can I get the country-wise tweet count for each day?

Submitted by Rounak Agarwal on Sun, 08/16/2020 - 12:36

You must consider dealing with the ['place'] Twitter object to come up with the country-wise distribution of the tweets. Alternatively, you can use the COVID-19 Geo-tagged dataset (https://ieee-dataport.org/open-access/coronavirus-covid-19-geo-tagged-tw...); this secondary dataset is small, and hydrating the tweets for your purpose should be quite easy. But if you are looking for large numbers in the distribution chart, you'll have to hydrate the tweet IDs of this dataset, check the ['place'] object, and only store the tweets that are geo-tagged or have a Twitter place defined, to finally come up with a country-wise distribution.
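As a concrete starting point, here is a minimal sketch that tallies hydrated tweets per country per day from a JSON-lines file such as twarc produces (field names follow Twitter's v1.1 Tweet object; only tweets with a non-null place are counted, and the function name is illustrative):

```python
import json
from collections import Counter

def country_day_counts(jsonl_lines):
    """Count tweets per (country_code, day) using the ['place'] object."""
    counts = Counter()
    for line in jsonl_lines:
        tweet = json.loads(line)
        place = tweet.get("place")
        if place:  # skip tweets without a Twitter place
            # created_at looks like "Sat Mar 21 09:25:00 +0000 2020";
            # keep "Mar 21 2020" as the day key
            parts = tweet["created_at"].split()
            day = " ".join(parts[1:3] + parts[5:6])
            counts[(place["country_code"], day)] += 1
    return counts
```

Feeding it the lines of a hydrated .jsonl file would give a Counter keyed by (country code, day), ready for a distribution chart.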

Submitted by Rabindra Lamsal on Mon, 08/17/2020 - 00:57

Thank you so much, Rabindra, for your response. Could you please guide me on how I can extract the country code from the profile of a user whose tweet has a null place object?

Submitted by Rounak Agarwal on Mon, 08/17/2020 - 05:24

The location field in the user dict gives you the address entered on a user's profile. However, that field is not validated as an authentic geo-address; even entries such as "Milky Way Galaxy", "Earth", "Land", "My Dream", etc. are accepted. Therefore, I'm not sure you'd really want to rely on the profile address for extracting country information. If I had to do it anyway, I would probably geocode the address given in the profile field and then reverse-geocode the resulting coordinates to extract the country information.

Submitted by Rabindra Lamsal on Mon, 08/17/2020 - 22:41

Hi, I am doing research on the COVID-19 infodemic using your dataset. I wanted to compare my results with other papers. Can you provide me with some research papers that have used your dataset?
Thank you

Submitted by Kritika Saini on Mon, 08/24/2020 - 08:16

Hello Kritika. I have not personally kept track of papers, scholarly articles, blogs, etc. that might be referencing this dataset. However, a couple of searches on Google and Google Scholar might help you land on those articles. Thank you.

Submitted by Rabindra Lamsal on Mon, 08/24/2020 - 13:32
