Tweets Originating from India During COVID-19 Lockdowns

- Citation Author(s): Rabindra Lamsal
- Submitted by: Rabindra Lamsal
- DOI: 10.21227/k8gw-xz18
Abstract
This India-specific COVID-19 tweets dataset has been curated using the large-scale Coronavirus (COVID-19) Tweets Dataset. This dataset contains tweets originating from India during the first week of each of the four phases of nationwide lockdowns initiated by the Government of India. For more information on filtering keywords, please visit the primary dataset page.
Announcements:
- We have released BillionCOV, a billion-scale COVID-19 tweets dataset for efficient hydration. Hydration takes time because of the rate limits Twitter places on its tweet lookup endpoint. We re-hydrated the tweets present in COV19Tweets and found that more than 500 million tweet identifiers point to either deleted or protected tweets. Skipping those identifiers alone saves almost two months in a single hydration task. BillionCOV will receive quarterly updates, while COV19Tweets will continue to receive daily updates. Learn more about BillionCOV on its page: https://dx.doi.org/10.21227/871g-yp65
- We have also released MegaGeoCOV (on GitHub), a million-scale COVID-19-specific geotagged tweets dataset. The dataset is introduced in the paper "Twitter conversations predict the daily confirmed COVID-19 cases".
Related publications:
- Rabindra Lamsal. (2021). Design and analysis of a large-scale COVID-19 tweets dataset. Applied Intelligence, 51(5), 2790-2804.
- Rabindra Lamsal, Aaron Harwood, Maria Rodriguez Read. (2022). Socially Enhanced Situation Awareness from Microblogs using Artificial Intelligence: A Survey. ACM Computing Surveys, 55(4), 1-38. (arXiv)
- Rabindra Lamsal, Aaron Harwood, Maria Rodriguez Read. (2022). Twitter conversations predict the daily confirmed COVID-19 cases. Applied Soft Computing, 129, 109603. (arXiv)
- Rabindra Lamsal, Aaron Harwood, Maria Rodriguez Read. (2022). Addressing the location A/B problem on Twitter: the next generation location inference research. In 2022 ACM SIGSPATIAL LocalRec (pp. 1-4).
- Rabindra Lamsal, Aaron Harwood, Maria Rodriguez Read. (2022). Where did you tweet from? Inferring the origin locations of tweets based on contextual information. In 2022 IEEE International Conference on Big Data (pp. 3935-3944). (arXiv)
- Rabindra Lamsal, Maria Rodriguez Read, Shanika Karunasekera. (2023). BillionCOV: An Enriched Billion-scale Collection of COVID-19 tweets for Efficient Hydration. Data in Brief, 48, 109229. (arXiv)
- Rabindra Lamsal, Maria Rodriguez Read, Shanika Karunasekera. (2023). A Twitter narrative of the COVID-19 pandemic in Australia. In 20th International ISCRAM Conference (pp. 353-370). (arXiv)
- Rabindra Lamsal, Maria Rodriguez Read, Shanika Karunasekera. (2024). CrisisTransformers: Pre-trained language models and sentence encoders for crisis-related social media texts. Knowledge-Based Systems, 296, 111916. (arXiv)
- Rabindra Lamsal, Maria Rodriguez Read, Shanika Karunasekera. (2024). Semantically Enriched Cross-Lingual Sentence Embeddings for Crisis-related Social Media Texts. In 21st International ISCRAM Conference (in press). (arXiv)
Dataset usage terms: By using this dataset, you agree to (i) use the content of this dataset, and any data generated from it, for non-commercial research only, (ii) remain in compliance with Twitter's Developer Policy, and (iii) cite the following paper:
Lamsal, R. (2020). Design and analysis of a large-scale COVID-19 tweets dataset. Applied Intelligence, 1-15.
BibTeX:
@article{lamsal2020design, title={Design and analysis of a large-scale COVID-19 tweets dataset}, author={Lamsal, Rabindra}, journal={Applied Intelligence}, pages={1--15}, year={2020}, publisher={Springer} }
What's inside the dataset?
The files in this dataset contain the IDs of tweets present in the Coronavirus (COVID-19) Tweets Dataset. Note: Below, (all files) means that every file in the stated range was used to build the ID file, while (only even-numbered files) means that only the even-numbered files in the range were used. A sketch for merging the IDs from a single archive follows the listing below.
Lockdown period tweets: (all files)
Lockdown1.zip: March 25, 2020 - April 02, 2020; corona_tweets_08.csv to corona_tweets_14.csv
Lockdown2.zip: April 14, 2020 - April 21, 2020; corona_tweets_27.csv to corona_tweets_33.csv
Lockdown3.zip: May 01, 2020 - May 07, 2020; corona_tweets_44.csv to corona_tweets_49.csv
Lockdown4.zip: May 18, 2020 - May 23, 2020; corona_tweets_61.csv to corona_tweets_66.csv
Extras: (all files)
extras_june1_june7.zip: corona_tweets_75.csv to corona_tweets_80.csv
Extras: (only even-numbered files)
extras_june24_july1.zip: corona_tweets_96.csv to corona_tweets_104.csv
extras_july2_july15.zip: corona_tweets_106.csv to corona_tweets_118.csv
extras_july16_august4.zip: corona_tweets_120.csv to corona_tweets_138.csv
extras_august5_august18.zip: corona_tweets_140.csv to corona_tweets_152.csv
extras_august19_september1.zip: corona_tweets_154.csv to corona_tweets_166.csv
extras_september2_september15.zip: corona_tweets_168.csv to corona_tweets_180.csv
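To work with a whole lockdown phase at once rather than one .db file at a time, something along the following lines may help. It is only a minimal sketch: it assumes one archive (e.g. lockdown1.zip) has been downloaded into the working directory, and it relies on the fact, noted under Instructions below, that the archives contain SQLite .db files whose 'geo' table stores the tweet IDs. The extraction folder and output file name are placeholders, not part of the dataset.
import glob
import sqlite3
import zipfile
import pandas as pd

# Extract one archive and gather the tweet IDs from every .db file inside it.
# 'lockdown1.zip', 'lockdown1/' and 'lockdown1_ids.txt' are placeholder names.
with zipfile.ZipFile('lockdown1.zip') as zf:
    zf.extractall('lockdown1')

frames = []
for db_path in sorted(glob.glob('lockdown1/**/*.db', recursive=True)):
    conn = sqlite3.connect(db_path)
    frames.append(pd.read_sql("SELECT tweet_id FROM geo", conn))
    conn.close()

ids = pd.concat(frames, ignore_index=True).drop_duplicates()
ids.to_csv('lockdown1_ids.txt', index=False, header=False)  # one tweet ID per line
The resulting text file holds one tweet ID per line, which is the format hydration tools generally expect.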
Instructions:
The zipped files contain .db (SQLite database) files. Each .db file has a table named 'geo' that stores the tweet IDs. To prepare the IDs for hydration, load a .db file into a pandas DataFrame and export it to a .csv or .txt file. For more details on hydrating the IDs, please visit the primary dataset page.
import sqlite3
import pandas as pd

conn = sqlite3.connect('/path/to/the/db/file')
data = pd.read_sql("SELECT tweet_id FROM geo", conn)  # the 'geo' table holds the tweet IDs
data.to_csv('tweet_ids.txt', index=False, header=False)  # one ID per line; output name is an example
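The exported IDs still have to be hydrated into full tweet objects through Twitter's tweet lookup endpoint; see the primary dataset page for the recommended workflow. As one possible route (not part of this dataset's own tooling), the twarc2 command-line client can hydrate a plain-text ID file, assuming it is installed and configured with valid Twitter API credentials; the input and output file names below are only examples:
twarc2 hydrate tweet_ids.txt hydrated_tweets.jsonl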
Dataset Files
- lockdown1.zip (Size: 4.17 MB)
- lockdown2.zip (Size: 2.74 MB)
- lockdown3.zip (Size: 2.77 MB)
- lockdown4.zip (Size: 3.12 MB)
- extras_june1_june7.zip (Size: 3.11 MB)
- extras_june24_july1.zip (Size: 2.5 MB)
- extras_july2_july15.zip (Size: 3.7 MB)
- extras_july16_august4.zip (Size: 4.65 MB)
- extras_august5_august18.zip (Size: 2.63 MB)
- extras_august19_september1.zip (Size: 2.83 MB)
- extras_september2_september15.zip (Size: 2.54 MB)