First Name: Rabindra
Last Name: Lamsal
Affiliation: School of Computer and Systems Sciences, JNU, New Delhi
Job Title: Graduate Research Scholar
Expertise: Machine Learning, Natural Language Processing, Social Computing
Short Bio: I completed my BE in Computer Engineering from the Department of Computer Science & Engineering, Kathmandu University (2012-16), and my M.Tech from the School of Computer and Systems Sciences, Jawaharlal Nehru University (2017-19). I was also associated with the Special Centre for Disaster Research, Jawaharlal Nehru University, as a Project Associate from 2018 to 2019. My areas of research interest are Machine Learning, Natural Language Processing, and Social Computing.

Datasets & Analysis

This India-specific COVID-19 tweets dataset has been developed using the large-scale Coronavirus (COVID-19) Tweets Dataset, which currently contains more than 600 million COVID-19-specific English-language tweets. This dataset contains tweets originating from India during the first week of each of the four phases of the nationwide lockdown initiated by the Government of India.

Instructions: 

The zipped files contain .db (SQLite database) files. Each .db file has a table named 'geo'. To hydrate the IDs, you can import the .db file into a pandas dataframe and then export it to .CSV or .TXT for hydration. For more details on hydrating the IDs, please visit the primary dataset page.

import sqlite3
import pandas as pd

# connect to the SQLite database and read the tweet IDs from the 'geo' table
conn = sqlite3.connect('/path/to/the/db/file')
data = pd.read_sql("SELECT tweet_id FROM geo", conn)
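
The tweet_id column can then be written out to a .CSV or .TXT file for hydration, for example (the output filename here is only illustrative):

# export the tweet IDs to a file that Hydrator or twarc can consume
data.to_csv("tweet_ids.csv", index=False, header=False)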


This dataset gives a cursory glimpse of the overall sentiment trend of the public discourse regarding the COVID-19 pandemic on Twitter. The live scatter plot of this dataset is available as the "Overall Trend" block at https://live.rlamsal.com.np. The trend graph reveals multiple peaks and drops that call for further analysis. The n-grams during those peaks and drops can prove beneficial for better understanding the discourse.

Instructions: 

The TXT files in this dataset can be used to generate the trend graph. The peaks and drops in the trend graph can be made more meaningful by computing n-grams for those periods. To compute the n-grams, the tweet IDs of the Coronavirus (COVID-19) Tweets Dataset should be hydrated to form a tweets corpus.

Pseudo-code for generating a similar trend dataset (the database path, table, and column names below are assumptions):

import time
import sqlite3
import pandas as pd

conn = sqlite3.connect('/path/to/the/db/file')    # assumed: SQLite database holding the collected tweets

current = int(time.time()*1000)    # we receive the timestamp in ms from Twitter
off = 600*1000    # we're looking for 10-minute (600-second) average data (offset)
past = current - off    # timestamp of 10 minutes before the current time

# even if we receive 100 tweets per second, the number of tweets does not cross 60,000
# in an interval of 10 minutes, so fetching the most recent 60,000 rows is enough
df = pd.read_sql("SELECT unix, sentiment FROM tweets ORDER BY unix DESC LIMIT 60000", conn)

new_df = df[df.unix > past]    # here "unix" is the timestamp column name in the primary tweets dataset
avg_sentiment = new_df["sentiment"].mean()    # calculate the mean sentiment for the window

# store (current, avg_sentiment) into a database table for plotting the trend
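
The stored (timestamp, average sentiment) pairs can then be plotted to obtain a trend graph like the one described above; a minimal matplotlib sketch, assuming a two-column file of timestamps and averages:

import pandas as pd
import matplotlib.pyplot as plt

# assumed layout: column 1 = timestamp in ms, column 2 = 10-minute average sentiment
trend = pd.read_csv("trend.txt", names=["timestamp_ms", "avg_sentiment"])
trend["time"] = pd.to_datetime(trend["timestamp_ms"], unit="ms")

plt.scatter(trend["time"], trend["avg_sentiment"], s=2)
plt.xlabel("Time")
plt.ylabel("Average sentiment (10-minute window)")
plt.show()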

Pseudo-code for extracting the top 100 unigrams and bigrams from a tweets corpus

import re
import nltk
from collections import Counter

nltk.download('stopwords')    # fetch the NLTK stopword list (only needed once)

# loading a tweet corpus
with open("/path/to/the/tweets/corpus", "r", encoding="UTF-8") as myfile:
    data = myfile.read().replace('\n', ' ')

# preprocess the data: the regular expression below is only an example of a
# find-and-replace clean-up (keep word characters, whitespace, # and @), followed by lowercasing
data = re.sub(r"[^\w\s#@]", " ", data).lower()
data = data.split()    # split on whitespace; empty strings are dropped

# removing stopwords from each tweet
stopwords = nltk.corpus.stopwords.words('english')
clean_data = []
for w in data:
    if w not in stopwords:
        clean_data.append(w)

# extracting the top 100 n-grams
unigram = Counter(clean_data)
unigram_top = unigram.most_common(100)

bigram = Counter(zip(clean_data, clean_data[1:]))
bigram_top = bigram.most_common(100)


This dataset contains the IDs and sentiment scores of geo-tagged tweets related to the COVID-19 pandemic. The tweets are captured by an ongoing project deployed at https://live.rlamsal.com.np. The model monitors the real-time Twitter feed for coronavirus-related tweets, using 90+ different keywords and hashtags that are commonly used while referencing the pandemic. Complying with Twitter's content redistribution policy, only the tweet IDs are shared. You can reconstruct the dataset by hydrating these IDs.

Instructions: 

Each CSV file contains a list of tweet IDs. You can use these tweet IDs to download fresh data from Twitter (i.e., hydrate the tweet IDs). To make it easy for NLP researchers to access the sentiment of each collected tweet, the sentiment score computed by TextBlob has been appended as the second column. To hydrate the tweet IDs, you can use applications such as Hydrator (available for OS X, Windows, and Linux) or twarc (a Python library).
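
For reference, a sentiment score of the kind stored in the second column can be computed with TextBlob's polarity measure; a minimal sketch (the tweet text below is made up):

from textblob import TextBlob

# polarity ranges from -1.0 (most negative) to +1.0 (most positive)
score = TextBlob("The lockdown has been extended again").sentiment.polarity
print(score)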

Getting the CSV files of this dataset ready for hydrating the tweet IDs:

import pandas as pd

# read the original two-column CSV (tweet ID, sentiment score); it has no header row
dataframe = pd.read_csv("april28_april29.csv", header=None)

# keep only the first column (the tweet IDs)
dataframe = dataframe[0]

# write the IDs to a new CSV file, without an index or header
dataframe.to_csv("ready_april28_april29.csv", index=False, header=None)

The above example code takes the original CSV file (i.e., april28_april29.csv) from this dataset and exports just the tweet ID column to a new CSV file (i.e., ready_april28_april29.csv). The newly created CSV file can now be consumed by the Hydrator application for hydrating the tweet IDs. To export the tweet ID column into a TXT file instead, just replace ".csv" with ".txt" in the to_csv call (last line) of the above example code.

If you are not comfortable with Python and pandas, you can upload these CSV files to your Google Drive and use Google Sheets to delete the second column. Once finished with the deletion, download the edited CSV files: File > Download > Comma-separated values (.csv, current sheet). These downloaded CSV files are now ready to be used with the Hydrator app for hydrating the tweet IDs.
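
If you prefer scripting the hydration with twarc instead of the Hydrator app, a minimal sketch with twarc v1 might look like the following (you need your own Twitter API credentials; the file names follow the example above):

from twarc import Twarc
import json

# fill in your own Twitter API credentials
t = Twarc("consumer_key", "consumer_secret", "access_token", "access_token_secret")

# hydrate the tweet IDs and write the full tweet objects to a JSON Lines file
with open("ready_april28_april29.txt") as ids, open("hydrated_tweets.jsonl", "w") as out:
    for tweet in t.hydrate(ids):
        out.write(json.dumps(tweet) + "\n")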


Considering the ongoing work in Natural Language Processing (NLP) with the Nepali language, it is evident that the use of Artificial Intelligence and NLP on this Devanagari script still has a long way to go. The Nepali language is complex in itself and requires multi-dimensional approaches for pre-processing the unstructured text and training machines to comprehend the language competently. There was a need for a comprehensive Nepali language text corpus containing texts from domains such as News, Finance, Sports, Entertainment, Health, Literature, and Technology.

Instructions: 

Here's a quick way to load the .txt file in your favourite IDE.

filename = 'compiled.txt'

# read the whole corpus into memory as a single string
with open(filename, encoding="utf-8") as file:
    text = file.read()
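
After loading, a quick sanity check is to look at the size of the corpus and its vocabulary; a minimal sketch (plain whitespace tokenisation is only a rough approximation for Devanagari text):

# rough whitespace tokenisation; proper Nepali pre-processing needs more care
words = text.split()
print("running words:", len(words))
print("unique tokens:", len(set(words)))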


This dataset includes CSV files that contain the IDs and sentiment scores of tweets related to the COVID-19 pandemic. The tweets have been collected by an ongoing project deployed at https://live.rlamsal.com.np. The model monitors the real-time Twitter feed for coronavirus-related tweets, using 90+ different keywords and hashtags that are commonly used while referencing the pandemic. This dataset was wholly redesigned on March 20, 2020, to comply with the content redistribution policy set by Twitter.

Instructions: 

Each CSV file contains a list of tweet IDs. You can use these tweet IDs to download fresh data from Twitter (i.e., hydrate the tweet IDs). To make it easy for NLP researchers to access the sentiment of each collected tweet, the sentiment score computed by TextBlob has been appended as the second column. To hydrate the tweet IDs, you can use applications such as Hydrator (available for OS X, Windows, and Linux) or twarc (a Python library).

Getting the CSV files of this dataset ready for hydrating the tweet IDs:

import pandas as pd

# read the original two-column CSV (tweet ID, sentiment score); it has no header row
dataframe = pd.read_csv("corona_tweets_10.csv", header=None)

# keep only the first column (the tweet IDs)
dataframe = dataframe[0]

# write the IDs to a new CSV file, without an index or header
dataframe.to_csv("ready_corona_tweets_10.csv", index=False, header=None)

The above example code takes the original CSV file (i.e., corona_tweets_10.csv) from this dataset and exports just the tweet ID column to a new CSV file (i.e., ready_corona_tweets_10.csv). The newly created CSV file can now be consumed by the Hydrator application for hydrating the tweet IDs. To export the tweet ID column into a TXT file instead, just replace ".csv" with ".txt" in the to_csv call (last line) of the above example code.

If you are not comfortable with Python and pandas, you can upload these CSV files to your Google Drive and use Google Sheets to delete the second column. Once finished with the deletion, download the edited CSV files: File > Download > Comma-separated values (.csv, current sheet). These downloaded CSV files are now ready to be used with the Hydrator app for hydrating the tweet IDs.


This dataset page is currently being updated. The tweets collected by the model deployed at https://live.rlamsal.com.np/ are shared here. However, because of COVID-19, all the computing resources I have are being used for a dedicated collection of tweets related to the pandemic. You can go through the following datasets to access those tweets:


This pre-trained Word2Vec model has 300-dimensional vectors for more than 0.5 million Nepali words and phrases. A separate Nepali language text corpus was created using news content freely available in the public domain. The text corpus contained more than 90 million running words. The "Nepali Text Corpus" can be accessed freely at http://dx.doi.org/10.21227/jxrd-d245.

Instructions: 

from gensim.models import KeyedVectors

# load the vectors
model = KeyedVectors.load_word2vec_format('/path/to/nepali_embeddings_word2vec.txt', binary=False)

# find the similarity between two words
model.similarity('फेसबूक', 'इन्स्टाग्राम')

# most similar words
model.most_similar('ठमेल')

# try some linear-algebra maths with Nepali words (fill in words of your choice)
model.most_similar(positive=['', ''], negative=[''], topn=1)

The design of the Nepali text corpus and the training of the Word2Vec model were carried out at the Database Systems and Artificial Intelligence Lab, School of Computer and Systems Sciences, Jawaharlal Nehru University, New Delhi.
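
For anyone who wants to train a similar model from the Nepali Text Corpus, a minimal gensim sketch might look like the following; the corpus path, tokenisation, and most hyper-parameters are assumptions, and only the 300-dimensional vector size is taken from the description above:

from gensim.models import Word2Vec

# naive line/whitespace tokenisation of the raw corpus; real pre-processing would be more careful
with open('/path/to/nepali_text_corpus.txt', encoding='utf-8') as f:
    sentences = [line.split() for line in f if line.strip()]

# 300-dimensional vectors, as in the shared model; the other parameters are illustrative
model = Word2Vec(sentences, vector_size=300, window=5, min_count=5, workers=4)
model.wv.save_word2vec_format('nepali_embeddings_word2vec.txt', binary=False)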
