First Name: 
Rabindra
Last Name: 
Lamsal
Affiliation: 
School of Computing and Information Systems, University of Melbourne
Job Title: 
Ph.D. Candidate
Expertise: 
Machine Learning, Natural Language Processing, Social Computing
Short Bio: 
I'm a Ph.D. Candidate at the School of Computing and Information Systems, University of Melbourne. I completed my BE in Computer Engineering at the Department of Computer Science & Engineering, Kathmandu University (2012-16), and my M.Tech at the School of Computer and Systems Sciences, Jawaharlal Nehru University (2017-19). I was also associated with the Special Centre for Disaster Research, Jawaharlal Nehru University, as a Project Associate from 2018 to 2019. My research interests are Machine Learning, Natural Language Processing, and Social Computing.

Datasets & Competitions

This India-specific COVID-19 tweets dataset has been developed using the large-scale Coronavirus (COVID-19) Tweets Dataset, which currently contains more than 700 million COVID-19-specific English language tweets. This dataset contains tweets originating from India during the first week of each of the four phases of nationwide lockdown initiated by the Government of India.

4233 Views

This dataset gives a cursory glimpse at the overall sentiment trend of the public discourse regarding the COVID-19 pandemic on Twitter. The live scatter plot of this dataset is available as The Overall Trend block at https://live.rlamsal.com.np. The trend graph reveals multiple peaks and drops that need further analysis. The n-grams during those peaks and drops can prove beneficial for better understanding the discourse.

5534 Views

This dataset contains IDs and sentiment scores of geo-tagged tweets related to the COVID-19 pandemic. The real-time Twitter feed is monitored for coronavirus-related tweets using 90+ different keywords and hashtags that are commonly used while referencing the pandemic. To comply with Twitter's content redistribution policy, only the tweet IDs are shared. The tweet IDs in this dataset belong to tweets that were posted with an exact location attached.

36685 Views

Considering the ongoing work in Natural Language Processing (NLP) with the Nepali language, it is evident that the use of Artificial Intelligence and NLP on this Devanagari-script language still has a long way to go. The Nepali language is complex in itself and requires multi-dimensional approaches for pre-processing the unstructured text and training machines to comprehend the language competently. There seemed to be a need for a comprehensive Nepali language text corpus containing texts from domains such as News, Finance, Sports, Entertainment, Health, Literature, and Technology.

3918 Views

This dataset includes CSV files that contain IDs and sentiment scores of the tweets related to the COVID-19 pandemic. The real-time Twitter feed is monitored for coronavirus-related tweets using 90+ different keywords and hashtags that are commonly used while referencing the pandemic. The oldest tweets in this dataset date back to October 01, 2019. This dataset was wholly redesigned on March 20, 2020, to comply with the content redistribution policy set by Twitter.
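As a rough illustration of working with CSV files of tweet IDs and sentiment scores, the sketch below reads such a file and summarises the scores. The two-column layout (tweet ID, then score), the sign convention (negative, zero, positive), and the sample IDs are assumptions made for this example and are not confirmed by the dataset description; the function names are hypothetical.

```python
import csv
import io
from statistics import mean


def read_tweet_sentiments(handle):
    """Return a list of (tweet_id, score) pairs from a two-column CSV stream.

    Assumed layout (not confirmed by the dataset page): column 0 is the
    tweet ID, column 1 is a numeric sentiment score.
    """
    rows = []
    for row in csv.reader(handle):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        tweet_id, score = row[0].strip(), row[1].strip()
        if not tweet_id.isdigit():
            continue  # skip a header row, if one is present
        # Keep the ID as a string so its digits are never altered.
        rows.append((tweet_id, float(score)))
    return rows


def summarize(rows):
    """Count rows and bucket the scores by sign."""
    scores = [score for _, score in rows]
    return {
        "count": len(scores),
        "mean": mean(scores) if scores else None,
        "negative": sum(s < 0 for s in scores),
        "neutral": sum(s == 0 for s in scores),
        "positive": sum(s > 0 for s in scores),
    }


# Made-up sample rows standing in for one of the CSV files.
sample = io.StringIO(
    "1240000000000000001,0.25\n"
    "1240000000000000002,-0.5\n"
    "1240000000000000003,0.0\n"
)
rows = read_tweet_sentiments(sample)
summary = summarize(rows)
print(summary)
```

To use it on a downloaded file, the `io.StringIO` sample could be replaced with `open(path, newline="")`, where `path` points at one of the CSV files.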

144709 Views

This dataset page is currently being updated. The tweets collected by the model deployed at https://live.rlamsal.com.np/ are shared here. However, because of COVID-19, all computing resources I have are being used for a dedicated collection of the tweets related to the pandemic. You can go through the following datasets to access those tweets:

7636 Views

This pre-trained Word2Vec model has 300-dimensional vectors for more than 0.5 million Nepali words and phrases. A separate Nepali language text corpus was created using news content freely available in the public domain. The text corpus contains more than 90 million running words. The "Nepali Text Corpus" can be accessed freely from http://dx.doi.org/10.21227/jxrd-d245.
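Word vectors of this kind are typically queried by cosine similarity. The sketch below shows that lookup on a toy vocabulary. The three-dimensional vectors and the transliterated placeholder words are invented for illustration and do not come from the model, which has 300 dimensions and Devanagari vocabulary; the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(word, vectors, top_n=3):
    """Rank every other word in `vectors` by similarity to `word`."""
    target = vectors[word]
    scored = [
        (other, cosine_similarity(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy vectors: placeholders only, not values from the real model.
toy_vectors = {
    "nepal": [0.9, 0.1, 0.0],
    "kathmandu": [0.8, 0.2, 0.1],
    "cricket": [0.0, 0.1, 0.9],
}
neighbours = most_similar("nepal", toy_vectors, top_n=2)
print(neighbours)
```

With the real model, the dictionary would be replaced by whatever vector lookup the chosen Word2Vec library provides; the ranking logic stays the same.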

2139 Views