Rabindra Lamsal

Jawaharlal Nehru University, New Delhi
Job Title: Graduate Research Scholar
Machine Learning, NLP, Social Media Analytics

Datasets & Analysis

This dataset contains the IDs of geo-tagged tweets. The tweets were captured by an ongoing project deployed at https://live.rlamsal.com.np. The geolocation data was extracted from tweets that mentioned "corona", "covid-19", "coronavirus", or variants of "sars-cov-2". To comply with Twitter's content redistribution policy, only the tweet IDs are shared. You can reconstruct the dataset by hydrating these IDs.

  • COVID-19
  • Last Updated On: Sun, 05/31/2020 - 00:49
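
As a rough illustration of the first step of hydration, the sketch below reads tweet IDs from CSV rows and groups them into batches. It assumes the IDs sit in the first column of each row, and it uses a batch size of 100 because Twitter's tweet lookup endpoints have accepted at most 100 IDs per request; both points are assumptions to verify against the actual files and the current API. The function names and the file name in the usage note are hypothetical, and the lookup call itself is left to whichever client you use.

```python
import csv
from typing import Iterable, Iterator, List

# Assumed per-request cap of the tweet lookup endpoint; adjust if the API differs.
BATCH_SIZE = 100


def read_tweet_ids(lines: Iterable[str]) -> List[str]:
    """Collect numeric tweet IDs from the first column of CSV rows.

    Header rows, blank rows, and non-numeric values are skipped.
    IDs are kept as strings so they are passed on unchanged.
    """
    ids = []
    for row in csv.reader(lines):
        if row and row[0].strip().isdigit():
            ids.append(row[0].strip())
    return ids


def batches(ids: List[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]
```

For example, `with open("tweet_ids.csv", newline="") as f: ids = read_tweet_ids(f)` followed by `for chunk in batches(ids): ...` would hand each chunk to a lookup request.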

    Considering the ongoing work in Natural Language Processing (NLP) for the Nepali language, it is evident that the use of Artificial Intelligence and NLP on this Devanagari-script language still has a long way to go. The Nepali language is complex in itself and requires multi-dimensional approaches for pre-processing unstructured text and training machines to comprehend the language competently. There seemed to be a need for a comprehensive Nepali language text corpus containing texts from domains such as News, Finance, Sports, Entertainment, Health, Literature, and Technology.

  • Standards Research Data
  • Last Updated On: Sun, 04/19/2020 - 23:12

    This dataset includes CSV files that contain tweet IDs. The tweets have been collected by the model deployed at https://live.rlamsal.com.np. The model monitors the real-time Twitter feed for coronavirus-related tweets using two filters: the language "en", and the keywords "corona", "coronavirus", "covid", "covid19", and variants of "sarscov2".

  • COVID-19
  • Last Updated On: Sun, 05/31/2020 - 00:46
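
The language and keyword filters described for this dataset can be approximated offline with a small predicate. This is only a sketch, not the deployed model's actual filter: the function name is hypothetical, "variants" is interpreted here as hyphen and underscore variations (for example "sars-cov-2"), and matching is done on whole tokens, which may be stricter or looser than what the streaming service applies.

```python
import re

# Keywords listed in the dataset description.
KEYWORDS = frozenset({"corona", "coronavirus", "covid", "covid19", "sarscov2"})


def matches_filter(text: str, lang: str) -> bool:
    """Return True if an English tweet contains one of the keywords as a token."""
    if lang != "en":
        return False
    # Drop hyphens and underscores so "COVID-19" and "sars-cov-2"
    # collapse to "covid19" and "sarscov2" before tokenizing.
    normalized = text.lower().replace("-", "").replace("_", "")
    tokens = re.findall(r"[a-z0-9]+", normalized)
    return any(token in KEYWORDS for token in tokens)
```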

    Each database (*.db) contains three columns. First column: date and time of the tweet; second column: the tweet; third column: sentiment score for the tweet within the range [-1, 1], with -1 being the most negative, 0 being neutral, and +1 being the most positive sentiment. The tweets have been collected by the model deployed at sentiment.live [1]. The last column, viz. sentiment score, is not the score estimated by the model. The model is still in the pre-alpha phase.

  • Artificial Intelligence
  • Last Updated On: Fri, 05/01/2020 - 08:55
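
A minimal sketch of inspecting one of these databases follows. It assumes the *.db files are SQLite databases, which the description does not state, and it discovers table names at run time because they are not documented. The helper names are hypothetical; columns are read by position (date and time, tweet, sentiment score) as described above.

```python
import sqlite3
from typing import Dict, List


def list_tables(conn: sqlite3.Connection) -> List[str]:
    """Return the names of all tables in the database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    return [row[0] for row in rows]


def summarize_sentiment(conn: sqlite3.Connection, table: str) -> Dict[str, int]:
    """Count tweets by the sign of the third column (the sentiment score)."""
    counts = {"negative": 0, "neutral": 0, "positive": 0}
    # Table names cannot be bound as parameters, so quote the identifier.
    quoted = '"' + table.replace('"', '""') + '"'
    for row in conn.execute(f"SELECT * FROM {quoted}"):
        score = float(row[2])
        if score < 0:
            counts["negative"] += 1
        elif score > 0:
            counts["positive"] += 1
        else:
            counts["neutral"] += 1
    return counts
```

With a real file, `conn = sqlite3.connect("example.db")` (a placeholder path) followed by `summarize_sentiment(conn, list_tables(conn)[0])` would give a quick overview of the score distribution.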

    This pre-trained Word2Vec model has 300-dimensional vectors for more than 0.5 million Nepali words and phrases. A separate Nepali language text corpus was created using news content freely available in the public domain. The text corpus contained more than 90 million running words.

    Word2Vec model details: Embedding dimension: 300; Architecture: Continuous Bag-of-Words (CBOW); Training algorithm: Negative sampling (15 negative samples); Context (window) size: 10; Minimum token count: 2; Encoding: UTF-8.

  • Computational Intelligence
  • Last Updated On: Sun, 03/15/2020 - 07:44
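
For readers who want to train a comparable model on their own corpus, the listed hyperparameters map onto the gensim library roughly as sketched below. This is an assumption-laden illustration: the description does not say which toolkit was used, the parameter names follow gensim 4.x (`vector_size` was called `size` in earlier releases), and `train_word2vec` is a hypothetical helper. Nothing here describes how to load the published model file, since its format is not stated.

```python
from typing import Iterable, List

# Hyperparameters from the model description, expressed as gensim 4.x keyword arguments.
W2V_PARAMS = {
    "vector_size": 300,  # embedding dimension
    "sg": 0,             # 0 selects CBOW (1 would select skip-gram)
    "hs": 0,             # no hierarchical softmax; use negative sampling instead
    "negative": 15,      # number of negative samples
    "window": 10,        # context window size
    "min_count": 2,      # ignore tokens seen fewer than 2 times
}


def train_word2vec(sentences: Iterable[List[str]]):
    """Train a Word2Vec model on pre-tokenized sentences (requires gensim)."""
    # Imported lazily so the configuration can be inspected without gensim installed.
    from gensim.models import Word2Vec

    return Word2Vec(sentences=sentences, **W2V_PARAMS)
```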