Rabindra Lamsal

Jawaharlal Nehru University, New Delhi
Job Title: Graduate Research Scholar
Machine Learning, NLP, Social Media Analytics

Datasets & Analysis

Ongoing work in Natural Language Processing (NLP) for the Nepali language shows that the application of Artificial Intelligence and NLP to this Devanagari-script language still has a long way to go. The Nepali language is complex in itself and requires multi-dimensional approaches for pre-processing unstructured text and training machines to comprehend the language competently. There seemed to be a need for a comprehensive Nepali language text corpus containing texts from domains such as News, Finance, Sports, Entertainment, Health, Literature, and Technology.

  • Standards Research Data
  • Last Updated On: Fri, 03/27/2020 - 05:52

    Tweets Counter: 13,419,667

  • COVID-19
  • Last Updated On: Sun, 03/29/2020 - 10:43

    Collect coronavirus-related tweets from here.

    Each database (*.db) contains three columns. First column: the date and time of the tweet; second column: the tweet; third column: the sentiment score for that tweet in the range [-1, 1], with -1 being the most negative, 0 being neutral, and +1 being the most positive sentiment.

  • Artificial Intelligence
  • Last Updated On: Fri, 03/13/2020 - 13:32
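
The three-column layout described for the tweet databases could be read along the following lines. This is a minimal sketch that assumes the *.db files are SQLite databases, which the description does not state. The table and column names are not given either, so the sketch discovers table names at run time and relies only on column order. The function names (`list_tables`, `read_tweets`, `sentiment_label`) are hypothetical helpers introduced for illustration.

```python
import sqlite3
from contextlib import closing


def list_tables(db_path):
    """Return the names of the tables in the file (assumes an SQLite database)."""
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


def read_tweets(db_path, table, limit=5):
    """Return up to `limit` rows as (date_time, tweet, sentiment_score) tuples.

    Only the column order described above is assumed, not the column names.
    """
    if table not in list_tables(db_path):
        raise ValueError(f"no such table: {table!r}")
    # Quote the identifier; the name was validated against the file's own tables.
    quoted = '"' + table.replace('"', '""') + '"'
    with closing(sqlite3.connect(db_path)) as conn:
        cursor = conn.execute(f"SELECT * FROM {quoted} LIMIT ?", (limit,))
        return [tuple(row[:3]) for row in cursor]


def sentiment_label(score):
    """Map a score in [-1, 1] to a coarse label (0 is treated as neutral)."""
    if not -1.0 <= score <= 1.0:
        raise ValueError("sentiment score must lie in [-1, 1]")
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

A typical call would first run `list_tables` on a downloaded file to find the table name, then pass that name to `read_tweets` and apply `sentiment_label` to the third element of each row.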

    This pre-trained Word2Vec model has 300-dimensional vectors for more than 0.5 million Nepali words and phrases. A separate Nepali language text corpus was created from news content freely available in the public domain. The text corpus contained more than 90 million running words.

    Word2Vec model details: Embeddings dimension: 300, Architecture: Continuous Bag-of-Words (CBOW), Training algorithm: Negative sampling = 15, Context (window) size: 10, Token minimum count: 2, Encoded in UTF-8.

  • Computational Intelligence
  • Last Updated On: Sun, 03/15/2020 - 07:44
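
The Word2Vec model details listed above can be written down as training parameters. The sketch below assumes the gensim library (4.x parameter names), which the description does not name as the tool actually used. It reads "Negative sampling = 15" as 15 negative samples, and sets `hs` to 0 as an assumption consistent with negative sampling. `WORD2VEC_PARAMS` and `train_word2vec` are hypothetical names introduced for illustration.

```python
# Hyperparameters from the model description, as gensim 4.x keyword arguments.
WORD2VEC_PARAMS = {
    "vector_size": 300,  # embeddings dimension
    "sg": 0,             # 0 selects CBOW; 1 would select skip-gram
    "hs": 0,             # assumption: no hierarchical softmax, negative sampling only
    "negative": 15,      # assumed reading of "Negative sampling = 15"
    "window": 10,        # context (window) size
    "min_count": 2,      # token minimum count
}


def train_word2vec(sentences, **overrides):
    """Train a model on an iterable of tokenized sentences (lists of str).

    gensim is imported lazily so the parameter table above can be inspected
    without the library installed.
    """
    from gensim.models import Word2Vec

    params = {**WORD2VEC_PARAMS, **overrides}
    return Word2Vec(sentences=sentences, **params)
```

Keeping the settings in one dictionary makes it easy to compare them against the published details, and to override a single value (for example a smaller `vector_size`) when experimenting on a small corpus.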