300-Dimensional Word Embeddings for Nepali Language

Citation Author(s): Rabindra Lamsal
Submitted by: Rabindra Lamsal
Last updated: Sun, 09/22/2019 - 10:59
DOI: 10.21227/dz6s-my90

This pre-trained Word2Vec model provides 300-dimensional vectors for more than 0.5 million Nepali words and phrases. The model was trained on a dedicated Nepali text corpus built from news content freely available in the public domain; the corpus contains more than 100 million running words.
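The corpus size is given in running words (every token occurrence), while the model size is given in distinct vocabulary entries. The minimal sketch below counts both for a small text. It assumes simple whitespace tokenization with punctuation, including the Devanagari danda (।), split off and discarded; the tokenizer actually used to build this corpus is not documented here.

```python
import re
from collections import Counter

# Assumption: punctuation, including the Devanagari danda (।) and double danda (॥),
# is split off and discarded; the tokenizer actually used for the corpus is not documented.
PUNCT = re.compile(r"([।॥,.?!;:()])")

def tokenize(text: str) -> list[str]:
    """Split text into running words (tokens), dropping punctuation marks."""
    spaced = PUNCT.sub(r" \1 ", text)
    return [tok for tok in spaced.split() if not PUNCT.fullmatch(tok)]

def corpus_stats(documents: list[str]) -> tuple[int, int]:
    """Return (running words, distinct words) over a list of documents."""
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    return sum(counts.values()), len(counts)

if __name__ == "__main__":
    # "Thamel is in Kathmandu. Thamel is fun."
    print(corpus_stats(["ठमेल काठमाडौंमा छ। ठमेल रमाइलो छ।"]))  # (6, 4)
```

Repeated words such as ठमेल and छ add to the running-word count but only once to the vocabulary, which is why a 100-million-token corpus yields a vocabulary of roughly half a million entries.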

Word2Vec model details:
- Embedding dimension: 300
- Architecture: Continuous Bag-of-Words (CBOW)
- Training algorithm: negative sampling, with 15 negative samples
- Context (window) size: 10
- Minimum token count: 2
- Encoding: UTF-8
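Two of these settings shape the training data directly: with CBOW, each target word is predicted from the words around it (up to 10 on each side here), and words seen fewer than 2 times are dropped from the vocabulary before training. The sketch below builds those (context, target) pairs from a token list. It is an illustration, not the training code used for this model: it uses a fixed window on every position (gensim, by default, randomly shrinks the effective window per target) and omits the negative-sampling step that draws 15 noise words per example.

```python
from collections import Counter

WINDOW = 10     # context (window) size from the model details
MIN_COUNT = 2   # token minimum count from the model details

def build_vocab(tokens: list[str], min_count: int = MIN_COUNT) -> set[str]:
    """Keep only words that occur at least `min_count` times."""
    counts = Counter(tokens)
    return {word for word, n in counts.items() if n >= min_count}

def cbow_examples(tokens: list[str], window: int = WINDOW,
                  min_count: int = MIN_COUNT) -> list[tuple[list[str], str]]:
    """Return (context words, target word) pairs as consumed by CBOW.

    Rare words are removed first, so the window spans over them.
    Simplification: the full window is used at every position.
    """
    vocab = build_vocab(tokens, min_count)
    kept = [t for t in tokens if t in vocab]
    examples = []
    for i, target in enumerate(kept):
        context = kept[max(0, i - window):i] + kept[i + 1:i + 1 + window]
        if context:
            examples.append((context, target))
    return examples
```

For reference, the same settings correspond to gensim 4.x `Word2Vec(..., vector_size=300, sg=0, negative=15, window=10, min_count=2)` (gensim 3.x named the dimension argument `size`); whether the author trained with gensim is not stated here.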

Instructions (Python, using gensim):

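from gensim.models import KeyedVectors

# Load the vectors (plain-text word2vec format, hence binary=False)
model = KeyedVectors.load_word2vec_format('.../path/to/nepali_embeddings_word2vec.txt', binary=False)

# Cosine similarity between two words ('Facebook' and 'Instagram')
print(model.similarity('फेसबूक', 'इन्स्टाग्राम'))

# Words most similar to 'Thamel' (a neighbourhood of Kathmandu)
print(model.most_similar('ठमेल'))

# Word-vector arithmetic (analogies) with Nepali words.
# The example words were not preserved in this copy; replace each '' with a
# word from the vocabulary, otherwise gensim raises a KeyError.
print(model.most_similar(positive=['', ''], negative=[''], topn=1))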
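To show what the calls above compute, the sketch below reimplements them with the standard library only. It assumes the usual word2vec text layout read by `load_word2vec_format(..., binary=False)`: a header line with the vocabulary size and vector dimension, then one word per line followed by its components. Similarity is cosine similarity; the analogy query follows gensim's approach of combining unit vectors (positive words added, negative words subtracted) and ranking the remaining words by cosine, excluding the query words. The function names are illustrative, and gensim itself is far faster on 0.5 million 300-dimensional vectors.

```python
import io
import math

def load_text_vectors(stream) -> dict[str, list[float]]:
    """Read word2vec text format: a 'count dim' header, then 'word v1 ... vdim' lines."""
    count, dim = map(int, stream.readline().split())
    vectors = {}
    for _ in range(count):
        parts = stream.readline().rstrip().split(" ")
        if len(parts) != dim + 1:
            raise ValueError(f"expected {dim + 1} fields, got {len(parts)}")
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors

def _unit(vec: list[float]) -> list[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0 else [x / norm for x in vec]

def _dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def similarity(vectors, w1: str, w2: str) -> float:
    """Cosine similarity between two words."""
    return _dot(_unit(vectors[w1]), _unit(vectors[w2]))

def most_similar(vectors, positive=(), negative=(), topn: int = 10):
    """Rank words by cosine to the normalized sum of +unit(positive) and -unit(negative)."""
    if isinstance(positive, str):
        positive = [positive]
    terms = [(1.0, _unit(vectors[w])) for w in positive]
    terms += [(-1.0, _unit(vectors[w])) for w in negative]
    dim = len(terms[0][1])
    query = _unit([sum(weight * vec[i] for weight, vec in terms) for i in range(dim)])
    exclude = set(positive) | set(negative)
    scored = [(word, _dot(_unit(vec), query))
              for word, vec in vectors.items() if word not in exclude]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]

if __name__ == "__main__":
    # Toy 2-dimensional vectors in the same text layout as the embeddings file.
    toy = "3 2\nक 1.0 0.0\nख 0.0 1.0\nग 1.0 1.0\n"
    vecs = load_text_vectors(io.StringIO(toy))
    print(similarity(vecs, "क", "ग"))
    print(most_similar(vecs, "क", topn=1))
```

To use it on the real file, pass a stream opened with `encoding="utf-8"`, since the embeddings are UTF-8 encoded.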

The Nepali text corpus was designed, and the Word2Vec model trained, at the Database Systems and Artificial Intelligence Lab, School of Computer and System Sciences, Jawaharlal Nehru University, New Delhi.

Documentation

Readme file (PDF, 141.45 KB)

R. Lamsal, "300-Dimensional Word Embeddings for Nepali Language," IEEE Dataport, 2019. [Online]. Available: http://dx.doi.org/10.21227/dz6s-my90. Accessed: Oct. 14, 2019.