300-Dimensional Word Embeddings for Nepali Language

This pre-trained Word2Vec model provides 300-dimensional vectors for more than 0.5 million Nepali words and phrases. It was trained on a dedicated Nepali text corpus of more than 100 million running words, compiled from news content freely available in the public domain.
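As an illustration of how such vectors might be consumed, the following is a minimal sketch that reads a word2vec-style plain-text file and compares two entries by cosine similarity. It assumes the vectors have been exported in the common text layout (a header line with vocabulary size and dimension, then one token and its values per line); the actual file format of this dataset is not stated here, and the function names are hypothetical.

```python
import math


def load_text_vectors(path, limit=None):
    """Read a word2vec-style text file into a dict of token -> list of floats.

    Assumed layout (not confirmed for this dataset):
        <vocabulary size> <dimension>
        <token> <v1> <v2> ... <v_dimension>
    """
    vectors = {}
    with open(path, encoding="utf-8") as handle:
        # The header carries the vocabulary size and the vector dimension.
        _vocab_size, dim = (int(field) for field in handle.readline().split())
        for line in handle:
            parts = line.split()
            if not parts:
                continue  # tolerate blank lines
            if len(parts) != dim + 1:
                raise ValueError(f"expected {dim} values, got {len(parts) - 1}")
            vectors[parts[0]] = [float(value) for value in parts[1:]]
            if limit is not None and len(vectors) >= limit:
                break  # stop early to keep memory use bounded
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

With a vocabulary of this size, passing a `limit` keeps only the first entries in memory, which is usually enough for a quick check. If gensim is installed, `KeyedVectors.load_word2vec_format` is the more usual way to load word2vec-format files, including the binary variant.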

Word2Vec model details: Embedding dimension: 300; Architecture: Continuous Bag-of-Words (CBOW); Training algorithm: negative sampling (15 negative samples); Context (window) size: 10; Minimum token count: 2; Encoding: UTF-8.
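For readers who want to train a comparable model on their own corpus, the settings above can be sketched as a gensim configuration. This is an assumption-laden sketch: the toolkit originally used is not stated here, the keyword names follow gensim 4.x, `hs=0` is inferred from the use of negative sampling, and every setting not listed (epochs, learning rate, subsampling) is left at the library default. `W2V_PARAMS` and `train_word2vec` are hypothetical names.

```python
# Hyperparameters from the model details, keyed by gensim 4.x argument names.
W2V_PARAMS = {
    "vector_size": 300,  # embedding dimension
    "sg": 0,             # 0 selects CBOW (1 would select skip-gram)
    "hs": 0,             # assumption: no hierarchical softmax
    "negative": 15,      # number of negative samples
    "window": 10,        # context window size
    "min_count": 2,      # drop tokens seen fewer than 2 times
}


def train_word2vec(sentences, **overrides):
    """Train a Word2Vec model on an iterable of token lists.

    gensim is imported lazily so the configuration can be inspected
    without the library installed.
    """
    from gensim.models import Word2Vec

    params = {**W2V_PARAMS, **overrides}
    return Word2Vec(sentences=sentences, **params)
```

Here `sentences` is expected to be a restartable iterable of tokenized sentences (lists of strings), and keyword overrides such as `workers=4` are passed through to gensim unchanged.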

Citation:
Rabindra Lamsal, "300-Dimensional Word Embeddings for Nepali Language", IEEE Dataport, 2019. [Online]. Available: http://dx.doi.org/10.21227/dz6s-my90. Accessed: Dec. 09, 2019.