difficulty;nlp;simplification;classification

Nepaliliinguistic

Our dataset is a Nepali news dataset containing 17 categories: Art, Bank, Blog, Business, Diaspora, Entertainment, Filmy, Health, Hollywood-bollywood, Koseli, Literature, Music, National, Opinion, Society, Sports, and World.

If you use this dataset, please cite our paper.

Sitaula C, Basnet A, Aryal S. 2021. Vector representation based on a supervised codebook for Nepali documents classification. PeerJ Computer Science 7:e412 https://doi.org/10.7717/peerj-cs.412

Categories: Artificial Intelligence


Dataset for Word Difficulty Prediction

Most text-simplification systems require an indicator of word complexity. The prevalent approaches to word difficulty prediction are based on manual feature engineering. The use of deep learning based models is largely left unexplored due to their comparatively poor performance. We have explored the use of one such model in predicting the difficulty of words, treating the problem as binary classification. We have also trained traditional machine learning models and evaluated their performance on the task.
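To make the binary-classification framing concrete, here is a minimal sketch in plain Python. It is not the method used to build or evaluate this dataset: the features (word length and vowel count), the logistic-regression model, the function names, and the toy word list are all assumptions chosen for illustration, standing in for the manual feature engineering and traditional models mentioned above.

```python
import math

VOWELS = set("aeiouy")

# Hypothetical toy data: (word, label), where 1 = difficult and 0 = easy.
# These labels are illustrative and are not taken from the dataset.
TOY_WORDS = [
    ("cat", 0), ("dog", 0), ("sun", 0), ("tree", 0), ("book", 0),
    ("ubiquitous", 1), ("perspicacious", 1), ("obfuscate", 1),
    ("idiosyncratic", 1), ("ephemeral", 1),
]


def word_features(word):
    """Return hand-engineered features: scaled length, scaled vowel count, bias."""
    w = word.lower()
    n_vowels = sum(ch in VOWELS for ch in w)
    # Scale to roughly [0, 1] so plain gradient descent behaves well.
    return [len(w) / 20.0, n_vowels / 10.0, 1.0]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(samples, labels, lr=0.5, epochs=1000):
    """Fit logistic-regression weights with stochastic gradient ascent."""
    weights = [0.0] * len(samples[0])
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
            # Gradient of the log-likelihood for one sample is (y - p) * x.
            for i, xi in enumerate(x):
                weights[i] += lr * (y - p) * xi
    return weights


def predict_difficult(weights, word):
    """Return 1 if the word is predicted difficult, otherwise 0."""
    z = sum(w * xi for w, xi in zip(weights, word_features(word)))
    return 1 if sigmoid(z) >= 0.5 else 0


if __name__ == "__main__":
    X = [word_features(w) for w, _ in TOY_WORDS]
    y = [label for _, label in TOY_WORDS]
    model = train_logistic(X, y)
    for word in ("cat", "idiosyncratic"):
        print(word, predict_difficult(model, word))
```

In practice one would replace the toy list with the labeled words from the dataset, hold out a test split, and compare several classifiers; the sketch only shows the shape of the task, mapping a word to features and then to an easy/difficult label.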

Categories: Artificial Intelligence, Machine Learning, Computational Intelligence, Other
