Most text-simplification systems require an indicator of word complexity. The prevalent approaches to word difficulty prediction rely on manual feature engineering. Deep-learning-based models remain largely unexplored for this task because of their comparatively poor performance. We have explored the use of one such model in predicting the difficulty of words, treating the problem as binary classification. We have trained traditional machine learning models and evaluated their performance on the task.
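
To make the binary classification framing concrete, the following is a minimal sketch of a traditional, feature-engineered classifier for word difficulty. It is not the authors' method: the toy word list, the two features (word length and a vowel-group count used as a rough syllable proxy), and the function names `features`, `train`, and `predict` are all assumptions chosen for illustration, and the labels are not drawn from the dataset.

```python
import math

# Hypothetical toy examples (NOT from the dataset): 1 = difficult, 0 = easy.
TOY_WORDS = [
    ("cat", 0), ("dog", 0), ("house", 0), ("run", 0), ("tree", 0), ("water", 0),
    ("ubiquitous", 1), ("perfunctory", 1), ("obfuscate", 1),
    ("idiosyncrasy", 1), ("magnanimous", 1), ("surreptitious", 1),
]

def features(word):
    """Hand-engineered features: scaled length and scaled vowel-group count."""
    vowels = "aeiouy"
    groups = sum(
        1 for i, ch in enumerate(word)
        if ch in vowels and (i == 0 or word[i - 1] not in vowels)
    )
    return [len(word) / 10.0, groups / 5.0]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(data, epochs=2000, lr=0.5):
    """Fit a logistic regression model with plain batch gradient descent."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for word, label in data:
            x = features(word)
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - label
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        weights = [w - lr * g / len(data) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(data)
    return weights, bias

def predict(word, weights, bias):
    """Return 1 (difficult) or 0 (easy) using a 0.5 probability threshold."""
    x = features(word)
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return int(sigmoid(score) >= 0.5)

weights, bias = train(TOY_WORDS)
print(predict("cat", weights, bias), predict("surreptitious", weights, bias))
```

A real experiment would replace the toy list with the dataset's labelled words and would likely use richer features, such as corpus frequency, than the two shown here.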

[1] Avishek Garain, Arpan Basu, Sudip Kumar Naskar, "Dataset for Word Difficulty Prediction", IEEE Dataport, 2020. [Online]. Available: http://dx.doi.org/10.21227/w0av-f618. Accessed: Feb. 08, 2025.