Krapivin

Abstract 

In this paper we use Natural Language Processing techniques to improve different machine learning approaches (Support Vector Machines (SVM), Local SVM, Random Forests) to the problem of automatic keyphrase extraction from scientific papers. For the evaluation we propose a large, high-quality dataset: 2000 ACM papers from the Computer Science domain. We evaluate by comparison with expert-assigned keyphrases. The evaluation shows promising results that outperform the state-of-the-art Bayesian learning system KEA, improving the average F-Measure from 22% (KEA) to 30% (Random Forest) on the same dataset without the use of controlled vocabularies. Finally, we report a detailed analysis of the effect of the individual NLP features and dataset size on the overall quality of the extracted keyphrases.
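
The F-Measure comparison against expert-assigned keyphrases can be illustrated with a minimal sketch. It assumes exact matching after lowercasing and whitespace trimming, which may differ from the matching criterion used in the original paper; the function names `keyphrase_prf` and `average_f1` are hypothetical and do not come from the dataset or from KEA.

```python
def keyphrase_prf(predicted, gold):
    """Exact-match precision, recall and F1 between two keyphrase lists.

    Assumption: phrases match only if identical after lowercasing and
    trimming; no stemming or partial matching is applied.
    """
    pred = {p.strip().lower() for p in predicted if p.strip()}
    ref = {g.strip().lower() for g in gold if g.strip()}
    hits = len(pred & ref)
    if hits == 0:
        # Covers empty inputs and the no-overlap case.
        return 0.0, 0.0, 0.0
    precision = hits / len(pred)
    recall = hits / len(ref)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def average_f1(pairs):
    """Mean per-document F1 over (predicted, gold) pairs."""
    scores = [keyphrase_prf(p, g)[2] for p, g in pairs]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Toy example, not taken from the dataset.
    predicted = ["Support Vector Machines", "random forest", "parsing"]
    gold = ["support vector machines", "keyphrase extraction"]
    print(keyphrase_prf(predicted, gold))
```

Averaging the per-document score, as `average_f1` does, is one common way to obtain a single figure for a whole collection; whether the reported 22% and 30% were computed this way is not stated here.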

Instructions: 

A dataset for benchmarking keyphrase extraction and generation techniques on long English-language scientific papers. For more details about the dataset, please refer to the original paper: https://www.semanticscholar.org/paper/Large-Dataset-for-Keyphrases-Extraction-Krapivin-Autaeu/2c56421ff3c2a69894d28b09a656b7157df8eb83

Original source of the data

Data Descriptor Article DOI: