Standard Dataset
Topics modeling in computer science articles
- Submitted by: Jose Melendez
- Last updated: Sat, 09/05/2020 - 09:11
- DOI: 10.21227/7exb-wb55
Abstract
This work queries the open data of well-known scientific databases through REST (representational state transfer) APIs, then applies data management practices together with a dynamic topic modeling approach to the available article metadata, yielding a feasible way to analyze and classify sets of articles. It identifies the research trends of a given field at specific moments, as well as how those trends evolve over the years. This makes it possible to detect the associated lexical variation over time in published content and, ultimately, to determine the so-called hot topics at arbitrary points in time, including the present. Three prominent databases of scientific articles are probed: arXiv, IEEE Xplore, and Springer Nature.
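As a rough illustration of the REST querying step, the sketch below builds a paged request URL for the public arXiv API. The endpoint and the `search_query`, `start`, and `max_results` parameters belong to that API; the helper name `arxiv_query_url` and the `cs.AI` category are illustrative assumptions, since the queries actually used in the study are not described here.

```python
from urllib.parse import urlencode

# Public arXiv API endpoint; responses are Atom XML feeds.
ARXIV_API = "http://export.arxiv.org/api/query"


def arxiv_query_url(category, start=0, max_results=100):
    """Build a paged arXiv API query URL for one subject category.

    Hypothetical helper: the study's real queries are not documented here.
    """
    params = {
        "search_query": f"cat:{category}",  # restrict results to one category
        "start": start,                     # zero-based offset for paging
        "max_results": max_results,         # page size
    }
    return f"{ARXIV_API}?{urlencode(params)}"


# Example: first page of 50 records for an assumed category.
url = arxiv_query_url("cs.AI", start=0, max_results=50)
print(url)
```

The returned URL can be fetched with any HTTP client; IEEE Xplore and Springer Nature expose their own APIs with different parameters and access keys.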
The dataset contains:
- Identification of the articles used in the study
- The proportion of the topics in each document
- Number of articles per year per topic
- Distribution of the words that make up each topic
Instructions and documentation are given in readme.pdf.
Dataset Files
- Identification of the articles used in the study: complete-list-articles-using-in-study.csv (193.06 MB)
- Articles + abstracts used in the study: arx_ieee_articles-cleaned-using-in-study.csv (737.53 MB)
- Distribution of the topics in each document: distribution-of-topics-in-documents.csv (978.98 MB)
- Number of articles per year per topic: qty_articles_year_topic.csv (3.96 kB)
- Distribution of the words that make up each topic (top 50): distribution-words-in-topics-top50.csv (418.99 kB)
- Distribution of the words that make up each topic (top 100): distribution-words-in-topics-top100.csv (853.12 kB)
- Distribution of the words that make up each topic (top 200): distribution-words-in-topics-top200.csv (1.69 MB)
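The per-year, per-topic counts lend themselves to a simple hot-topic lookup. The sketch below picks, for each year, the topic with the most articles. The column names `year`, `topic`, and `qty` and the sample rows are assumptions made up for illustration; the actual schema of qty_articles_year_topic.csv is documented in readme.pdf.

```python
import csv
import io
from collections import defaultdict


def hot_topics(rows, year_col="year", topic_col="topic", count_col="qty"):
    """Return {year: topic with the most articles in that year}.

    Column names are hypothetical defaults; adjust them to the real file.
    """
    counts = defaultdict(dict)
    for row in rows:
        counts[int(row[year_col])][row[topic_col]] = int(row[count_col])
    # For each year, keep the topic whose article count is largest.
    return {
        year: max(topics, key=topics.get)
        for year, topics in sorted(counts.items())
    }


# Made-up sample standing in for qty_articles_year_topic.csv.
sample = io.StringIO(
    "year,topic,qty\n"
    "2018,0,120\n"
    "2018,1,95\n"
    "2019,0,110\n"
    "2019,1,180\n"
)
result = hot_topics(csv.DictReader(sample))
print(result)
```

To run this on the real file, replace the sample with `open("qty_articles_year_topic.csv", newline="")` and pass the column names given in the documentation.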
Documentation
Attachment | Size |
---|---|
readme.pdf | 27.59 KB |