Topics modeling in computer science articles

Citation Author(s):
Jose Melendez Barros, Universidade de São Paulo
Rosa V. Encinas Quille, Universidade de São Paulo
Márcio Barbado Júnior, Universidade de São Paulo
Pedro Luiz Pizzigatti Corrêa, Universidade de São Paulo
Glauber de Bona, Universidade de São Paulo
Marcos Antonio Simplicio Jr, Universidade de São Paulo
Submitted by: Jose Melendez
Last updated: Sat, 09/05/2020 - 09:11
DOI: 10.21227/7exb-wb55
Data Format: *.csv
Links: Interactive Jupyter Notebook
License: Creative Commons Attribution

Categories: Machine Learning
Keywords: Dynamic Topic Modeling, Text similarity, Clustering algorithms, data science

Abstract

This work queries the open data of well-known scientific databases through their REST (representational state transfer) APIs, then applies data management practices and a dynamic topic modeling approach to the available metadata, yielding a feasible way to analyze and classify sets of articles. It identifies research trends in a given field at specific moments, as well as how those trends evolve over the years. This makes it possible to detect the associated lexical variation over time in published content and, ultimately, to determine the so-called hot topics at arbitrary points in time, including the present. The work probes three prominent scientific article databases: arXiv, IEEE Xplore, and Springer Nature.
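As an illustration of the REST querying step, the following is a minimal sketch of how article metadata might be requested from the public arXiv API and parsed from its Atom response. The category `cs.LG`, the helper names `build_arxiv_query` and `parse_atom_entries`, and the sample feed are assumptions made for this example, not the queries or code used by the authors; the IEEE Xplore and Springer Nature APIs require keys and are not covered here.

```python
import urllib.parse
import xml.etree.ElementTree as ET

ARXIV_API = "http://export.arxiv.org/api/query"  # public arXiv API endpoint
ATOM = "{http://www.w3.org/2005/Atom}"           # Atom XML namespace prefix


def build_arxiv_query(category, start=0, max_results=100):
    """Return an arXiv API URL listing articles in one category (hypothetical helper)."""
    params = {
        "search_query": f"cat:{category}",
        "start": start,
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urllib.parse.urlencode(params)}"


def parse_atom_entries(xml_text):
    """Extract id, publication year, title, and abstract from an Atom feed."""
    root = ET.fromstring(xml_text)
    records = []
    for entry in root.iter(f"{ATOM}entry"):
        published = entry.findtext(f"{ATOM}published", default="")
        records.append({
            "id": entry.findtext(f"{ATOM}id", default="").strip(),
            "year": int(published[:4]) if published[:4].isdigit() else None,
            # Collapse the line breaks that Atom feeds often contain in text fields.
            "title": " ".join(entry.findtext(f"{ATOM}title", default="").split()),
            "summary": " ".join(entry.findtext(f"{ATOM}summary", default="").split()),
        })
    return records


# Made-up feed standing in for a real response, so the sketch runs offline.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://arxiv.org/abs/0000.00000v1</id>
    <published>2019-03-01T00:00:00Z</published>
    <title>A made-up
      title</title>
    <summary>Made-up abstract text.</summary>
  </entry>
</feed>"""

url = build_arxiv_query("cs.LG", max_results=5)
records = parse_atom_entries(SAMPLE_FEED)
```

In practice the URL would be fetched (for example with `urllib.request.urlopen`) and the response body passed to `parse_atom_entries`; the titles and abstracts collected this way are the kind of metadata a topic model could then be fitted on.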

The dataset contains:
Identification of the articles used in the study
The proportion of the topics in each document
Number of articles per year per topic
Distribution of the words that make up each topic
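The relationship between two of these items, the per-document topic proportions and the per-year article counts, can be sketched as follows. The column names (`doc_id`, `year`, `topic_0`, ...) and the sample rows are hypothetical; the actual file layout is described in readme.pdf. Assigning each article to its highest-proportion topic is likewise an assumption about how such counts could be derived, not necessarily the authors' procedure.

```python
import csv
import io
from collections import defaultdict

# Hypothetical stand-in for a document-topic CSV; names and values are made up.
SAMPLE_CSV = """doc_id,year,topic_0,topic_1,topic_2
a1,2018,0.70,0.20,0.10
a2,2018,0.10,0.80,0.10
a3,2019,0.15,0.25,0.60
a4,2019,0.05,0.15,0.80
a5,2019,0.60,0.30,0.10
"""


def dominant_topic(row, topic_columns):
    """Return the topic column holding the largest proportion for one article."""
    return max(topic_columns, key=lambda col: float(row[col]))


def articles_per_year_per_topic(csv_file):
    """Count articles per year, assigning each to its dominant topic."""
    reader = csv.DictReader(csv_file)
    topic_columns = [c for c in reader.fieldnames if c.startswith("topic_")]
    counts = defaultdict(lambda: defaultdict(int))
    for row in reader:
        counts[int(row["year"])][dominant_topic(row, topic_columns)] += 1
    return {year: dict(by_topic) for year, by_topic in counts.items()}


def hot_topic(counts, year):
    """Return the topic with the most articles in the given year."""
    by_topic = counts[year]
    return max(by_topic, key=by_topic.get)


counts = articles_per_year_per_topic(io.StringIO(SAMPLE_CSV))
```

With a real file, `io.StringIO(SAMPLE_CSV)` would be replaced by an open file handle, and `hot_topic(counts, year)` gives a crude per-year indicator of the leading topic.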

Instructions:

Instructions and documentation are given in readme.pdf.

Comments

Great

Submitted by Rade Dacevic on Fri, 03/17/2023 - 05:57

Dataset Files

Documentation

Attachment	Size
readme.pdf	27.59 KB
