Text Mining
Please cite the following paper when using this dataset:
Vanessa Su and Nirmalya Thakur, “COVID-19 on YouTube: A Data-Driven Analysis of Sentiment, Toxicity, and Content Recommendations”, Proceedings of the IEEE 15th Annual Computing and Communication Workshop and Conference 2025, Las Vegas, USA, Jan 06-08, 2025 (Paper accepted for publication, Preprint: https://arxiv.org/abs/2412.17180).
Abstract:
- Categories:
To download this dataset without purchasing an IEEE Dataport subscription, please visit: https://zenodo.org/records/13896353
Please cite the following paper when using this dataset:
- Categories:
To download the dataset without purchasing an IEEE Dataport subscription, please visit: https://zenodo.org/records/13738598
Please cite the following paper when using this dataset:
- Categories:
The biographies_EN dataset contains 1000 biographies of literature writers retrieved from the english version of Wikipedia.
- Categories:
Several fields of study can benefit from a large, structured, and accurate dataset of historical figures. Due to a lack of such a dataset, in this paper, we aim to use machine learning and text mining models to collect, predict, and cleanse online data with a focus on age and gender. We developed a five-step method and inferred birth and death years, binary gender, and occupation from community-submitted data to all language versions of the Wikipedia project.
- Categories:
This dataset contains information about Android app users’ reviews crawled from https://play.google.com/store/apps from 2022/4/2 to 2022/4/14. User reviews of 24 Korean trading apps were collected from Google Play Store, and the total number of the collected reviews is 41,705. App name, user ID, review content, rating, and date information were collected for each review by web crawling. The entire dataset is in Korean.
- Categories:
This dataset contains job and their skills extracted from the job adverisments.
- Categories:
The "RetroRevMatchEvalICIP16" dataset provides a retrospective reviewer recommendation dataset and evaluation for IEEE ICIP 2016. The methodology via which the recommendations were obtained and the evaluation was performed is described in the associated paper.
Y. Zhao, A. Anand, and G. Sharma, “Reviewer recommendations using document vector embeddings and a publisher database: Implementation and evaluation,” IEEE Access, vol. 10, pp. 21 798–21 811, 2022. https://doi.org/10.1109/ACCESS.2022.3151640
- Categories: