The goal of this project is to use Amazon Web Services' machine learning services to create a dataset that automatically adds and updates files on IEEE DataPort's S3 storage. Through this process, we sought to learn and demonstrate how an ongoing data collection script can create a shared, living dataset by streaming data to our IEEE DataPort dataset storage. We also hoped to gain further insights into areas including:
Several fields of study can benefit from a large, structured, and accurate dataset of historical figures. Because such a dataset is lacking, in this paper we use machine learning and text mining models to collect, predict, and cleanse online data, with a focus on age and gender. We developed a five-step method and inferred birth and death years, binary gender, and occupation from community-submitted data across all language versions of Wikipedia.