First Name: Jairson
Last Name: Rodrigues

Datasets & Competitions

PT7 Web is an annotated Portuguese-language corpus built from samples collected from Sep 2018 to Mar 2020 from seven Portuguese-speaking countries: Angola, Brazil, Portugal, Cape Verde, Guinea-Bissau, Macao, and Mozambique. The records were filtered from Common Crawl, a public domain, petabyte-scale dataset of webpages in many languages, mixed together in temporal snapshots of the web that are released monthly [1]. The Brazilian pages were labeled as the positive class and all others as the negative class (non-Brazilian Portuguese).

562 Views

We introduce a benchmark of distributed algorithm execution over big data. The datasets consist of metrics on the computational impact (resource usage) of eleven well-known machine learning techniques running on a real computational cluster, expressed as system-resource-agnostic indicators: CPU consumption, memory usage, operating system process load, network traffic, and I/O operations. The metrics were collected every five seconds for each algorithm at five different data volume scales, totaling 275 distinct datasets.

1831 Views