Jairson Barbosa Rodrigues, Germano Crispim Vasconcelos, Paulo Romero Martins Maciel

PT7 Web is an annotated Portuguese-language corpus built from samples collected between Sep 2018 and Mar 2020 from seven Portuguese-speaking countries: Angola, Brazil, Portugal, Cape Verde, Guinea-Bissau, Macao, and Mozambique. The records were filtered from Common Crawl, a public-domain, petabyte-scale dataset of web pages in many languages, released as monthly snapshots of the web [1]. The Brazilian pages were labeled as the positive class and the pages from the other six countries as the negative class (non-Brazilian Portuguese).
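The labeling scheme described above can be illustrated with a minimal Python sketch. The record layout (a dictionary with `country` and `text` keys), the `label_for` helper, and the sample records are all hypothetical, since the actual file format and schema of the dataset are not described here. Only the rule itself, Brazil as the positive class and the other six countries as the negative class, comes from the description.

```python
# Hypothetical sketch: the real PT7 Web schema is not documented in this text.
# Only the labeling rule (Brazil = positive, others = negative) is from the source.
COUNTRIES = (
    "Angola", "Brazil", "Portugal", "Cape Verde",
    "Guinea-Bissau", "Macao", "Mozambique",
)


def label_for(country: str) -> int:
    """Return 1 for Brazilian pages (positive class), 0 for the other six."""
    if country not in COUNTRIES:
        raise ValueError(f"unknown country: {country!r}")
    return 1 if country == "Brazil" else 0


# Placeholder records; the "text" values stand in for page content.
records = [
    {"country": "Brazil", "text": "..."},
    {"country": "Portugal", "text": "..."},
    {"country": "Angola", "text": "..."},
]
labels = [label_for(r["country"]) for r in records]
```

Rejecting unknown country names, rather than silently assigning them to the negative class, is a design choice of this sketch and keeps mislabeled inputs from inflating the non-Brazilian class.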

[1] Jairson Rodrigues, Germano Vasconcelos, Paulo Maciel, "PT7 Web, an Annotated Portuguese Language Corpus", IEEE Dataport, 2020. [Online]. Available: http://dx.doi.org/10.21227/fhrm-n966. Accessed: Jul. 18, 2024.
@data{fhrm-n966-20,
doi = {10.21227/fhrm-n966},
url = {http://dx.doi.org/10.21227/fhrm-n966},
author = {Jairson Rodrigues and Germano Vasconcelos and Paulo Maciel},
publisher = {IEEE Dataport},
title = {PT7 Web, an Annotated Portuguese Language Corpus},
year = {2020}
}