big data
PT7 Web is an annotated Portuguese-language corpus built from samples collected between September 2018 and March 2020 from seven Portuguese-speaking countries: Angola, Brazil, Portugal, Cape Verde, Guinea-Bissau, Macao, and Mozambique. The records were filtered from Common Crawl, a public-domain, petabyte-scale dataset of web pages in many languages, mixed together in temporal snapshots of the web that are made available monthly [1]. The Brazilian pages were labeled as the positive class and the pages from the other six countries as the negative class (non-Brazilian Portuguese).
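The labeling rule described for PT7 Web (Brazilian pages positive, all others negative) can be illustrated with a minimal sketch. The two-letter country codes, the constant names, and the `label_page` function are assumptions made for illustration; the description does not say how the corpus actually stores the country of origin.

```python
# Hypothetical ISO 3166-1 alpha-2 codes for the seven PT7 Web countries.
# The corpus's real field names and encoding are not given in the description.
PT7_COUNTRIES = {"AO", "BR", "PT", "CV", "GW", "MO", "MZ"}
POSITIVE_COUNTRY = "BR"


def label_page(country_code: str) -> int:
    """Return 1 for a Brazilian page (positive class), 0 for the other six."""
    code = country_code.upper()
    if code not in PT7_COUNTRIES:
        raise ValueError(f"country not covered by PT7 Web: {country_code!r}")
    return 1 if code == POSITIVE_COUNTRY else 0


# Example: label a small batch of pages by their country of origin.
labels = [label_page(c) for c in ["BR", "PT", "ao", "MO"]]
```

Rejecting unknown codes instead of silently assigning them to the negative class keeps pages from outside the seven countries out of the "non-Brazilian Portuguese" label.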
- Categories:
We obtained 6 million instances for use in analyzing and modelling CO2 behavior. The data logger and sensor nodes acquire one reading every second.
- Categories:
This dataset comprises 18 months of raw PV data gathered at time intervals of about 200 µs (5 kHz sampling). A post-processed, 365-day, day-by-day downsampled version, converted to 10 ms intervals (100 Hz sampling), is also included. The end result is two databases: 1. The original raw data, including both fast (short circuit, 200 µs) and slow (sweep, 2.5-3.9 s) information for 18 months. These data contain intervals of missing points, but are provided so that potential users can reproduce any new work. 2.
- Categories:
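For the PV dataset above, going from 200 µs intervals to 10 ms intervals corresponds to a reduction factor of 50. The sketch below shows one possible way to do this, by averaging non-overlapping blocks of 50 samples. Block averaging is an assumption: the description does not state which downsampling method the authors used, and the names `FACTOR` and `downsample_mean` are hypothetical.

```python
RAW_INTERVAL_US = 200        # raw sampling interval from the description (5 kHz)
TARGET_INTERVAL_US = 10_000  # downsampled interval from the description (100 Hz)
FACTOR = TARGET_INTERVAL_US // RAW_INTERVAL_US  # 50 raw samples per output sample


def downsample_mean(samples, factor=FACTOR):
    """Average consecutive non-overlapping blocks of `factor` samples.

    A trailing partial block is dropped. This is an assumed method,
    not necessarily the one used to build the dataset.
    """
    n_blocks = len(samples) // factor
    return [
        sum(samples[i * factor:(i + 1) * factor]) / factor
        for i in range(n_blocks)
    ]


# One second of synthetic data at 5 kHz (5000 samples), not real PV readings.
raw = [float(i % 50) for i in range(5000)]
down = downsample_mean(raw)  # 100 values, i.e. one second at 100 Hz
```

Averaging acts as a crude low-pass filter before the rate reduction; simply keeping every 50th sample would be faster but more sensitive to noise in individual readings.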