big data

PT7 Web, an Annotated Portuguese Language Corpus

PT7 Web is an annotated Portuguese-language corpus built from samples collected from Sep 2018 to Mar 2020 from seven Portuguese-speaking countries: Angola, Brazil, Portugal, Cape Verde, Guinea-Bissau, Macao, and Mozambique. The records were filtered from Common Crawl, a public-domain, petabyte-scale dataset of webpages in many languages, released as monthly snapshots of the web [1]. The Brazilian pages were labeled as the positive class and the others as the negative class (non-Brazilian Portuguese).

Categories: Cloud Computing, Machine Learning

579 Views
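
The labeling scheme in the PT7 Web description (Brazilian pages positive, all other pages negative) can be sketched as below. This is only an illustration: the function name `label_record` and the assumption that each record already carries a country name are hypothetical, and the description does not say how the corpus authors determined a page's country.

```python
# The seven sources named in the PT7 Web description.
PT7_COUNTRIES = (
    "Angola", "Brazil", "Portugal", "Cape Verde",
    "Guinea-Bissau", "Macao", "Mozambique",
)

def label_record(country: str) -> int:
    """Return 1 for Brazilian pages (positive class), 0 otherwise.

    Hypothetical helper: assumes the country is already known per record.
    """
    if country not in PT7_COUNTRIES:
        raise ValueError(f"unexpected country: {country!r}")
    return 1 if country == "Brazil" else 0

# Label every source once to show the resulting binary split.
labels = {country: label_record(country) for country in PT7_COUNTRIES}
```

With this split, one of the seven sources maps to the positive class and the remaining six map to the negative class.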

CO2 dataset

We obtained 6 million instances for analyzing and modelling CO2 behavior. The data logger and sensor nodes acquire one sample every second.

Categories: Artificial Intelligence, IoT, Machine Learning, Standards Research Data

677 Views

2020 IEEE GRSS Data Fusion Contest

The 2020 Data Fusion Contest, organized by the Image Analysis and Data Fusion Technical Committee (IADF TC) of the IEEE Geoscience and Remote Sensing Society (GRSS) and the Technical University of Munich, aims to promote research in large-scale land cover mapping based on weakly supervised learning from globally available multimodal satellite data. The task is to train a machine learning model for global land cover mapping based on weakly annotated samples.

Categories: Artificial Intelligence, Machine Learning, Image Fusion, Geoscience and Remote Sensing

Submitted On: Tue, 12/10/2019 - 00:59

Last Updated On: Mon, 01/25/2021 - 09:03

One Year Submillisecond Fast Solar Database

This dataset comprises 18 months of raw PV data sampled at intervals of about 200 µs (5 kHz sampling). A post-processed 365-day version, downsampled day by day to 10 ms intervals (100 Hz sampling), is also included. The end result is two databases: 1. The original raw data, including both fast (short circuit, 200 µs) and slow (sweep, 2.5-3.9 s) information for 18 months. These show intervals of missing points, but are provided to allow potential users to reproduce any new work. 2.

Categories: Energy, Power and Energy, Electric Utility, Smart Grid, Weather, Sensors, Signal Processing

1224 Views
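
The sampling figures in the solar database description imply a reduction factor of 50 between the raw data (about 200 µs, 5 kHz) and the downsampled version (10 ms, 100 Hz). The sketch below works through that arithmetic and shows one possible way to downsample, by averaging consecutive blocks. Block averaging is an assumption made here for illustration; the description does not state which method the authors used, and the function names are hypothetical.

```python
from statistics import fmean

RAW_INTERVAL_US = 200        # from the description: about 200 µs per raw sample
TARGET_INTERVAL_US = 10_000  # from the description: 10 ms per downsampled sample

def sampling_rate_hz(interval_us: int) -> float:
    """Convert a sampling interval in microseconds to a rate in hertz."""
    return 1_000_000 / interval_us

def block_average(samples: list[float], factor: int) -> list[float]:
    """Average each run of `factor` consecutive samples (assumed method).

    Trailing samples that do not fill a complete block are dropped.
    """
    usable = len(samples) - len(samples) % factor
    return [fmean(samples[i:i + factor]) for i in range(0, usable, factor)]

factor = TARGET_INTERVAL_US // RAW_INTERVAL_US  # 50 raw samples per output sample

# Synthetic ramp standing in for 20 ms of raw readings (not real PV data).
raw = [float(i) for i in range(100)]
reduced = block_average(raw, factor)
```

Here 100 synthetic raw samples collapse into 2 output samples, matching the 50:1 ratio between the two sampling rates.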
