big data
Social Media Big Dataset for Research, Analytics, Prediction, and Understanding the Global Climate Change Trends focuses on climate science, climate trends, and public awareness of climate change. Using the dataset to analyze climate change trends supports research into, and understanding of, those trends at a global scale.
This data collection captures user-generated content posted on the social network Reddit during 2023. It comprises 29 CSV files collected from Reddit, containing textual data associated with various emotions and related concepts.
Please cite the following paper when using this dataset:
N. Thakur, K. Khanna, S. Cui, N. Azizi, and Z. Liu, “Mining and Analysis of Search Interests related to Online Learning Platforms from Different Countries since the Beginning of COVID-19” [Unpublished Paper - Paper submitted to HCI International 2023, Copenhagen, Denmark, 23-28 July 2023]
Brief Description of Dataset file - Interest_Dataset.csv:
Attribute Name: Week
Please cite the following paper when using this dataset:
N. Thakur, K. Khanna, S. Cui, N. Azizi, and Z. Liu, “Mining and Analysis of Search Interests related to Online Learning Platforms from Different Countries since the Beginning of COVID-19”, Proceedings of the 25th International Conference on Human-Computer Interaction (HCII 2023), Copenhagen, Denmark, July 23-28, 2023 (Accepted for Publication)
Brief Description of Dataset file - Interest_Dataset.csv:
Attribute Name: Week
PT7 Web is an annotated Portuguese-language corpus built from samples collected between Sep 2018 and Mar 2020 from seven Portuguese-speaking countries: Angola, Brazil, Portugal, Cape Verde, Guinea-Bissau, Macao, and Mozambique. The records were filtered from Common Crawl, a public-domain, petabyte-scale dataset of web pages in many languages, mixed together in temporal snapshots of the web and available monthly [1]. The Brazilian pages were labeled as the positive class and the others as the negative class (non-Brazilian Portuguese).

We obtained 6 million instances for use in analyzing and modelling CO2 behavior. The data-logging and sensor nodes acquire one sample every 1 second.

This dataset contains 18 months of raw PV data recorded at intervals of about 200 µs (5 kHz sampling). A post-processed, 365-day, day-by-day downsampled version at 10 ms intervals (100 Hz sampling) is also included. The end result is two databases: 1. The original raw data, including both fast (short circuit, 200 µs) and slow (sweep, 2.5-3.9 s) information for 18 months. This database has intervals of missing points, but it is provided so that potential users can reproduce any new work. 2.
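For the PV dataset, going from roughly 200 µs intervals (5 kHz) to 10 ms intervals (100 Hz) corresponds to a reduction factor of 50. The description does not say how the downsampled version was produced, so the following is only a minimal sketch that assumes block averaging over consecutive groups of 50 samples. The function name `downsample_by_mean` and the synthetic input are hypothetical and are not part of the dataset.

```python
import numpy as np

RAW_RATE_HZ = 5_000     # about 200 µs between samples, per the description
TARGET_RATE_HZ = 100    # 10 ms between samples, per the description


def downsample_by_mean(samples: np.ndarray,
                       raw_rate_hz: int = RAW_RATE_HZ,
                       target_rate_hz: int = TARGET_RATE_HZ) -> np.ndarray:
    """Average consecutive blocks of raw samples (assumed method, not the authors')."""
    if raw_rate_hz % target_rate_hz != 0:
        raise ValueError("raw rate must be an integer multiple of the target rate")
    factor = raw_rate_hz // target_rate_hz          # 50 for 5 kHz -> 100 Hz
    usable = (len(samples) // factor) * factor      # drop an incomplete trailing block
    return samples[:usable].reshape(-1, factor).mean(axis=1)


if __name__ == "__main__":
    # One second of synthetic data standing in for raw PV readings.
    one_second = np.arange(RAW_RATE_HZ, dtype=float)
    reduced = downsample_by_mean(one_second)
    print(reduced.shape)   # (100,)
    print(reduced[0])      # 24.5, the mean of 0..49
```

Block averaging suppresses some high-frequency noise; plain decimation (keeping every 50th sample) would be an equally plausible reading of "downsampled", and the raw data contain gaps that a real pipeline would need to handle first.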