big data analytics

This dataset contains the testing results presented in the manuscript "Exploring the Potential of Offline LLMs in Data Science: A Study on Code Generation for Data Analysis". It is intended for assessing the code-generation capabilities of offline LLMs on data analytics tasks, and it is best used after a thorough reading of the manuscript. A total of 250 testing results were generated and then merged to create this dataset.


The dataset includes geospatial vector point and linestring data, ranging in size from 4 million to 100 million records, for evaluating the applicability of HiVQ.


The Surface Accelerations Reference is a catalog of all longitudinal and lateral accelerations experienced by SHRP2-NDS participants. The Strategic Highway Research Program Naturalistic Driving Study (SHRP2-NDS) is the largest naturalistic driving study in the world, comprising 34.5 million miles of recorded driving data. To create the Surface Accelerations Reference, every acceleration event in SHRP2-NDS was detected, summarized, and recorded, creating a database of more than 1.7 billion data points.


The dataset contains:
1. A 24-hour recording of ADS-B signals at DAB on 1090 MHz with a USRP B210 (8 MHz sample rate). In total, we captured signals from more than 130 aircraft.
2. An enhanced gr-adsb, in which each message's digital baseband (I/Q) signals and metadata (flight information) are recorded simultaneously. The output file path can be specified in the property panel of the ADS-B decoder submodule.
3. Our GNU Radio flow graph for signal reception.
4. MATLAB code for the paper on wireless device identification using the zero-bias neural network.


We obtained 6 million instances for analyzing and modelling CO2 behavior. The data-logging and sensor nodes acquire data every 1 second.


We introduce a benchmark of distributed algorithm execution over big data. The datasets comprise metrics on the computational impact (resource usage) of eleven well-known machine learning techniques on a real computational cluster, expressed as system-resource-agnostic indicators: CPU consumption, memory usage, operating system process load, network traffic, and I/O operations. The metrics were collected every five seconds for each algorithm at five different data volume scales, totaling 275 distinct datasets.
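The total of 275 is consistent with one dataset per combination of algorithm, data volume scale, and indicator (11 × 5 × 5). The sketch below enumerates that grid under this assumption; the source does not state how the datasets are partitioned, and the algorithm names, scale names, and indicator keys are hypothetical placeholders.

```python
from itertools import product

# Assumption: placeholder names only. The source gives the counts and the
# indicator categories, not the actual algorithm or scale identifiers.
ALGORITHMS = [f"algorithm_{i}" for i in range(1, 12)]  # eleven techniques
SCALES = [f"scale_{i}" for i in range(1, 6)]           # five data volume scales
INDICATORS = [
    "cpu_consumption",
    "memory_usage",
    "os_process_load",
    "network_traffic",
    "io_operations",
]


def enumerate_datasets():
    """Return every (algorithm, scale, indicator) combination.

    Assumes one dataset per combination, which is one reading of the
    stated total and is not confirmed by the source.
    """
    return list(product(ALGORITHMS, SCALES, INDICATORS))


if __name__ == "__main__":
    print(len(enumerate_datasets()))
```

If the datasets are in fact organized differently, only the three lists need to change.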


Dataset Ⅰ: To derive part prices from manufacturing features and other manufacturing processes, a feature-quantity representation is applied. Manufacturing features are identified, their feature quantities are calculated, and the quantities are recorded as assigned values in the data. A widely used, mature deep-learning method is then adopted to produce accurate part quotations.
