Artificial Intelligence
This dataset uses Asus RT-AC86U routers and nexmon tools to collect Channel State Information (CSI) data in a 7 m by 5 m meeting room furnished with typical furniture, including a conference table, several chairs, and a locker. The data is stored in .pcap format and is accompanied by processing code on GitHub that parses it into CSI matrices stored in .npy format. Each CSI matrix contains amplitude and processed phase values for four channels, covering data from both external and internal antennas within the room.
- Categories:
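As a rough illustration of what "amplitude and processed phase" can mean for CSI data, the sketch below splits complex CSI samples into amplitudes and unwrapped phases. It is not the dataset's own processing code: the function name `amplitude_and_phase`, the input layout (one list of complex values per packet), and the choice of phase unwrapping as the processing step are all assumptions made for illustration.

```python
import cmath
import math


def amplitude_and_phase(csi_row):
    """Split complex CSI samples into amplitudes and unwrapped phases (radians).

    csi_row is a hypothetical list of complex values, one per subcarrier.
    """
    amplitudes = [abs(c) for c in csi_row]
    raw = [cmath.phase(c) for c in csi_row]  # each value lies in [-pi, pi]

    unwrapped = []
    offset = 0.0
    for i, p in enumerate(raw):
        if i > 0:
            delta = p - raw[i - 1]
            # A jump larger than pi between neighbours is treated as a wrap.
            if delta > math.pi:
                offset -= 2 * math.pi
            elif delta < -math.pi:
                offset += 2 * math.pi
        unwrapped.append(p + offset)
    return amplitudes, unwrapped


if __name__ == "__main__":
    row = [cmath.rect(2.0, 3.0), cmath.rect(1.0, -3.0)]
    amps, phases = amplitude_and_phase(row)
    print(amps, phases)
```

The published processing code may apply different or additional steps (for example, removing a linear phase trend), so treat this only as a starting point.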
SETCD (Satellite and ERA5-based Tropical Cyclone Dataset) is a comprehensive dataset encompassing satellite imagery and ERA5 data for all tropical cyclones (TCs) recorded between 1980 and 2022. The dataset is derived from two publicly available data sources: GridSat-B1 and ERA5. To capture the information associated with each TC, SETCD uses the latitude and longitude positions provided by IBTrACS as center points. The satellite data in SETCD consists of three channels from GridSat-B1: infrared, water vapor, and visible.
- Categories:
KPI prediction, a form of time series modeling, is an important area of investigation for complex industrial processes. It focuses on forecasting the key performance indicators used to assess operational efficiency and productivity. By exploiting trends in historical data, KPI prediction helps optimize process control and decision-making.
- Categories:
The WPT dataset was created for "Web Page Tampering Detection Based on Dynamic Temporal Graph Pre-training". It contains over 200,000 regular web pages from 75 websites across the finance, healthcare, and education sectors, plus 1,541 tampered examples sourced from zone-h.org. The dataset organizes web pages as nodes and their links as edges within a discrete dynamic graph, captured as snapshots at various moments in time. Each node combines structural, textual, and statistical features into a 148-dimensional feature vector for its page.
- Categories:
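The graph organization described for the WPT dataset (pages as nodes, links as edges, one graph per snapshot, 148-dimensional node features) can be sketched as a small data structure. This is a minimal illustration, not the dataset's actual file format: the `Snapshot` class, its field names, and the `changed_pages` helper are hypothetical, and only the feature dimension of 148 comes from the description.

```python
from dataclasses import dataclass, field

FEATURE_DIM = 148  # node feature size stated in the dataset description


@dataclass
class Snapshot:
    """One graph snapshot: pages are nodes, hyperlinks are directed edges."""
    timestamp: str                                # hypothetical label
    features: dict = field(default_factory=dict)  # page URL -> feature vector
    edges: set = field(default_factory=set)       # (source URL, target URL)

    def add_page(self, url, vector):
        if len(vector) != FEATURE_DIM:
            raise ValueError(f"expected {FEATURE_DIM} features, got {len(vector)}")
        self.features[url] = list(vector)

    def add_link(self, src, dst):
        if src not in self.features or dst not in self.features:
            raise KeyError("both pages must be added before linking them")
        self.edges.add((src, dst))


def changed_pages(old, new):
    """Return URLs present in both snapshots whose feature vectors differ."""
    return sorted(
        url for url in new.features
        if url in old.features and old.features[url] != new.features[url]
    )
```

Comparing consecutive snapshots in this way is only one simple signal; the cited paper's pre-training method is not reproduced here.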
Weibo and Twitter
1) The Weibo dataset is derived from the Weibo social platform. True information in this dataset comes from authoritative Chinese sources, while fake information is obtained through the official Weibo rumor suppression system. Each data instance comprises a news text and a corresponding news image.
- Categories:
The JKU-ITS AVDM contains data from 17 participants performing different tasks with various levels of distraction.
The data collection was carried out in accordance with the relevant guidelines and regulations and informed consent was obtained from all participants.
The dataset was collected using the JKU-ITS research vehicle with automated capabilities under different illumination and weather conditions along a secure test route within the
- Categories:
This is a dataset about minimizing maritime passenger transfer in ship routing. It consists of data on the distance between ports, the number of passengers from the port of origin to the port of destination, ship speed, and the duration of berthing at ports.
- Categories:
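To show how the fields listed for the ship routing dataset relate to one another, here is a minimal sketch that computes the total duration of a route from inter-port distances, ship speed, and berthing durations. The function name, the dictionary layout, and the units (nautical miles, knots, hours) are assumptions; the dataset description does not specify them. Berthing time is counted only at intermediate ports, which is also an assumption.

```python
def route_duration_hours(route, distance_nm, speed_knots, berth_hours):
    """Total sailing plus berthing time, in hours, along an ordered port list.

    route:        ordered list of port names, e.g. ["A", "B", "C"]
    distance_nm:  dict mapping (origin, destination) to distance in nautical miles
    speed_knots:  constant ship speed in knots
    berth_hours:  dict mapping port name to berthing duration in hours
    """
    total = 0.0
    # Sailing time for each consecutive leg: distance divided by speed.
    for origin, dest in zip(route, route[1:]):
        total += distance_nm[(origin, dest)] / speed_knots
    # Berthing time at intermediate ports only (assumed convention).
    for port in route[1:-1]:
        total += berth_hours[port]
    return total


if __name__ == "__main__":
    distances = {("A", "B"): 100.0, ("B", "C"): 50.0}
    print(route_duration_hours(["A", "B", "C"], distances, 10.0, {"B": 2.0}))
```

A routing model that minimizes passenger transfers would combine durations like this with the passenger counts per origin-destination pair; that optimization step is not sketched here.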
This is a compressed package containing nine multi-label text classification data sets, including AAPD, CitySearch, Heritage, Laptop, Ohsumed, RCV1, Restaurant, Reuters, and Sentihood.
- Categories:
This data set was collected from a custom-built battery prognostics testbed at the NASA Ames Prognostics Center of Excellence (PCoE). Li-ion batteries were run through 3 different operational profiles (charge, discharge, and Electrochemical Impedance Spectroscopy) at different temperatures. Discharges were carried out at different current load levels until the battery voltage fell to preset voltage thresholds. Some of these thresholds were lower than the value recommended by the OEM (2.7 V) in order to induce deep discharge aging effects.
- Categories:
With the progress made in speaker-adaptive TTS, advanced approaches have shown a remarkable capacity to reproduce a speaker's voice on commonly used TTS datasets. However, mimicking voices with substantial accents, such as those of non-native English speakers, is still challenging. The absence of a dedicated TTS dataset for speakers with substantial accents hinders the research and evaluation of speaker-adaptive TTS models under such conditions. To address this gap, we developed a corpus of non-native speakers' English utterances.
- Categories: