CSV
To provide machine learning and data science experts with a more robust dataset for model training, the well-known Palmer Penguins dataset has been expanded from its original 344 rows to 100,000 rows. This increase was achieved using an adversarial random forest technique, which generates additional synthetic data while maintaining key patterns and features. The method achieved a reported accuracy of 88%, suggesting that the expanded dataset remains realistic and suitable for classification tasks.
- Categories:
This dataset contains original and augmented versions of the Korean Call Content Vishing (KorCCVi v2) dataset used in the study "Enhancing Voice Phishing Detection Using Multilingual Back-Translation and SMOTE: An Empirical Study." The dataset addresses the challenges of data imbalance and asymmetry in Korean voice phishing detection using two data augmentation techniques: multilingual back-translation (BT), with English, Chinese, and Japanese as intermediate languages, and the Synthetic Minority Oversampling Technique (SMOTE).
- Categories:
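As a rough illustration of the SMOTE idea mentioned above, the sketch below synthesizes new minority-class points by interpolating between a minority sample and one of its nearest minority neighbours. It is not the study's implementation: it assumes the call transcripts have already been converted to numeric feature vectors, and the function name `smote_sample`, its parameters, and the toy points are all hypothetical.

```python
import math
import random


def smote_sample(minority, k=2, n_new=4, seed=0):
    """Generate n_new synthetic points from a list of minority-class vectors.

    Hypothetical helper for illustration only: each synthetic point lies on
    the line segment between a random minority sample and one of its k
    nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x, excluding x itself
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: math.dist(x, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        u = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(x, neighbour)))
    return synthetic


# Made-up 2-D feature vectors standing in for vectorized minority-class samples
minority_points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sample(minority_points, k=2, n_new=4, seed=0)
print(new_points)
```

Because every synthetic point is a convex combination of two existing minority samples, the new points stay inside the region already covered by the minority class rather than being arbitrary noise.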
This dataset contains impedance and frequency response test results for commercial LED lamp beads of different brands in multiple configurations. The dataset is divided into two parts: tests of individual LED lamp beads of different brands, covering their impedance and frequency response characteristics; and tests of LED lamp beads in different array configurations (such as 1x3, 4x4, and 2x1), showing their performance in an array structure.
- Categories:
This repository contains the datasets produced using different data generation strategies to train data-driven models (e.g., decision trees, gradient tree boosting, and deep neural networks) and to evaluate their performance. The data generation strategies are described, and the results presented, in the conference paper "Training Data Generation Strategies for Data-driven Security Assessment of Low Voltage Smart Grids," J. Cuenca, E. Aldea, E. Le Guern-Dall'o, R. Féraud, G. Camilleri, and A. Blavette. IEEE ISGT EU 2024, Dubrovnik, Croatia, Oct 2024.
- Categories:
The dataset, derived from the European Table of Frequency Allocations (ECA Table), is a comprehensive compilation of frequency ranges and their associated bandwidths allocated for various applications across the electromagnetic spectrum, spanning from 8.3 kHz to 3000 GHz. It is useful for understanding the distribution of frequency allocations and bandwidth usage within a regulatory framework, aiding spectrum management and planning.
- Categories:
Despite the considerable efforts to enhance road infrastructure and enforce stricter driving regulations to ensure road safety, the number of accidents worldwide remains alarmingly high, driven by factors such as distracted driving, speeding, and impaired driving. For instance, in the United States, fatal accidents increased by 16% from 2018 to 2022, with the number of fatalities rising from 36,835 in 2018 to 42,795 in 2022. This highlights the pressing need for innovative solutions to mitigate traffic incidents and enhance road safety.
- Categories:
As the world becomes increasingly interconnected, the demand for safety and security keeps growing, particularly for industrial networks. This has prompted numerous researchers to investigate methodologies and techniques that meet the requirements of intrusion detection systems (IDS). Over the years, many studies have proposed solutions in this regard, including signature-based and machine learning (ML)-based systems. More recently, researchers have begun considering deep learning (DL)-based anomaly detection approaches.
- Categories:
Missing values in the dataset were denoted as 999999.0. After replacing 999999.0 with NaN, the Zhaogezhuang well was found to have 4522 missing values and the Yutian Ji 03 well 2076 missing values. Linear interpolation was used to fill these missing values. The datasets after filling are shown in the figures. The red dashed line in each figure marks the dates that separate the seismically active (SA) and seismically inactive (non-SA) periods.
- Categories:
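The gap-filling procedure described above (sentinel to NaN, count the gaps, then interpolate linearly) can be sketched in plain Python as follows. This is a minimal illustration, not the authors' code: the function names and the sample readings are made up, and it assumes evenly spaced samples and leaves any leading or trailing gaps unfilled.

```python
import math

SENTINEL = 999999.0  # missing-value marker stated in the dataset description


def mark_missing(values, sentinel=SENTINEL):
    """Replace the sentinel with NaN so that gaps are explicit."""
    return [math.nan if v == sentinel else v for v in values]


def interpolate_linear(values):
    """Fill interior NaN runs by linear interpolation between valid neighbours.

    Assumption: samples are evenly spaced; NaNs before the first or after
    the last valid value are left untouched.
    """
    filled = list(values)
    valid = [i for i, v in enumerate(filled) if not math.isnan(v)]
    for left, right in zip(valid, valid[1:]):
        gap = right - left
        for i in range(left + 1, right):
            weight = (i - left) / gap
            filled[i] = filled[left] + weight * (filled[right] - filled[left])
    return filled


# Made-up readings for illustration; not taken from the well data
readings = [10.0, 999999.0, 999999.0, 13.0, 14.0, 999999.0, 16.0]
marked = mark_missing(readings)
n_missing = sum(math.isnan(v) for v in marked)  # number of gaps found
filled = interpolate_linear(marked)
print(n_missing, filled)
```

Marking the sentinel as NaN before interpolating matters: if 999999.0 were left in place, it would be treated as a real measurement and distort every neighbouring estimate.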
Arc faults are a significant cause of failure in photovoltaic (PV) systems and can arise from component deterioration, installation problems, rodents chewing on wires, abrasion of insulation, or other root causes. Undetected, incipient arc faults can propagate into electrical fires. Consequently, arc-fault detectors, now mandated in many jurisdictions, are essential for the safe operation of PV systems.
- Categories: