Machine Learning

The Wind Power Technology Dataset is a comprehensive collection of data related to wind energy generation technology. This dataset encompasses a wide range of information, including meteorological data, turbine specifications, power output records, and environmental factors. It provides a valuable resource for researchers, engineers, and stakeholders in the renewable energy sector.

3486 Views

The widespread use of the Portable Document Format (PDF) has made it an increasingly common target for malware, highlighting the need for effective detection solutions. In recent years, machine learning-based methods for PDF malware detection have grown in popularity. However, the effectiveness of ML models depends closely on the quality of their training datasets. In this research, we investigated two widely used PDF malware datasets: Contagio and CIC. We found biases and representativeness issues that could affect the reliability and applicability of models built on them.

391 Views

Volkswagen Group of America Innovation and Engineering Center California (VW IECC) is a research facility in Belmont, California, working on the future of mobility. Recent years have seen exciting developments in autonomous vehicles. In general, lack of data is the main obstacle to solving the autonomous driving task. Overtaking and lane changes, especially in highway scenarios, are among the important tasks in this area.

340 Views

In the contemporary cybersecurity landscape, robust attack detection mechanisms are important for organizations. However, the current state of research in Software-Defined Networking (SDN) suffers from a notable lack of recent SDN-OpenFlow-based datasets. Here we introduce a novel dataset for intrusion detection in Software-Defined Networking named SDNFlow. The dataset, derived from OpenFlow statistics gathered from real traffic, integrates a comprehensive range of network activities.

1249 Views

Accurate knowledge of the key genes that promote hair follicle growth and development is of great value in hair research and dermatology. Traditional experimental methods for identifying key genes are time-consuming and laborious. Literature mining, by contrast, can extract proven key genes for hair follicle growth from the vast body of literature more quickly and comprehensively by performing Named Entity Recognition (NER) and Relationship Extraction (RE) on the related entities.

94 Views

The prognostic survival dataset, Pancreatic Cancer Survival based on Preoperative Features (PCSPF), was constructed to explore the impact of key preoperative features on prognosis based on the follow-up data of patients with pancreatic cancer at Changhai Hospital, Shanghai, China.

700 Views

# Datasets for stage 3

The datasets were collected from a software-based simulation environment simulating a small-scale IEC 61850-compliant substation with both the primary plant and the process bus.

The datasets consist of 148 attack scenarios. Each scenario includes two benign behaviours (fault-free behaviours and emergency behaviours) and one type of malicious behaviour.

79 Views

Accurate prediction of protein-ligand binding affinities (PLAs) is essential for drug discovery, repositioning, and design.

35 Views

CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart. The test is designed so that humans can complete it but current computer systems cannot. It is applied in several contexts to distinguish machines from humans. The most common kind found on websites is the text-based CAPTCHA. A text-based CAPTCHA is made up of a series of letters or digits linked together in a certain order.
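
As a rough illustration of the text-based scheme described above, the following Python sketch generates a random challenge string and checks a user's answer. It is not the procedure used to build this dataset: the alphabet, the default length of 6, and the function names `make_challenge` and `check_response` are all assumptions made for the example, and rendering the string as a distorted image is left out.

```python
import secrets
import string

# Assumed alphabet: uppercase letters and digits, minus easily confused characters.
ALPHABET = "".join(
    c for c in string.ascii_uppercase + string.digits if c not in "O0I1"
)


def make_challenge(length: int = 6) -> str:
    """Return a random series of letters and digits (hypothetical helper)."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def check_response(challenge: str, response: str) -> bool:
    """Compare the answer to the challenge, ignoring case and outer whitespace."""
    return challenge.upper() == response.strip().upper()


if __name__ == "__main__":
    challenge = make_challenge()
    print("Challenge text:", challenge)
    print("Correct answer accepted:", check_response(challenge, challenge.lower()))
```

In a real deployment the challenge text would be drawn into an image with noise and distortion before being shown to the user, and only the server would keep the plain string.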

340 Views

In medical applications, machine learning often grapples with limited training data. Classical self-supervised deep learning techniques have been helpful in this domain, but these algorithms have yet to achieve the accuracy required for medical use. Recently, quantum algorithms have shown promise in handling complex patterns with small datasets. To address this challenge, this study presents a novel solution that combines self-supervised learning with Variational Quantum Classifiers (VQC) and uses Principal Component Analysis (PCA) for dimensionality reduction.

117 Views
