Machine Learning | IEEE DataPort

The mapping between problems, recovery attempts and their phases in software projects

We employed a case study research approach to gather the factors behind troubled software projects from the existing literature and to generate an innovative, comprehensive dataset that serves as a foundational reference for future investigations. We extracted incidents from case study data, generated open codes, and organized these open codes into 18 problem categories and 27 solution categories. The mapping between open codes, axial codes and phases is documented in the dataset. The codes encapsulate the behavioral patterns or actions of a team that initiate or cause

Categories:

Machine Learning

ARImulti-mic: real-world speech recordings on a humanoid robot (ARI)

This dataset includes “real-world” experiments. A recording campaign was held in the acoustic laboratory at Bar-Ilan University. This lab is a 6 × 6 × 2.4 m room with a reverberation time controlled by 60 interchangeable panels covering the room facets.

Bayesian Network benchmark Datasets and mixed data

Contains the benchmark Bayesian network datasets, which use the seed Bayesian networks from https://www.bnlearn.com. Some of the data comes from https://pages.mtu.edu/~lebrown/supplements/mmhc_paper/mmhc_index.html, and other datasets containing mixed data come from the UCI repository. These data can be used for learning the basic structure of Bayesian networks, for research on causality-based feature selection algorithms, etc. bnlearn is an R package for learning the graphical structure of Bayesian networks, estimating their parameters and performing some useful inference.

SensorNetGuard: A Dataset for Identifying Malicious Sensor Nodes

The dataset, titled "SensorNetGuard: A Dataset for Identifying Malicious Sensor Nodes," comprises 10,000 samples with 21 features. It is designed to facilitate the identification of malicious sensor nodes in a network environment, specifically focusing on IoT-based sensor networks.

General Metrics

§ Node ID: The unique identifier for each node.

§ Timestamp: The time at which data or a packet is sent or received.

§ IP Address: Internet Protocol address of the node.

Network Traffic Metrics

RITA: a Phraseological dataset of CEFR Assignments and Exams for Italian as a Second Language

RITA (Resource for Italian Tests Assessment) is a new NLP dataset of academic exam texts written in Italian by second-language learners seeking CEFR certification of their proficiency level.
The RITA dataset is available for automatic processing in CSV and XML formats, under a citation agreement.

SCVIC-TS-2022: Network intrusion data with original raw network packets

Discovering Mathematical Patterns Behind HIV-1 Genetic Recombination: a new methodology to identify viral features - Supplementary Information

This dataset contains the Supplementary Information of the article "Discovering Mathematical Patterns Behind HIV-1 Genetic Recombination: a new methodology to identify viral features" (Manuscript DOI: 10.1109/ACCESS.2023.3311752).

SYPHAXAR Dataset

SYPHAXAR is a dataset for Arabic text detection in the wild. It was collected in the city of Sfax, the second-largest Tunisian city after the capital. A total of 3078 images were gathered manually, one by one, each presenting real-world text detection challenges drawn from the actual complexity of 15 different routes, including ring roads, intersections and roundabouts. The annotated images contain more than 31000 objects, each enclosed within a bounding box.

Pre-Training Representations of Binary Code Using Contrastive Learning

Overview

The dataset under consideration is a comprehensive compilation of code snippets, function descriptions, and their respective binary representations aimed at fostering research in software engineering. It contains a variety of code functionalities and serves as a valuable resource for understanding the behavior and characteristics of C programs. This data is sourced from the AnghaBench repository, a well-documented collection of C programs available on GitHub.

Columns and Data Types

The dataset contains the following columns:

CRPs Dataset of Ring Oscillator PUF

Physically unclonable functions (PUFs) are foundational components that offer a cost-efficient and promising solution for diverse security applications, including countering integrated circuit (IC) counterfeiting, generating secret keys, and enabling lightweight authentication. PUFs exploit semiconductor variations in ICs to derive inherent responses from imposed challenges, creating unique challenge-response pairs (CRPs) for individual devices. Analyzing PUF security is pivotal for identifying device vulnerabilities and ensuring response credibility.
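To make the challenge-response idea concrete, here is a minimal sketch of a simulated ring oscillator PUF. It is an illustration only, not the procedure used to build this dataset: the function names, the Gaussian model of process variation, the nominal frequency and the pairwise-comparison response rule are all assumptions chosen for simplicity.

```python
import random

def make_ro_puf(num_oscillators, seed, nominal_mhz=100.0, sigma_mhz=0.5):
    """Simulate one device (hypothetical model): every ring oscillator gets a
    fixed frequency drawn around a nominal value, standing in for
    manufacturing variation."""
    rng = random.Random(seed)
    return [rng.gauss(nominal_mhz, sigma_mhz) for _ in range(num_oscillators)]

def respond(frequencies, challenge):
    """A challenge selects a pair of oscillators (i, j); the response bit is 1
    when oscillator i runs faster than oscillator j, otherwise 0."""
    i, j = challenge
    return 1 if frequencies[i] > frequencies[j] else 0

def collect_crps(frequencies, num_pairs, seed):
    """Draw random oscillator pairs as challenges and record each response."""
    rng = random.Random(seed)
    indices = range(len(frequencies))
    crps = []
    for _ in range(num_pairs):
        i, j = rng.sample(indices, 2)  # two distinct oscillators
        crps.append(((i, j), respond(frequencies, (i, j))))
    return crps

# Two simulated devices answer the same challenges (same challenge seed).
device_a = make_ro_puf(16, seed=1)
device_b = make_ro_puf(16, seed=2)
crps_a = collect_crps(device_a, 8, seed=42)
crps_b = collect_crps(device_b, 8, seed=42)
```

Because both devices receive identical challenges, any difference between their response bits comes only from the simulated per-device variation, which is the property that makes CRPs usable as a device fingerprint.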
