Machine Learning

Malware Analysis Datasets: API Call Sequences

This dataset is part of our research on malware detection and classification using Deep Learning. It contains 42,797 malware API call sequences and 1,079 goodware API call sequences. Each API call sequence is composed of the first 100 non-repeated consecutive API calls associated with the parent process, extracted from the 'calls' elements of Cuckoo Sandbox reports.
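One reading of "the first 100 non-repeated consecutive API calls" is that runs of identical back-to-back calls are collapsed to a single call before the sequence is truncated to 100. The sketch below illustrates that reading only. The function name `first_nonrepeated_calls` is hypothetical, and the input is assumed to be a list of call records that each carry the API name under an `"api"` key, which is an assumption about the report layout rather than something the description states.

```python
from itertools import groupby


def first_nonrepeated_calls(calls, limit=100):
    """Collapse consecutive duplicate API calls and keep the first `limit`.

    `calls` is assumed to be a list of dicts with an "api" key
    (hypothetical layout for the 'calls' elements of a sandbox report).
    """
    api_names = [call["api"] for call in calls]
    # groupby yields one key per run of identical consecutive items.
    collapsed = [name for name, _run in groupby(api_names)]
    return collapsed[:limit]


if __name__ == "__main__":
    # Toy input with made-up call names, for illustration only.
    sample = [{"api": n} for n in ["A", "A", "B", "B", "B", "A", "C"]]
    print(first_nonrepeated_calls(sample, limit=3))
```

Note that only adjacent repeats are collapsed, so the same API name can still appear more than once in the resulting sequence.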

Categories: Machine Learning, Security

8064 Views

ICASSP 2020 Paper 5581

We focus on one-shot voice conversion, where the target speaker, or both the source and target speakers, are unseen in the training dataset. In our work, StarGAN is employed to carry out voice conversion between speakers. An embedding vector is used to represent the speaker ID. This work relies on two datasets in English and one dataset in Chinese, involving 38 speakers. A user study is conducted to validate our framework in terms of reconstruction quality and conversion quality.

Categories: Machine Learning

626 Views

Facies-Mark: A Machine Learning Benchmark for Facies Classification

The recent interest in using deep learning for seismic interpretation tasks, such as facies classification, has been facing a significant obstacle, namely the absence of large publicly available annotated datasets for training and testing models. As a result, researchers have often resorted to annotating their own training and testing data. However, different researchers may annotate different classes, or use different train and test splits.

Categories: Computer Vision, Machine Learning, Geoscience and Remote Sensing

1553 Views

Beta Lactamase Sequences

UniProt, a well-known publicly available database, was the main source for collecting beta-lactamase and non-beta-lactamase protein sequences. To obtain relevant positive sequences, ‘beta-lactamase’ was used as a keyword. The dataset was meticulously collected by excluding ambiguous sequences: only sequences not annotated with dubious words such as ‘potential’, ‘by similarity’, or ‘probable’ were selected. Moreover, each sequence had to be complete, and hence not annotated with words like ‘fragment’. beta-lactamase protein sequences as well.
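The exclusion rule described above can be sketched as a simple keyword filter. This is a minimal illustration, not the authors' pipeline: the record layout (dicts with `"annotation"` and `"sequence"` keys) and the function names are hypothetical, and the case-insensitive substring match is an assumption about how the dubious words were detected.

```python
# Terms named in the description; matching is assumed to be case-insensitive.
EXCLUDED_TERMS = ("potential", "by similarity", "probable", "fragment")


def is_unambiguous(annotation):
    """Return True if the annotation contains none of the excluded terms."""
    text = annotation.lower()
    return not any(term in text for term in EXCLUDED_TERMS)


def select_sequences(records):
    """Keep only records whose annotation passes the keyword filter.

    `records` is assumed to be a list of dicts with "annotation" and
    "sequence" keys (hypothetical layout, for illustration only).
    """
    return [r for r in records if is_unambiguous(r["annotation"])]


if __name__ == "__main__":
    # Made-up records; the sequences are placeholders, not real proteins.
    demo = [
        {"annotation": "Beta-lactamase", "sequence": "MKT"},
        {"annotation": "Probable beta-lactamase", "sequence": "MAA"},
        {"annotation": "Beta-lactamase (Fragment)", "sequence": "MQ"},
    ]
    print([r["annotation"] for r in select_sequences(demo)])
```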

Categories: Machine Learning

44 Views

LANDMASS

This dataset was developed at the School of Electrical and Computer Engineering (ECE) at the Georgia Institute of Technology as part of the ongoing activities at the Center for Energy and Geo-Processing (CeGP) at Georgia Tech and KFUPM. LANDMASS stands for “LArge North-Sea Dataset of Migrated Aggregated Seismic Structures”. This dataset was extracted from the North Sea F3 block under the Creative Commons license (CC BY-SA 3.0).

Categories: Artificial Intelligence, Computer Vision, Image Processing, Machine Learning, Geoscience and Remote Sensing, Digital signal processing

563 Views

Indoor Stereo Vision and Depth

This is a dataset for indoor depth estimation that contains 1803 synchronized image triples (left and right color images and a depth map) from 6 different scenes: a library, some bookshelves, a conference room, a cafe, a study area, and a hallway. Among these, 1740 are marked as high-quality imagery. The left view and the depth map are aligned and synchronized and can be used to evaluate monocular depth estimation models. Standard training/testing splits are provided.

Categories: Computer Vision, Image Processing, Machine Learning

1006 Views

High Resolution Photoluminescence Microscopy Images And Spectra Of Air-Suspended Single-Wall Carbon Nanotubes

The dataset contains high-resolution microscopy images and confocal spectra of semiconducting single-wall carbon nanotubes. Carbon nanotubes allow down-scaling of electronic components to the nano-scale. There is initial evidence from Monte Carlo simulations that microscopy images with high digital resolution show energy information in the Bessel wave pattern that is visible in these images. In this dataset, images from Silicon and InGaAs cameras, as well as spectra, give valuable insights into the spectroscopic properties of these single-photon emitters.

Categories: Artificial Intelligence, Image Processing, Machine Learning, Image Fusion

675 Views

Controller Design for Electrical Drives by Deep Reinforcement Learning - a Proof of Concept (Supplementary Material)

Supplementary Material for IEEE-TII Transaction Article "Controller Design for Electrical Drives by Deep Reinforcement Learning - a Proof of Concept"

Categories: Machine Learning

531 Views

ToN_IoT datasets

Collecting and analysing heterogeneous data sources from the Internet of Things (IoT) and Industrial IoT (IIoT) is essential for training and validating the fidelity of machine learning-based cybersecurity applications. However, analysing these data sources remains a major challenge, in particular reducing the high-dimensional space and selecting important features and observations from different data sources.

Categories: Artificial Intelligence, Machine Learning, Security

12792 Views

The Bot-IoT dataset

The proliferation of IoT systems has seen them targeted by malicious third parties. To address this challenge, realistic protection and investigation countermeasures, such as network intrusion detection and network forensic systems, need to be effectively developed. For this purpose, a well-structured and representative dataset is paramount for training and validating the credibility of the systems. Although there are several network datasets, in most cases not much information is given about the botnet scenarios that were used.

Categories: IoT, Machine Learning, Security

21102 Views
