Machine Learning

We introduce a new robotic RGBD dataset with difficult luminosity conditions: ONERA.ROOM. It comprises RGB-D data (as pairs of images) and corresponding annotations in PASCAL VOC format (xml files)

It aims at People detection, in (mostly) indoor and outdoor environments. People in the field of view can be standing, but also lying on the ground as after a fall.

Categories:
427 Views

This dataset is part of our research on malware detection and classification using Deep Learning. It contains 42,797 malware API call sequences and 1,079 goodware API call sequences. Each API call sequence is composed of the first 100 non-repeated consecutive API calls associated with the parent process, extracted from the 'calls' elements of Cuckoo Sandbox reports.

Categories:
7950 Views

Our efforts are made on one-shot voice conversion where the target speaker is unseen in training dataset or both source and target speakers are unseen in the training dataset. In our work, StarGAN is employed to carry out voice conversation between speakers. An embedding vector is used to represent speaker ID. This work relies on two datasets in English and one dataset in Chinese, involving 38 speakers. A user study is conducted to validate our framework in terms of reconstruction quality and conversation quality.

Categories:
624 Views

The recent interest in using deep learning for seismic interpretation tasks, such as facies classification, has been facing a significant obstacle, namely the absence of large publicly available annotated datasets for training and testing models. As a result, researchers have often resorted to annotating their own training and testing data. However, different researchers may annotate different classes, or use different train and test splits.

Categories:
1548 Views

A well-known publicly available database namely UniProt was the main source for collection beta-lactamase and non-beta-lactamase protein sequences. To obtain relevant positive sequences ‘beta-lactamase’ was used as a keyword. The dataset was meticulously collected by excluding ambiguous sequences, only those sequences were selected which were not annotated with dubious words like potential, by similarity or probable. Moreover, the sequence should be a complete sequence and hence should not be annotated with words like fragment. beta-lactamase protein sequences as well.

Categories:
44 Views

This dataset was developed at the School of Electrical and Computer Engineering (ECE) at the Georgia Institute of Technology as part of the ongoing activities at the Center for Energy and Geo-Processing (CeGP) at Georgia Tech and KFUPM. LANDMASS stands for “LArge North-Sea Dataset of Migrated Aggregated Seismic Structures”. This dataset was extracted from the North Sea F3 block under the Creative Commons license (CC BY-SA 3.0).

Categories:
561 Views

The is a dataset for indoor depth estimation that contains 1803 synchronized image triples (left, right color image and depth map), from 6 different scenes, including a library, some bookshelves, a conference room, a cafe, a study area, and a hallway. Among these images, 1740 high-quality ones are marked as high-quality imagery. The left view and the depth map are aligned and synchronized and can be used to evaluate monocular depth estimation models. Standard training/testing splits are provided.

Categories:
992 Views

The dataset contains high-resolution microscopy images and confocal spectra of semiconducting single-wall carbon nanotubes. Carbon nanotubes allow down-scaling of electronic components to the nano-scale. There is initial evidence from Monte Carlo simulations that microscopy images with high digital resolution show energy information in the Bessel wave pattern that is visible in these images. In this dataset, images from Silicon and InGaAs cameras, as well as spectra, give valuable insights into the spectroscopic properties of these single-photon emitters.

Categories:
674 Views

Supplementary Material for IEEE-TII Transaction Article "Controller Design for Electrical Drives by Deep Reinforcement Learning - a Proof of Concept"

Categories:
527 Views

Collecting and analysing heterogeneous data sources from the Internet of Things (IoT) and Industrial IoT (IIoT) are essential for training and validating the fidelity of cybersecurity applications-based machine learning.  However, the analysis of those data sources is still a big challenge for reducing high dimensional space and selecting important features and observations from different data sources.

Categories:
12478 Views

Pages