Machine Learning

HisarMod: A new challenging modulated signals dataset

To increase the diversity of signal datasets, we created a new dataset called HisarMod, which includes 26 classes and 5 different modulation families passing through 5 different wireless communication channels. During the generation of the dataset, MATLAB 2017a was employed to create random bit sequences, symbols, and wireless fading channels.
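As a rough illustration of the generation steps named above (random bits, symbols, fading channel), the following is a minimal sketch in Python rather than MATLAB. It is not the authors' pipeline: the choice of QPSK, flat Rayleigh fading, the SNR value, and all function names are assumptions made for this example.

```python
import math
import random


def random_bits(n, rng):
    """Draw n independent, equiprobable bits."""
    return [rng.randint(0, 1) for _ in range(n)]


def qpsk_map(bits):
    """Map bit pairs to unit-energy QPSK symbols (one bit per axis)."""
    scale = 1 / math.sqrt(2)
    return [
        complex((1 - 2 * b0) * scale, (1 - 2 * b1) * scale)
        for b0, b1 in zip(bits[0::2], bits[1::2])
    ]


def rayleigh_awgn(symbols, snr_db, rng):
    """Apply flat Rayleigh fading (one coefficient per symbol) plus complex AWGN.

    Assumes unit average signal power, so the noise power is 10^(-SNR/10).
    """
    noise_std = math.sqrt(10 ** (-snr_db / 10) / 2)  # per real dimension
    fade_std = math.sqrt(0.5)  # gives E[|h|^2] = 1
    received = []
    for s in symbols:
        h = complex(rng.gauss(0, fade_std), rng.gauss(0, fade_std))
        n = complex(rng.gauss(0, noise_std), rng.gauss(0, noise_std))
        received.append(h * s + n)
    return received


rng = random.Random(0)  # fixed seed for reproducibility
bits = random_bits(2048, rng)
symbols = qpsk_map(bits)
received = rayleigh_awgn(symbols, snr_db=10, rng=rng)
print(len(symbols), len(received))
```

Other modulation families or channel models would replace `qpsk_map` and `rayleigh_awgn` respectively; the overall bits, symbols, channel structure stays the same.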

Categories:: Artificial Intelligence
Machine Learning
Communications
Signal Processing

8596 Views

Data Fusion Contest 2016 (DFC2016)

The Data Fusion Contest 2016: Goals and Organization

The 2016 IEEE GRSS Data Fusion Contest, organized by the IEEE GRSS Image Analysis and Data Fusion Technical Committee, aimed at promoting progress on fusion and analysis methodologies for multisource remote sensing data.

New multi-source, multi-temporal data including Very High Resolution (VHR) multi-temporal imagery and video from space were released. First, VHR images (DEIMOS-2 standard products) acquired at two different dates, before and after orthorectification:

Categories:: Computer Vision
Machine Learning
Geoscience and Remote Sensing

2088 Views

Twitter Sentiment Analysis Data

This dataset page is currently being updated. The tweets collected by the model deployed at https://live.rlamsal.com.np/ are shared here. However, because of COVID-19, all computing resources I have are being used for a dedicated collection of the tweets related to the pandemic. You can go through the following datasets to access those tweets:

Categories:: Artificial Intelligence
Machine Learning

9626 Views

ONERA.ROOM

We introduce a new robotic RGB-D dataset with difficult luminosity conditions: ONERA.ROOM. It comprises RGB-D data (as pairs of images) and corresponding annotations in PASCAL VOC format (XML files).

It targets people detection in (mostly) indoor and outdoor environments. People in the field of view can be standing, but also lying on the ground, as after a fall.

Categories:: Computer Vision
Machine Learning

448 Views

Malware Analysis Datasets: API Call Sequences

This dataset is part of our research on malware detection and classification using Deep Learning. It contains 42,797 malware API call sequences and 1,079 goodware API call sequences. Each API call sequence is composed of the first 100 non-repeated consecutive API calls associated with the parent process, extracted from the 'calls' elements of Cuckoo Sandbox reports.
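The sequence construction described above can be sketched as follows. This is a minimal illustration, assuming that "non-repeated consecutive" means collapsing runs of the same call into a single entry before truncating to 100; the function name and the sample trace are hypothetical and are not taken from the dataset or from Cuckoo Sandbox.

```python
from itertools import groupby


def first_unique_consecutive_calls(calls, limit=100):
    """Collapse runs of identical consecutive calls, then keep the first `limit`."""
    collapsed = [name for name, _ in groupby(calls)]
    return collapsed[:limit]


# Hypothetical trace of API call names, in execution order.
trace = ["NtOpenFile", "NtOpenFile", "NtReadFile", "NtClose", "NtClose", "NtOpenFile"]
sequence = first_unique_consecutive_calls(trace, limit=100)
print(sequence)  # ['NtOpenFile', 'NtReadFile', 'NtClose', 'NtOpenFile']
```

Note that a call may still appear more than once in the result, as long as the occurrences are not adjacent.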

Categories:: Machine Learning
Security

8192 Views

ICASSP 2020 Paper 5581

We focus on one-shot voice conversion, where the target speaker is unseen in the training dataset, or both the source and target speakers are unseen in the training dataset. In our work, StarGAN is employed to carry out voice conversion between speakers. An embedding vector is used to represent the speaker ID. This work relies on two datasets in English and one dataset in Chinese, involving 38 speakers. A user study is conducted to validate our framework in terms of reconstruction quality and conversion quality.

Categories:: Machine Learning

627 Views

Facies-Mark: A Machine Learning Benchmark for Facies Classification

The recent interest in using deep learning for seismic interpretation tasks, such as facies classification, has been facing a significant obstacle, namely the absence of large publicly available annotated datasets for training and testing models. As a result, researchers have often resorted to annotating their own training and testing data. However, different researchers may annotate different classes, or use different train and test splits.

Categories:: Computer Vision
Machine Learning
Geoscience and Remote Sensing

1564 Views

Beta Lactamase Sequences

UniProt, a well-known publicly available database, was the main source for collecting beta-lactamase and non-beta-lactamase protein sequences. To obtain relevant positive sequences, ‘beta-lactamase’ was used as a keyword. The dataset was meticulously collected by excluding ambiguous sequences: only sequences not annotated with dubious words such as ‘potential’, ‘by similarity’, or ‘probable’ were selected. Moreover, each sequence had to be complete, and hence not annotated with words like ‘fragment’. beta-lactamase protein sequences as well.
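The exclusion criteria above amount to a keyword filter over sequence annotations. The following is a minimal sketch under stated assumptions: the record layout, the identifiers (`SEQ1` and so on), and the whole-word, case-insensitive matching rule are all hypothetical and do not reflect the actual UniProt export format or the authors' tooling.

```python
import re

# Annotation terms named in the description as marking a sequence
# as ambiguous or incomplete.
EXCLUDED_TERMS = ("potential", "by similarity", "probable", "fragment")


def is_unambiguous(annotation):
    """Return True if the annotation contains none of the excluded terms."""
    text = annotation.lower()
    return not any(
        re.search(r"\b" + re.escape(term) + r"\b", text)
        for term in EXCLUDED_TERMS
    )


# Hypothetical records; real entries would come from a database export.
records = [
    {"id": "SEQ1", "annotation": "Beta-lactamase"},
    {"id": "SEQ2", "annotation": "Probable beta-lactamase"},
    {"id": "SEQ3", "annotation": "Beta-lactamase (Fragment)"},
    {"id": "SEQ4", "annotation": "Beta-lactamase, by similarity"},
]

kept = [r["id"] for r in records if is_unambiguous(r["annotation"])]
print(kept)  # ['SEQ1']
```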

Categories:: Machine Learning

44 Views

LANDMASS

This dataset was developed at the School of Electrical and Computer Engineering (ECE) at the Georgia Institute of Technology as part of the ongoing activities at the Center for Energy and Geo-Processing (CeGP) at Georgia Tech and KFUPM. LANDMASS stands for “LArge North-Sea Dataset of Migrated Aggregated Seismic Structures”. This dataset was extracted from the North Sea F3 block under the Creative Commons license (CC BY-SA 3.0).

Categories:: Artificial Intelligence
Computer Vision
Image Processing
Machine Learning
Geoscience and Remote Sensing
Digital signal processing

568 Views

Indoor Stereo Vision and Depth

This is a dataset for indoor depth estimation that contains 1803 synchronized image triples (left and right color images and a depth map) from 6 different scenes, including a library, some bookshelves, a conference room, a cafe, a study area, and a hallway. Among these images, 1740 are marked as high-quality imagery. The left view and the depth map are aligned and synchronized and can be used to evaluate monocular depth estimation models. Standard training/testing splits are provided.

Categories:: Computer Vision
Image Processing
Machine Learning

1012 Views