Machine Learning

Presented here is a dataset used for our SCADA cybersecurity research. The dataset was built using our SCADA system testbed described in our paper below [*]. The purpose of our testbed was to emulate real-world industrial systems closely. It allowed us to carry out realistic cyber-attacks.

 

Categories:
2112 Views

The dataset comprises image files of size 20 x 20 pixels for various types of metals and non-metals. The collected data has been augmented, scaled, and modified to form a training dataset. It can be used to detect and identify the object type based on the material type in the image. Both training and test datasets can be generated from these image files.
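As a rough illustration of how training and test sets might be derived from a folder of such image files, the sketch below shuffles a list of file paths reproducibly and splits it. The function name `split_image_files`, the 80/20 ratio, the folder name `metal_images`, and the PNG extension are all assumptions made for this example; the dataset description does not specify them.

```python
import random
from pathlib import Path


def split_image_files(paths, test_fraction=0.2, seed=0):
    """Shuffle file paths reproducibly and split them into (train, test)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    # Sort first so the result does not depend on the input order.
    shuffled = sorted(paths)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical layout: images stored as PNG files under ./metal_images/
image_paths = [str(p) for p in Path("metal_images").glob("*.png")]
train_files, test_files = split_image_files(image_paths)
```

Fixing the seed keeps the split stable across runs, so results on the test set remain comparable.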

Categories:
1894 Views

 

This dataset is intended for building time-aware music recommender systems that take the evolution of user preferences into account. It was built by processing the data collected by Oscar Celma (https://www.upf.edu/web/mtg/lastfm360k) from last.fm. It consists of more than 80,000 songs listened to by 50 users over a 2-year period, creating a collection of more than 420,000 timestamped plays.

 

Categories:
224 Views

We introduce a new database of voice recordings with the goal of supporting research on vulnerabilities and protection of voice-controlled systems (VCSs). In contrast to prior efforts, the proposed database contains both genuine voice commands and replayed recordings of such commands, collected in realistic VCS usage scenarios and using modern voice assistant development kits.

Categories:
1720 Views

The Message Queuing Telemetry Transport (MQTT) protocol is one of the most widely used standards in Internet of Things (IoT) machine-to-machine communication. The increase in the number of available IoT devices and protocols in use reinforces the need for new and robust Intrusion Detection Systems (IDS). However, building an IoT IDS requires datasets to process, train, and evaluate these models. The dataset presented in this paper is the first to simulate an MQTT-based network. The dataset is generated using a simulated MQTT network architecture.

Categories:
21765 Views

Invasive lobular carcinoma (ILC) is the second most prevalent histologic subtype of invasive breast cancer. Here, we comprehensively profiled 817 breast tumors, including 127 ILC, 490 ductal (IDC), and 88 mixed IDC/ILC. Besides E-cadherin loss, the best known ILC genetic hallmark, we identified mutations targeting PTEN, TBX3 and FOXA1 as ILC enriched features. PTEN loss associated with increased AKT phosphorylation, which was highest in ILC among all breast cancer subtypes. Spatially clustered FOXA1 mutations correlated with increased FOXA1 expression and activity.

Categories:
613 Views

This is a large-scale Chinese hotel review dataset collected by Tan Songbo. The corpus contains 10,000 reviews, automatically collected and organized from Trip.com.

Categories:
1137 Views

This dataset was created from all Landsat-8 images from South America in the year 2018. More than 31 thousand images were processed (15 TB of data), and active fire pixels were found in approximately half of them. The Landsat-8 sensor has 30 meters of spatial resolution (1 panchromatic band of 15 m), 16 bits of radiometric resolution, and 16 days of temporal resolution (revisit). The images in our dataset are in TIFF (GeoTIFF) format with 10 bands (excluding the 15 m panchromatic band).

Categories:
6132 Views

The dataset includes two parts: private and public traffic.

The private traffic is self-captured network traffic from several applications, such as YouTube, Skype, and streaming video, with 16 categories in total.

The public traffic is an open VPN dataset covering numerous VPN and non-VPN network services, with 24 categories in total.

 

Categories:
1126 Views

The dataset contains rash images of 11 different disease states. Images of normal skin are also included in the dataset.

Categories:
17553 Views
