Cyber-Physical Production Systems (CPPS) are the key enabling for industrial businesses and economic growth. The introduction of the Internet of Things (IoT) in industrial processes represents a new Internet revolution, mostly known as 4th Industrial Revolution, towards the Smart Manufacturing concept. Despite the huge interest from the industry side to innovate their production systems, in order to increase revenues at lower costs, the IoT concept is still immature and fuzzy, which increases security related risks in industrial systems.

Instructions: 

The generation of the dataset containing OPC UA traffic was possible due to the setup and execution of a laboratory CPPS testbed. This CPPS uses OPC UA standard for horizontal and vertical communications.Regarding the CPPS testbed setup, it consists on seven nodes in the network.Each network node consist on a Raspberry Pi device, running the Python FreeOpcUa implementation. In this configuration, there are two production units, each one containing three devices, and one node representing a Manufacturing Execution System (MES). Each device implements both OPC UA server and client, where the server publish to a OPC UA variable updates regarding sensor readings and the client subscribes all OPC UA variables from all other devices in the same production unit. On the other side, the MES only implements the OPC UA client, which subscribes all OPC UA variables from all devices in both production units. Also, connected to this network, is an attack node as it is assumed that the attacker already gained access to the CPPS network.After setting up the CPPS testbed, a python implementation that implements Tshark was used to capture OPC UA packets and export this traffic to a csv file format dataset. This traffic includes both normal and anomalous behaviour. Anomalous behaviour is achieved with the malicious node, which injects attacks into the CPPS network, targeting one or more device nodes and the MES. The attacks selected for the malicious activities are:

    • Denial of Service(DoS);
    • Eavesdropping or Man-in-the-middle (MITM) attacks;
    • Impersonation or Spoofing attacks.

 

To perform the attacks mentioned, a python script is used, which implements the Scapy module for packet sniffing, injection and modification. Regarding the dataset generation, another python script, that implements Tshark (in this case Pyshark) was used to capture only OPC UA packets and export this traffic to a csv file format dataset. Actually, the OPC UA packets are converted to bidirectional communication flows, which are characterized by the following 32 features:

    • src_ip: Source IP address;
    • src_port: Source port;
    • dst_ip: Destination IP address;
    • dst_port: Destination port;
    • flags: TCP flag status;
    • pktTotalCount: Total packet count;
    • octetTotalCount: Total packet size;
    • avg_ps: Average packet size;
    • proto: Protocol;
    • service: OPC UA service call type;
    • service_errors: Number of service errors in OPC UA request responses;
    • status_errors: Number of status errors in OPC UA request responses;
    • msg_size: OPC UA message transport size;
    • min_msg_size: minimum OPC UA message size;
    • flowStart: Timestamp of flow start;
    • flowEnd: Timestamp of flow end;
    • flowDuration: Flow duration in seconds;
    • avg_flowDuration: Average flow duration in seconds;
    • flowInterval: Time interval between flows in seconds;
    • count: Number of connections to the same destination host as the current connection in the past two seconds;
    • srv_count: Number of connections to the same port number as the current connection in the past two seconds;
    • same_srv_rate: The percentage of connections that were to the same port number, among the connections aggregated in Count;
    • dst_host_same_src_port_rate: The percentage of connections that were to the same source port, among the connections having the same port number;
    • f_pktTotalCount: Total forward packets count;
    • f_octetTotalCount: Total forward packets size;
    • f_flowStart: Timestamp of first forward packet start;
    • f_rate: Rate at which forward packets are transmitted;
    • b_pktTotalCount: Total backwards packets count;
    • b_octetTotalCount: Total backwards packets size;
    • b_flowStart: Timestamp of first backwards packet start;
    • label: Binary label classification;
    • multi_label: Multi classification labeling.

 

The generated dataset has 33.567 normal instances, 74.013 DoS attack instances, 50 impersonation attack instances, and 7 MITM attack instances. This gives a total of 107.634 instances. Also, all attacks were grouped into one class (anomaly - 1) and the rest of the instances belong to the normal class (0).

For more information, please contact the author: Rui Pinto (rpinto@fe.up.pt).

Categories:
776 Views

We provide the data corresponding to the studies of our paper Path Outlines

- Study 1: 

    - data collected:study1.csv

    - analysis: analyseStudy1.R

    - tasks: tasksStudy1.rtf

- Study 2: 

   - RDF data to run the study: nobel_persee.zip

   - data collected: study2.csv

   - analysis: analyseStudy2.R

   - tasks: tasksStudy2.txt

 

Categories:
78 Views

a research data about campaign participation in Surabaya City 2015

Categories:
46 Views

This dataset was collected for research conducted within the project AN.ON-Next funded by the German Federal Ministry of Education and Research (BMBF) with grant number: 16KIS0371.

Instructions: 

The data are primarily gathered with 7-point Likert scales (exceptions are shown in the documentation). Thus, the analysis requires statistical approaches which are applicable to ordinal data. Examples of how the dataset was used in prior research can be found in the documentation.

Categories:
84 Views

This dataset was collected for research conducted within the project AN.ON-Next funded by the German Federal Ministry of Education and Research (BMBF) with grant number: 16KIS0371.

Instructions: 

The data are primarily gathered with 7-point Likert scales (exceptions are shown in the documentation). Thus, the analysis requires statistical approaches which are applicable to ordinal data. Examples of how the dataset was used in prior research can be found in the documentation.

Categories:
77 Views

The water consumption from different house holds recorded for a period of one year

Instructions: 

Water consumption in different time instances

Categories:
128 Views

Time Scale Modification (TSM) is a well-researched field; however, no effective objective measure of quality exists.  This paper details the creation, subjective evaluation, and analysis of a dataset for use in the development of an objective measure of quality for TSM. Comprised of two parts, the training component contains 88 source files processed using six TSM methods at 10 time scales, while the testing component contains 20 source files processed using three additional methods at four time scales.

Instructions: 

When using this dataset, please use the following citation:

@article{doi:10.1121/10.0001567,
author = {Roberts,Timothy and Paliwal,Kuldip K. },
title = {A time-scale modification dataset with subjective quality labels},
journal = {The Journal of the Acoustical Society of America},
volume = {148},
number = {1},
pages = {201-210},
year = {2020},
doi = {10.1121/10.0001567},
URL = {https://doi.org/10.1121/10.0001567},
eprint = {https://doi.org/10.1121/10.0001567}
}

 

Audio files are named using the following structure: SourceName_TSMmethod_TSMratio_per.wav and split into multiple zip files.For 'TSMmethod', PV is the Phase Vocoder algorithm, PV_IPL is the Identity Phase Locking Phase Vocoder algorithm, WSOLA is the Waveform Similarity Overlap-Add algorithm, FESOLA is the Fuzzy Epoch Synchronous Overlap-Add algorithm, HPTSM is the Harmonic-Percussive Separation Time-Scale Modification algorithm and uTVS is the Mel-Scale Sub-Band Modelling Filterbank algorithm. Elastique is the z-Plane Elastique algorithm, NMF is the Non-Negative Matrix Factorization algorithm and FuzzyPV is the Phase Vocoder algorithm using Fuzzy Classification of Spectral Bins.TSM ratios range from 33% to 192% for training files, 20% to 200% for testing files and 22% to 220% for evaluation files.

  • Train: Contains 5280 processed files for training neural networks
  • Test: Contains 240 processed files for testing neural networks
  • Ref_Train: Contains the 88 reference files for the processed training files
  • Ref_Test: Contains the 20 reference files for the processed testing files
  • Eval: Contains 6000 processed files for evaluating TSM methods.  The 20 reference test files were processed at 20 time-scales using the following methods:
    • Phase Vocoder (PV)
    • Identity Phase-Locking Phase Vocoder (IPL)
    • Scaled Phase-Locking Phase Vocoder (SPL)
    • Phavorit IPL and SPL
    • Phase Vocoder with Fuzzy Classification of Spectral Bins (FuzzyPV)
    • Waveform Similarity Overlap-Add (WSOLA)
    • Epoch Synchronous Overlap-Add (ESOLA)
    • Fuzzy Epoch Synchronous Overlap-Add (FESOLA)
    • Driedger's Identity Phase-Locking Phase Vocoder (DrIPL)
    • Harmonic Percussive Separation Time-Scale Modification (HPTSM)
    • uTVS used in Subjective testing (uTVS_Subj)
    • updated uTVS (uTVS)
    • Non-Negative Matrix Factorization Time-Scale Modification (NMFTSM)
    • Elastique.

 

TSM_MOS_Scores.mat is a version 7 MATLAB save file and contains a struct called data that has the following fields:

  • test_loc: Legacy folder location of the test file.
  • test_name: Name of the test file.
  • ref_loc: Legacy folder location of reference file.
  • ref_name: Name of the reference file.
  • method: The method used for processing the file.
  • TSM: The time-scale ratio (in percent) used for processing the file. 100(%) is unity processing. 50(%) is half speed, 200(%) is double speed.
  • MeanOS: Normalized Mean Opinion Score.
  • MedianOS: Normalized Median Opinion Score.
  • std: Standard Deviation of MeanOS.
  • MeanOS_RAW: Mean Opinion Score before normalization.
  • MedianOS_RAW: Median Opinion Scores before normalization.
  • std_RAW: Standard Deviation of MeanOS before normalization.

 

TSM_MOS_Scores.csv is a csv containing the same fields as columns.

Source Code and method implementations are available at www.github.com/zygurt/TSM

Please Note: Labels for the files will be uploaded after paper publication.

Categories:
252 Views

Three well-known Border Gateway Anomalies (BGP) anomalies Slammer, Nimda, and Code Red I occurred in January 2003, September 2001, and July 2001, respectively. The Reseaux IP Europeens (RIPE) BGP update messages are publicly available from the Network Coordination Centre (NCC)and contain Slammer, Nimda, Code Red I, and regular data: https://www.ripe.net/analyse/.

Instructions: 

Raw data from the "route collector rrc 04" are organized in folders labeled by the year and month of the collection date. Complete datasets for Slammer, Nimda, and Code Red I are available from the RIPE route collector rrc 04 site:       RIPE NCC: https://www.ripe.net       Analyze: https://www.ripe.net/analyse       Internet Measurements: https://www.ripe.net/analyse/internet-measurements       Routing Information Service (RIS): https://www.ripe.net/analyse/internet-measurements/routing-information-s...       RIS Raw Data: https://www.ripe.net/analyse/internet-measurements/routing-information-s...       rrc04.ripe.net: data.ris.ripe.net/rrc04/ The date of last modification and the size of the datasets are also included.

BGP update messages are originally collected in multi-threaded routing toolkit (MRT) format. "Zebra-dump-parser" written in Perl is used to extract to ASCII the BGP updated messages. The 37 BGP features were extracted using a C# tool to generate uploaded datasets (csv files). Labels have been added based on the periods when data were collected.

Categories:
120 Views

Since there is no image-based personality dataset, we used the ChaLearn dataset for creating a new dataset that met the characteristics we required for this work, i.e., selfie images where only one person appears and his face is visible, labeled with the person's apparent personality in the photo.

Instructions: 

Portrait Personality dataset of selfies based on the ChaLearn dataset First Impressions. This dataset consists of 30,935 selfies labeled with apparent personality. Each selfie file was named with the prefix of the original video followed by the frame's number. "bigfive_labels.csv" contains the labels for each trait of the Big Five model, using the prefix (name of the original video). Video frames and models are available at https://github.com/miguelmore/personality.

Categories:
352 Views

Pages