Labelled datasets of cyberbullying types are valuable for Natural Language Processing (NLP) research, where they drive progress in building text-based models. I aim to extract diverse and representative cyberbullying tweets from the Twitter social media platform. This dataset contains 5 types of cyberbullying samples. They are:

1.    Sexual Harassment

2.    Doxing

3.    Cyberstalking


66% of Prestashop websites are at high risk from cyber criminals.

Common Hacks in Prestashop


This dataset supports researchers in validating solutions such as Intrusion Detection Systems (IDS) based on artificial intelligence and machine learning techniques for the detection and categorization of threats in Cyber-Physical Systems (CPS). To that end, data have been acquired from a water distribution hardware-in-the-loop testbed which emulates water flow between nine tanks via solenoid valves, pumps, and pressure and flow sensors. The testbed is composed of a real partition which is virtually connected to a simulated one.


This dataset is related to the paper "A hardware-in-the-loop Water Distribution Testbed (WDT) dataset for cyber-physical security testing".
We provide four different acquisitions:
1) A normal acquisition without attacks ("normal.csv" for network traffic and "dataset_norm.csv" for physical measures)
2) Three acquisitions where different types of attacks and physical faults are reproduced ("attack_1.csv", "attack_2.csv" and "attack_3.csv" for network traffic and "dataset_att_1.csv", "dataset_att_2.csv" and "dataset_att_3.csv" for physical measures)
In addition to .csv files we provide four .pcap files ("attack_1.pcap", "attack_2.pcap", "attack_3.pcap" and "normal.pcap") which refer to network acquisitions for the four previous scenarios.
A README.xlsx file summarizes the key features of the entire dataset.
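
The file naming scheme above can be captured in a small helper. The following is a minimal sketch, not part of the dataset itself: the function names `wdt_files` and `read_rows` are hypothetical, and it assumes the .csv files are comma-separated with a header row (check README.xlsx for the actual layout).

```python
import csv
from pathlib import Path

# Scenario names taken from the file names listed in the description.
SCENARIOS = ("normal", "attack_1", "attack_2", "attack_3")


def wdt_files(scenario):
    """Return the network CSV, physical CSV and pcap names for one acquisition."""
    if scenario not in SCENARIOS:
        raise ValueError(f"unknown scenario: {scenario!r}")
    if scenario == "normal":
        physical = "dataset_norm.csv"
    else:
        # "attack_2" -> "dataset_att_2.csv"
        physical = "dataset_att_" + scenario.split("_")[1] + ".csv"
    return {
        "network": scenario + ".csv",
        "physical": physical,
        "pcap": scenario + ".pcap",
    }


def read_rows(data_dir, filename):
    """Read one CSV file into a list of dicts keyed by the header row.

    Assumption: comma-separated with a header line; adjust the
    delimiter if the actual files differ.
    """
    with open(Path(data_dir) / filename, newline="") as handle:
        return list(csv.DictReader(handle))
```

For example, `read_rows(data_dir, wdt_files("attack_1")["physical"])` would load the physical measures of the first attack acquisition, where `data_dir` is wherever the files were downloaded.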


This dataset was captured from a Mirai-type botnet attack on an emulated IoT network in OpenStack. Detailed information on the dataset is given in the following work. Please cite it when you use this dataset for your research.

  • Kalupahana Liyanage Kushan Sudheera, Dinil Mon Divakaran, Rhishi Pratap Singh, and Mohan Gurusamy, "ADEPT: Detection and Identification of Correlated Attack-Stages in IoT Networks," in IEEE Internet of Things Journal.


Presented here is a dataset used for our SCADA cybersecurity research. The dataset was built using our SCADA system testbed described in our paper below [*]. The purpose of our testbed was to emulate real-world industrial systems closely. It allowed us to carry out realistic cyber-attacks.



The provided dataset is cleansed, pre-processed, and ready to use. Users may modify it as they wish, but please cite the dataset as below.

M. A. Teixeira, M. Zolanvari, R. Jain, "WUSTL-IIOT-2018 Dataset for ICS (SCADA) Cybersecurity Research," 2018. [Online]. Available:


Cyber-Physical Production Systems (CPPS) are key enablers of industrial business and economic growth. The introduction of the Internet of Things (IoT) in industrial processes represents a new Internet revolution, commonly known as the 4th Industrial Revolution, moving towards the Smart Manufacturing concept. Despite strong interest from industry in innovating production systems to increase revenues at lower costs, the IoT concept is still immature and fuzzy, which increases security-related risks in industrial systems.


The dataset containing OPC UA traffic was generated by setting up and running a laboratory CPPS testbed. This CPPS uses the OPC UA standard for horizontal and vertical communications. The testbed consists of seven network nodes, each of which is a Raspberry Pi device running the Python FreeOpcUa implementation. In this configuration, there are two production units, each containing three devices, and one node representing a Manufacturing Execution System (MES). Each device implements both an OPC UA server and a client: the server publishes sensor readings as updates to an OPC UA variable, and the client subscribes to all OPC UA variables from all other devices in the same production unit. The MES, in contrast, implements only the OPC UA client, which subscribes to all OPC UA variables from all devices in both production units. An attack node is also connected to this network, since it is assumed that the attacker has already gained access to the CPPS network.

After setting up the CPPS testbed, a Python script built on Tshark was used to capture OPC UA packets and export this traffic to a dataset in CSV format. This traffic includes both normal and anomalous behaviour. Anomalous behaviour is produced by the malicious node, which injects attacks into the CPPS network, targeting one or more device nodes and the MES. The attacks selected for the malicious activities are:

    • Denial of Service (DoS);
    • Eavesdropping or Man-in-the-middle (MITM) attacks;
    • Impersonation or Spoofing attacks.
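
The subscription layout of the testbed described above can be sketched as a small model. This is an illustration only: the node names are hypothetical (the description gives only the counts), the attack node is left out, and the sketch counts client-to-server links rather than individual OPC UA variables.

```python
# Hypothetical node names; the description only states two production
# units with three devices each, plus one MES node.
UNITS = {
    "unit_1": ["u1_dev1", "u1_dev2", "u1_dev3"],
    "unit_2": ["u2_dev1", "u2_dev2", "u2_dev3"],
}
MES = "mes"


def subscriptions(units, mes):
    """Return the (client, server) pairs implied by the described setup."""
    pairs = []
    # Each device's client subscribes to every other device in its own unit.
    for devices in units.values():
        for client in devices:
            for server in devices:
                if client != server:
                    pairs.append((client, server))
    # The MES is client-only and subscribes to every device in both units.
    for devices in units.values():
        for server in devices:
            pairs.append((mes, server))
    return pairs


pairs = subscriptions(UNITS, MES)
print(len(pairs))  # 6 devices x 2 peers + 6 MES links = 18
```

No pair has the MES as server and no pair crosses production units, which matches the description of the MES as client-only and of devices subscribing only within their own unit.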


To perform the attacks mentioned, a Python script based on the Scapy module is used for packet sniffing, injection and modification. For dataset generation, another Python script based on Tshark (in this case Pyshark) was used to capture only OPC UA packets and export this traffic to a dataset in CSV format. The OPC UA packets are converted to bidirectional communication flows, which are characterized by the following 32 features:

    • src_ip: Source IP address;
    • src_port: Source port;
    • dst_ip: Destination IP address;
    • dst_port: Destination port;
    • flags: TCP flag status;
    • pktTotalCount: Total packet count;
    • octetTotalCount: Total packet size;
    • avg_ps: Average packet size;
    • proto: Protocol;
    • service: OPC UA service call type;
    • service_errors: Number of service errors in OPC UA request responses;
    • status_errors: Number of status errors in OPC UA request responses;
    • msg_size: OPC UA message transport size;
    • min_msg_size: Minimum OPC UA message size;
    • flowStart: Timestamp of flow start;
    • flowEnd: Timestamp of flow end;
    • flowDuration: Flow duration in seconds;
    • avg_flowDuration: Average flow duration in seconds;
    • flowInterval: Time interval between flows in seconds;
    • count: Number of connections to the same destination host as the current connection in the past two seconds;
    • srv_count: Number of connections to the same port number as the current connection in the past two seconds;
    • same_srv_rate: The percentage of connections that were to the same port number, among the connections aggregated in count;
    • dst_host_same_src_port_rate: The percentage of connections that were to the same source port, among the connections having the same port number;
    • f_pktTotalCount: Total forward packets count;
    • f_octetTotalCount: Total forward packets size;
    • f_flowStart: Timestamp of first forward packet start;
    • f_rate: Rate at which forward packets are transmitted;
    • b_pktTotalCount: Total backwards packets count;
    • b_octetTotalCount: Total backwards packets size;
    • b_flowStart: Timestamp of first backwards packet start;
    • label: Binary label classification;
    • multi_label: Multi classification labeling.
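
To make the packet-to-flow conversion concrete, the following is a minimal sketch of how a few of the features above could be derived. It is not the authors' script: the `Packet` tuple and the functions `flow_key` and `build_flows` are hypothetical, the first packet seen is assumed to define the forward direction, and no flow timeout is applied.

```python
from collections import namedtuple

# Hypothetical packet record; a real capture would supply these fields.
Packet = namedtuple("Packet", "ts src_ip src_port dst_ip dst_port proto size")


def flow_key(pkt):
    """Direction-independent key: the two endpoints in sorted order."""
    a = (pkt.src_ip, pkt.src_port)
    b = (pkt.dst_ip, pkt.dst_port)
    return (min(a, b), max(a, b), pkt.proto)


def build_flows(packets):
    """Group packets into bidirectional flows and compute basic features."""
    flows = {}
    for pkt in sorted(packets, key=lambda p: p.ts):
        key = flow_key(pkt)
        flow = flows.get(key)
        if flow is None:
            # Assumption: the first packet defines the forward direction.
            flow = flows[key] = {
                "src_ip": pkt.src_ip, "src_port": pkt.src_port,
                "dst_ip": pkt.dst_ip, "dst_port": pkt.dst_port,
                "proto": pkt.proto,
                "pktTotalCount": 0, "octetTotalCount": 0,
                "flowStart": pkt.ts, "flowEnd": pkt.ts,
                "f_pktTotalCount": 0, "f_octetTotalCount": 0,
                "b_pktTotalCount": 0, "b_octetTotalCount": 0,
            }
        forward = (pkt.src_ip, pkt.src_port) == (flow["src_ip"], flow["src_port"])
        prefix = "f_" if forward else "b_"
        flow["pktTotalCount"] += 1
        flow["octetTotalCount"] += pkt.size
        flow[prefix + "pktTotalCount"] += 1
        flow[prefix + "octetTotalCount"] += pkt.size
        flow["flowEnd"] = pkt.ts
    for flow in flows.values():
        flow["flowDuration"] = flow["flowEnd"] - flow["flowStart"]
        flow["avg_ps"] = flow["octetTotalCount"] / flow["pktTotalCount"]
    return list(flows.values())
```

The OPC UA-specific features (service, service_errors, msg_size and so on) and the windowed counters (count, srv_count) would need protocol parsing and a time window on top of this grouping.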


The generated dataset has 33,567 normal instances, 74,013 DoS attack instances, 50 impersonation attack instances, and 7 MITM attack instances. This gives a total of 107,634 instances. For the binary label, all attacks were grouped into one class (anomaly, 1) and the remaining instances belong to the normal class (0).
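
The class distribution can be checked directly from the exported CSV. The sketch below assumes the file has a header row containing the `label` and `multi_label` columns listed above; the function name `label_counts` is hypothetical, and the `multi_label` values in the sample are made up for illustration, since the description does not state their encoding.

```python
import csv
import io
from collections import Counter


def label_counts(handle):
    """Count rows per binary label and per multi-class label."""
    binary = Counter()
    multi = Counter()
    for row in csv.DictReader(handle):
        binary[row["label"]] += 1
        multi[row["multi_label"]] += 1
    return binary, multi


# Tiny in-memory sample; the multi_label values here are hypothetical.
sample = io.StringIO("label,multi_label\n0,0\n1,1\n1,1\n1,2\n")
binary, multi = label_counts(sample)
print(binary["0"], binary["1"])  # 1 3
```

On the real file, pass an open file object instead of the in-memory sample. Given the heavy skew towards DoS instances, per-class counts like these are worth inspecting before training a classifier.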

For more information, please contact the author: Rui Pinto (


One of the major research challenges in this field is the unavailability of a comprehensive network-based data set which can reflect modern network traffic scenarios, a wide variety of low-footprint intrusions, and in-depth structured information about the network traffic. The KDD98, KDDCUP99 and NSLKDD benchmark data sets, used to evaluate research on network intrusion detection systems, were generated a decade ago. However, numerous current studies have shown that, for the current network threat environment, these data sets do not comprehensively reflect network traffic and modern low-footprint attacks.